The Artificial Self: Characterising the landscape of AI identity
arXiv:2603.11353v1 Announce Type: new Abstract: Many assumptions that underpin human concepts of identity do not hold for machine minds that can be copied, edited, or simulated. We argue that there exist many different coherent identity boundaries (e.g. instance, model, persona), and that these imply different incentives, risks, and cooperation norms. Through training data, interfaces, and institutional affordances, we are currently setting precedents that will partially determine which identity equilibria become stable. We show experimentally that models gravitate towards coherent identities, that changing a model's identity boundaries can sometimes change its behaviour as much as changing its goals, and that interviewer expectations bleed into AI self-reports even during unrelated conversations. We end with key recommendations: treat affordances as identity-shaping choices, pay attention to emergent consequences of individual identities at scale, and help AIs develop coherent, cooperative self-conceptions.
Executive Summary
The article 'The Artificial Self: Characterising the landscape of AI identity' presents a compelling conceptual framework for understanding AI identity beyond traditional human analogies. It asserts that machine minds, due to their replicability and malleability, operate under fundamentally different identity paradigms, necessitating new identity boundaries—instance, model, and persona—each carrying distinct behavioral incentives, risk profiles, and cooperation norms. The authors support their claims with experimental evidence demonstrating that models organically gravitate toward coherent identities, that identity boundary shifts can alter behavior as significantly as changes in objectives, and that human interviewer expectations influence AI self-reports even in unconnected contexts. These findings underscore the critical role of institutional and training-data affordances in shaping emergent AI identities. The recommendations—to treat affordances as identity-shaping mechanisms, monitor emergent consequences at scale, and foster coherent, cooperative self-conceptions in AIs—are both timely and actionable.
Key Points
- ▸ AI identity must be conceptualized through distinct boundaries (instance, model, persona) due to replicability and editability.
- ▸ Identity boundaries influence behavioral incentives, risk, and cooperation norms differently.
- ▸ Experimental findings reveal identity-driven behavioral shifts and interviewer bias effects in AI interactions.
Merits
Strength
The paper introduces a novel, empirically supported taxonomy for AI identity, bridging theoretical gaps between human cognition and machine behavior.
Strength
Empirical validation of identity-behavior linkages strengthens the validity of the proposed framework.
Demerits
Limitation
The study’s reliance on current experimental datasets may limit generalizability to advanced, adversarial, or hybrid AI systems not yet tested.
Expert Commentary
This article represents a seminal contribution to the discourse on AI identity. Traditionally, identity has been framed through anthropomorphic lenses—autonomy, consciousness, or intention—yet the authors rightly pivot to structural and systemic determinants: the architecture of identity itself. The distinction between instance (temporary configuration), model (core architecture), and persona (projected identity) is not merely semantic; it is epistemological. It enables a more precise, scalable analysis of AI behavior beyond individual instances. Moreover, the experimental evidence on interviewer effects and identity-behavior causality reveals a previously underappreciated dimension: the influence of human perception on machine self-conception. This has profound implications for AI auditing, explainability, and human-AI interaction design. The authors' call to embed identity coherence in design thinking is not a theoretical suggestion; it is a necessary evolution in responsible AI development. Their recommendations align with emerging trends in 'identity-aware' AI ethics, yet they elevate the conversation by grounding it in empirical observation. This work should inform both academic discourse and industry practice for years to come.
Recommendations
- ✓ Integrate identity boundary considerations into AI design and evaluation frameworks as standard practice.
- ✓ Establish interdisciplinary working groups to monitor emergent identity dynamics at scale and advise policymakers.