Emulating Clinician Cognition via Self-Evolving Deep Clinical Research
arXiv:2603.10677v1 Announce Type: new Abstract: Clinical diagnosis is a complex cognitive process, grounded in dynamic cue acquisition and continuous expertise accumulation. Yet most current artificial intelligence (AI) systems are misaligned with this reality, treating diagnosis as single-pass retrospective prediction while lacking auditable mechanisms for governed improvement. We developed DxEvolve, a self-evolving diagnostic agent that bridges these gaps through an interactive deep clinical research workflow. The framework autonomously requisitions examinations and continually externalizes clinical experience from increasing encounter exposure as diagnostic cognition primitives. On the MIMIC-CDM benchmark, DxEvolve improved diagnostic accuracy by 11.2% on average over backbone models and reached 90.4% on a reader-study subset, comparable to the clinician reference (88.8%). DxEvolve improved accuracy on an independent external cohort by 10.2% (categories covered by the source cohort) and 17.1% (uncovered categories) compared to the competitive method. By transforming experience into a governable learning asset, DxEvolve supports an accountable pathway for the continual evolution of clinical AI.
Executive Summary
The article presents DxEvolve, a self-evolving diagnostic AI framework that addresses the misalignment between current AI systems and the dynamic nature of clinical diagnosis. Unlike conventional models that treat diagnosis as a static, retrospective prediction, DxEvolve employs an interactive deep clinical research workflow that autonomously requisitions examinations and externalizes accumulated clinical experience as diagnostic cognition primitives. Empirical results on the MIMIC-CDM benchmark show an 11.2% average improvement in diagnostic accuracy over backbone models, with DxEvolve reaching 90.4% on a reader-study subset, comparable to the clinician reference (88.8%). Moreover, DxEvolve sustains gains on an independent external cohort, with 10.2% and 17.1% improvements in covered and uncovered diagnostic categories, respectively, over the competitive method. The framework’s capacity to transform clinical experience into a governable, auditable learning asset represents a significant step toward accountable, continually evolving clinical AI.
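The workflow described above, dynamic examination requisition plus externalization of experience as reusable primitives, can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: the names `ExperiencePrimitive`, `DiagnosticAgent`, and the fixed exam sequence are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class ExperiencePrimitive:
    """A distilled rule externalized from a past encounter
    (hypothetical structure; the paper does not specify this format)."""
    cue: str
    decisive_exam: str
    diagnosis: str

@dataclass
class DiagnosticAgent:
    """Toy self-evolving diagnostic loop: requisition examinations
    until one is decisive, then store the encounter as a primitive."""
    memory: list = field(default_factory=list)

    def recall(self, cue):
        # Experience reuse: find primitives matching the presenting cue.
        return [p for p in self.memory if p.cue == cue]

    def diagnose(self, cue, order_exam):
        # 1. A matching primitive from prior encounters short-circuits the workup.
        matches = self.recall(cue)
        if matches:
            return matches[0].diagnosis, []
        # 2. Dynamic cue acquisition: requisition exams until one is decisive.
        ordered = []
        for exam in ("labs", "imaging", "biopsy"):  # assumed toy exam menu
            result = order_exam(exam)
            ordered.append(exam)
            if result is not None:  # decisive finding
                # 3. Self-evolution: externalize the encounter as a primitive.
                self.memory.append(ExperiencePrimitive(cue, exam, result))
                return result, ordered
        return "undetermined", ordered
```

In this sketch a second encounter with the same cue resolves immediately from memory, ordering no examinations, which mirrors (in caricature) how accumulated experience is meant to sharpen future diagnostic efficiency.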
Key Points
- ▸ DxEvolve introduces self-evolving capabilities via interactive clinical research workflow
- ▸ Achieves significant accuracy gains over backbone models (11.2% on average) and reaches 90.4% on a reader-study subset, comparable to the clinician reference (88.8%)
- ▸ Demonstrates sustained improvements on external cohorts, indicating generalizability
Merits
Innovative Framework
DxEvolve uniquely integrates self-evolution with clinical experience accumulation, offering a diagnostic model more closely aligned with clinician cognition than static AI systems.
Empirical Validation
Strong benchmark results and external cohort validation substantiate the efficacy and scalability of the approach.
Demerits
Scalability Concerns
The reliance on continuous, autonomous examination requisition may raise practical barriers in real-world clinical settings with limited resources.
Transparency Gap
While the framework claims auditable mechanisms, specifics of the governance structure for improvement cycles remain opaque and warrant further clarification.
Expert Commentary
DxEvolve represents a pivotal shift from static to adaptive clinical AI by embedding continual evolution into the diagnostic process. The alignment between the AI’s learning mechanism and the clinician’s cognitive trajectory, particularly through the externalization of experience as primitives, is a sophisticated conceptual leap. The empirical validation on both the benchmark and the external cohort is compelling, yet the article’s broader implication extends beyond performance metrics: it challenges the traditional paradigm of AI as a deploy-and-forget tool and instead positions AI as a co-evolving partner in clinical decision-making. However, the article’s limitations, particularly the lack of granular detail on governance protocols and scalability constraints, suggest that practical deployment will require careful institutional planning. This work may catalyze a new wave of ‘adaptive AI’ research in medicine, but without transparent governance models it risks replicating the same opacity issues seen in black-box deep learning systems. Future work should prioritize open governance architectures and measurable metrics for improvement accountability.
Recommendations
- ✓ Develop open, auditable governance protocols for self-evolving AI systems to ensure transparency and accountability.
- ✓ Conduct longitudinal studies on real-world clinical integration to assess scalability, clinician adoption, and impact on patient outcomes.