An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
arXiv:2602.20324v1. Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text, standardizing them to Human Phenotype Ontology (HPO) terms, and prioritizing diagnostically informative HPO terms. We developed RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping that integrates large language model-based phenotype extraction, ontology-grounded standardization to HPO terms, and supervised ranking of diagnostically informative phenotypes. We trained RARE-PHENIX using data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites, and externally validated it on 16,357 real-world clinical notes from Vanderbilt University Medical Center. Using clinician-curated HPO terms as the gold standard, RARE-PHENIX consistently outperformed a state-of-the-art deep learning baseline (PhenoBERT) across ontology-based similarity and precision-recall-F1 metrics in end-to-end evaluation (i.e., ontology-based similarity of 0.70 vs. 0.58). Ablation analyses demonstrated performance improvements with the addition of each module in RARE-PHENIX (extraction, standardization, and prioritization), supporting the value of modeling the full clinical phenotyping workflow. By modeling phenotyping as a clinically aligned workflow rather than a single extraction task, RARE-PHENIX provides structured, ranked phenotypes that are more concordant with clinician curation and has the potential to support human-in-the-loop rare disease diagnosis in real-world settings.
Executive Summary
This article describes RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping from clinical notes. The framework integrates three components: large language model-based phenotype extraction, ontology-grounded standardization to Human Phenotype Ontology (HPO) terms, and supervised ranking of diagnostically informative phenotypes. It was trained on data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites and externally validated on 16,357 real-world clinical notes from Vanderbilt University Medical Center. With clinician-curated HPO terms as the gold standard, RARE-PHENIX outperformed the PhenoBERT deep learning baseline on ontology-based similarity (0.70 vs. 0.58) and on precision, recall, and F1 in end-to-end evaluation. The authors position the framework as support for human-in-the-loop rare disease diagnosis. The abstract reports no patient-outcome results, so any benefit to outcomes remains to be shown.
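The comparison against clinician-curated HPO terms can be illustrated with a small sketch. It computes exact-match precision, recall, and F1 between a predicted and a gold term set, plus one simple ontology-aware score: Jaccard overlap after expanding each term to its ancestors. The abstract does not specify which ontology-based similarity measure the study used, so the ancestor-Jaccard score is an assumption chosen for illustration. `TOY_PARENTS` is a hypothetical miniature hierarchy with placeholder IDs, not the real HPO graph.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical miniature is-a hierarchy (child -> parents).
# Placeholder IDs only; this is NOT the real HPO graph.
TOY_PARENTS: Dict[str, Set[str]] = {
    "HP:A1": {"HP:A"},
    "HP:A2": {"HP:A"},
    "HP:A": {"HP:ROOT"},
    "HP:B1": {"HP:B"},
    "HP:B": {"HP:ROOT"},
    "HP:ROOT": set(),
}


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two sets of term IDs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms together with all of their ancestors in the hierarchy."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def ancestor_jaccard(predicted: Set[str], gold: Set[str], parents: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the ancestor-expanded sets (an assumed similarity measure)."""
    expanded_predicted = with_ancestors(predicted, parents)
    expanded_gold = with_ancestors(gold, parents)
    union = expanded_predicted | expanded_gold
    return len(expanded_predicted & expanded_gold) / len(union) if union else 0.0


predicted_terms = {"HP:A1", "HP:B1"}
gold_terms = {"HP:A2", "HP:B1"}
precision, recall, f1 = precision_recall_f1(predicted_terms, gold_terms)
similarity = ancestor_jaccard(predicted_terms, gold_terms, TOY_PARENTS)
```

In this toy case the prediction misses `HP:A2` but returns its sibling `HP:A1`. Exact-match F1 gives no credit for the near miss, while the ancestor-expanded score gives partial credit through the shared parent, which is the general motivation for reporting an ontology-based measure alongside precision, recall, and F1.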
Key Points
- ▸ RARE-PHENIX is an AI framework for end-to-end rare disease phenotyping from clinical notes.
- ▸ The framework integrates three key components: extraction, standardization, and prioritization.
- ▸ RARE-PHENIX was trained on data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites and externally validated on 16,357 real-world clinical notes from Vanderbilt University Medical Center.
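The three-stage workflow in the points above can be sketched as a simple pipeline. Every component here is a hypothetical stand-in, not the authors' implementation: `extract_mentions` uses keyword spotting where the framework uses a large language model, `LEXICON` is a toy lookup with placeholder IDs rather than HPO, and `WEIGHTS` replaces the supervised ranking model with fixed scores. Only the extract, standardize, rank structure comes from the source.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy lexicon: surface phrase -> placeholder ontology ID (not real HPO IDs).
LEXICON: Dict[str, str] = {
    "seizure": "HP:TOY001",
    "developmental delay": "HP:TOY002",
    "headache": "HP:TOY003",
}

# Hypothetical informativeness scores standing in for a trained ranking model.
WEIGHTS: Dict[str, float] = {
    "HP:TOY001": 0.9,
    "HP:TOY002": 0.7,
    "HP:TOY003": 0.1,
}


@dataclass(frozen=True)
class RankedPhenotype:
    term_id: str
    score: float


def extract_mentions(note: str) -> List[str]:
    """Stage 1 stand-in: keyword spotting in place of LLM-based extraction."""
    text = note.lower()
    return [phrase for phrase in LEXICON if phrase in text]


def standardize(mentions: List[str]) -> List[str]:
    """Stage 2 stand-in: map mentions to ontology IDs, de-duplicated in order."""
    term_ids: List[str] = []
    for mention in mentions:
        term_id = LEXICON.get(mention)
        if term_id is not None and term_id not in term_ids:
            term_ids.append(term_id)
    return term_ids


def rank(term_ids: List[str], top_k: int = 10) -> List[RankedPhenotype]:
    """Stage 3 stand-in: sort by a fixed score instead of a supervised ranker."""
    scored = [RankedPhenotype(t, WEIGHTS.get(t, 0.0)) for t in term_ids]
    return sorted(scored, key=lambda item: item.score, reverse=True)[:top_k]


def phenotype_note(note: str, top_k: int = 10) -> List[RankedPhenotype]:
    """Run the three stages in sequence on one clinical note."""
    return rank(standardize(extract_mentions(note)), top_k)


example_note = "Patient reports headache. History of seizures and developmental delay."
top_terms = phenotype_note(example_note, top_k=2)
```

Keeping each stage behind its own function mirrors the modular design described in the source and makes it straightforward to evaluate the pipeline with individual stages swapped or removed. Note that keyword spotting ignores negation ("no seizures"), one reason a real extractor needs more than string matching.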
Merits
Strength in Integrated Approach
RARE-PHENIX models the full clinical phenotyping workflow rather than a single extraction task. Ablation analyses showed performance improvements as each module (extraction, standardization, and prioritization) was added, which supports the value of the integrated design.
Demerits
Limitation in Generalizability
The results may not generalize to all rare diseases and clinical settings: the framework was trained on Undiagnosed Diseases Network data and externally validated on clinical notes from a single institution, Vanderbilt University Medical Center.
Expert Commentary
RARE-PHENIX is a promising advance in rare disease phenotyping, using large language models to extract phenotypes from clinical notes and standardize them to HPO terms. Its key strength is that it models the full clinical workflow, including the prioritization of diagnostically informative terms. Its limitations, chiefly uncertain generalizability beyond the training network and the single external validation site, require careful consideration. More broadly, the work illustrates how natural language processing can support clinical decision-making. As AI-driven healthcare tools evolve, integration into clinical workflows, scalability, and generalizability should remain priorities.
Recommendations
- ✓ Future studies should investigate the generalizability of RARE-PHENIX to diverse rare diseases and clinical settings.
- ✓ Further investment in AI-driven healthcare tools should focus on integrating them into clinical workflows, as RARE-PHENIX does for phenotyping.