FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
arXiv:2602.23479v1 Announce Type: new Abstract: Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLMs) show promise in clinical question answering (QA), but retrieval-based approaches are computationally inefficient, prone to hallucination, and difficult to deploy over real-life EHRs. In this work, we introduce FHIRPath-QA, the first open dataset and benchmark for patient-specific QA that includes open-standard FHIRPath queries over real-world clinical data. We propose a text-to-FHIRPath QA paradigm that shifts reasoning from free-text generation to FHIRPath query synthesis, significantly reducing LLM usage. Built on MIMIC-IV on FHIR Demo, the dataset pairs over 14k natural language questions in patient and clinician phrasing with validated FHIRPath queries and answers. Further, we demonstrate that state-of-the-art LLMs struggle to deal with ambiguity in patient language and perform poorly in FHIRPath query synthesis. However, they benefit strongly from supervised fine-tuning. Our results highlight that text-to-FHIRPath synthesis has the potential to serve as a practical foundation for safe, efficient, and interoperable consumer health applications, and our dataset and benchmark serve as a starting point for future research on the topic. The full dataset and generation code are available at: https://github.com/mooshifrew/fhirpath-qa.
Executive Summary
FHIRPath-QA proposes an approach to executable question answering over electronic health records (EHRs): instead of generating a free-text answer, a language model synthesizes a FHIRPath query that is then executed against the patient's record. The paper introduces an open dataset and benchmark, built on MIMIC-IV on FHIR Demo, that pairs over 14k questions in patient and clinician phrasing with validated FHIRPath queries and answers, and it releases the generation code. State-of-the-art language models struggle with ambiguity in patient language and perform poorly at FHIRPath query synthesis, but they benefit strongly from supervised fine-tuning. The authors present text-to-FHIRPath synthesis as a potential foundation for safe, efficient, and interoperable consumer health applications and as a starting point for future research.
Key Points
- ▸ FHIRPath-QA introduces a novel approach to executable question answering over EHRs using FHIRPath queries
- ▸ A benchmark dataset and open-source code are provided for text-to-FHIRPath synthesis
- ▸ State-of-the-art language models struggle with ambiguity in patient language and with FHIRPath query synthesis, but benefit strongly from supervised fine-tuning
Merits
Strength in Addressing Ambiguity
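FHIRPath-QA pairs questions in both patient and clinician phrasing with validated queries, so the benchmark can expose how ambiguity in patient language affects models, a significant challenge in clinical question answering. Because each answer comes from executing a FHIRPath query rather than from free-text generation, the approach aims to improve precision and trustworthiness.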
Efficient and Interoperable Solution
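By shifting reasoning from free-text generation to query synthesis, the proposed text-to-FHIRPath paradigm significantly reduces LLM usage. Because FHIRPath is an open standard, the paradigm also has the potential to provide an interoperable foundation for consumer health applications that give patients safe and accurate access to their EHRs.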
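To make the paradigm concrete, the following is a minimal sketch of the execution half of the pipeline: once a model has produced a query, the answer comes from running that query over the record, with no further model calls. Everything here is an assumption for illustration. The resources, the question, and the query are hypothetical and not taken from the dataset, and the `evaluate` function is a toy that handles only dotted paths and a single `where(path = 'value')` filter over raw JSON keys. It is not a FHIRPath implementation; a real system would use a conformant FHIRPath engine.

```python
import re

# Hypothetical patient resources, shaped like FHIR JSON (values are made up).
RESOURCES = [
    {"resourceType": "Condition",
     "clinicalStatus": {"coding": [{"code": "active"}]},
     "code": {"text": "Hypertension"}},
    {"resourceType": "Condition",
     "clinicalStatus": {"coding": [{"code": "resolved"}]},
     "code": {"text": "Pneumonia"}},
    {"resourceType": "Observation", "code": {"text": "Hemoglobin"}},
]

# Hypothetical question and the query a model might synthesize for it.
QUESTION = "What conditions do I currently have?"
QUERY = "Condition.where(clinicalStatus.coding.code = 'active').code.text"

# Toy filter syntax: where(<dotted path> = '<string literal>')
WHERE = re.compile(r"^where\(([\w.]+)\s*=\s*'([^']*)'\)$")


def walk(nodes, dotted_path):
    """Follow a dotted path from each node, flattening lists along the way."""
    for name in dotted_path.split("."):
        found = []
        for node in nodes:
            child = node.get(name) if isinstance(node, dict) else None
            if child is None:
                continue
            found.extend(child if isinstance(child, list) else [child])
        nodes = found
    return nodes


def evaluate(resources, expression):
    """Evaluate the toy path subset; the first step names the resource type."""
    steps = re.findall(r"where\([^)]*\)|\w+", expression)
    nodes = [r for r in resources if r.get("resourceType") == steps[0]]
    for step in steps[1:]:
        match = WHERE.match(step)
        if match:
            path, wanted = match.groups()
            # Keep a node if any value reached by the path equals the literal.
            nodes = [n for n in nodes if wanted in walk([n], path)]
        elif step.startswith("where("):
            raise ValueError(f"unsupported filter in toy evaluator: {step}")
        else:
            nodes = walk(nodes, step)
    return nodes


print(evaluate(RESOURCES, QUERY))  # ['Hypertension']
```

The point of the sketch is the division of labor: the model is involved only in producing `QUERY`, while the answer itself is computed deterministically from the record, which is what makes the result inspectable and repeatable.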
Demerits
Limited Generalizability
The dataset is built on a single source, MIMIC-IV on FHIR Demo, so the study's findings may not transfer directly to other clinical settings and patient populations.
Dependence on Supervised Fine-Tuning
State-of-the-art language models perform poorly at FHIRPath query synthesis out of the box and become effective only after supervised fine-tuning, which requires labeled question-query pairs and training resources that may not be available in every scenario.
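As a rough illustration of what preparing data for such fine-tuning could involve, the sketch below turns question-query pairs into prompt/completion records in JSON Lines form. The field names (`question`, `fhirpath`), the prompt template, and the two examples are assumptions made for this sketch; they are not the dataset's schema or the authors' training setup.

```python
import json

# Hypothetical examples: a patient phrasing and a clinician phrasing
# that map to the same query. Not copied from the FHIRPath-QA dataset.
EXAMPLES = [
    {"question": "What conditions do I currently have?",
     "fhirpath": "Condition.where(clinicalStatus.coding.code = 'active').code.text"},
    {"question": "List the patient's active problems.",
     "fhirpath": "Condition.where(clinicalStatus.coding.code = 'active').code.text"},
]

# Assumed prompt template; any real setup would choose its own wording.
PROMPT = "Translate the question into a FHIRPath query.\nQuestion: {question}\nFHIRPath:"


def to_sft_record(example):
    """Build one prompt/completion pair from a question-query example."""
    return {
        "prompt": PROMPT.format(question=example["question"]),
        "completion": " " + example["fhirpath"],
    }


def to_jsonl(examples):
    """Serialize all examples as JSON Lines, one training record per line."""
    return "\n".join(json.dumps(to_sft_record(e)) for e in examples)


print(to_jsonl(EXAMPLES))
```

Keeping the target as the query alone, rather than a free-text answer, mirrors the paradigm described in the paper: the model learns to emit something executable, and the answer is obtained by running it.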
Expert Commentary
The FHIRPath-QA approach represents a significant step forward in addressing the challenges of clinical question answering over EHRs. The findings are promising, but they come from a single demo dataset, and implementing the approach in diverse clinical settings is likely to raise additional challenges. Whether text-to-FHIRPath synthesis becomes a practical foundation for consumer health applications will depend on continued research, particularly on handling ambiguous patient language.
Recommendations
- ✓ Further research is needed to investigate the generalizability of FHIRPath-QA in diverse clinical settings and patient populations.
- ✓ Developing more advanced language models and FHIRPath synthesis techniques could enhance the efficiency and effectiveness of the proposed approach.