CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

arXiv:2602.17949v1 Announce Type: new

Abstract

Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clinically meaningful unit is not a single CUI but a concept set comprising related synonyms, subtypes, and supertypes. Constructing such concept sets is labour-intensive, inconsistently performed, and poorly supported by existing tools, particularly for NLP pipelines that operate directly on UMLS CUIs.

Methods: We present CUICurate, a Graph-based retrieval-augmented generation (GraphRAG) framework for automated UMLS concept set curation. A UMLS knowledge graph (KG) was constructed and embedded for semantic retrieval. For each target concept, candidate CUIs were retrieved from the KG, followed by large language model (LLM) filtering and classification steps comparing two LLMs (GPT-5 and GPT-5-mini). The framework was evaluated on five lexically heterogeneous clinical concepts against a manually curated benchmark and gold-standard concept sets.

Results: Across all concepts, CUICurate produced substantially larger and more complete concept sets than the manual benchmarks whilst matching human precision. Comparisons between the two LLMs found that GPT-5-mini achieved higher recall during filtering, while GPT-5 produced classifications that more closely aligned with clinician judgements. Outputs were stable across repeated runs and computationally inexpensive.

Conclusions: CUICurate offers a scalable and reproducible approach to support UMLS concept set curation that substantially reduces manual effort. By integrating graph-based retrieval with LLM reasoning, the framework produces focused candidate concept sets that can be adapted to clinical NLP pipelines for different phenotyping and analytic requirements.
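The retrieval step described in the Methods can be illustrated with a minimal sketch. The abstract does not say how embeddings and graph structure are combined, so this assumes one common GraphRAG pattern: seed the search with the nodes whose names are most similar to the target, then expand along knowledge-graph edges. The node identifiers, names, and edges are hypothetical toy data rather than real UMLS CUIs, and a bag-of-words vector stands in for a learned embedding model.

```python
import math
import re
from collections import Counter

# Toy knowledge graph: hypothetical identifiers, NOT real UMLS CUIs.
# Edges stand in for UMLS relations (e.g. broader/narrower links).
NAMES = {
    "X1": "diabetes mellitus",
    "X2": "type 2 diabetes mellitus",
    "X3": "type 1 diabetes mellitus",
    "X4": "diabetic retinopathy",
    "X5": "hypertension",
}
EDGES = {
    "X1": ["X2", "X3"],
    "X2": ["X1"],
    "X3": ["X1"],
    "X4": ["X1"],
    "X5": [],
}


def embed(text: str) -> Counter:
    """Stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_candidates(query: str, k: int = 2, hops: int = 1) -> set[str]:
    """Seed with the k nodes most similar to the query, then expand along edges."""
    q = embed(query)
    ranked = sorted(NAMES, key=lambda node: cosine(q, embed(NAMES[node])), reverse=True)
    candidates = set(ranked[:k])
    frontier = set(candidates)
    for _ in range(hops):
        # Add neighbours not yet visited; stop early if nothing new is reachable.
        frontier = {nb for node in frontier for nb in EDGES.get(node, [])} - candidates
        if not frontier:
            break
        candidates |= frontier
    return candidates


print(sorted(retrieve_candidates("diabetes mellitus", k=1, hops=1)))  # ['X1', 'X2', 'X3']
```

In this sketch, seeding controls how close the starting nodes are to the target, while the hop count controls how far the search reaches into subtypes and supertypes; the resulting candidates would then pass to LLM filtering and classification.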

Executive Summary

This article presents CUICurate, a GraphRAG-based framework for automated clinical concept curation. The framework combines a UMLS knowledge graph, embedded for semantic retrieval, with large language models (LLMs) that filter and classify the retrieved candidate concepts. Evaluated on five clinical concepts, CUICurate produced larger and more complete concept sets than manual benchmarks while matching human precision. Its outputs were stable across repeated runs and computationally inexpensive, supporting its scalability and reproducibility. For clinical NLP pipelines, this could substantially reduce the manual effort of building concept sets for phenotyping and analysis. However, the framework's reliance on LLMs raises concerns about the quality of model judgements and the continued availability of the specific models it was evaluated with (GPT-5 and GPT-5-mini). Overall, CUICurate offers a promising approach to automated clinical concept curation, but its limitations and potential biases must be carefully considered.
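The LLM filtering and classification stage can be sketched as a loop that asks a model to label each retrieved candidate relative to the target concept. This is a simplified sketch under stated assumptions: the label set mirrors the synonym, subtype, and supertype relations named in the abstract, plus an "unrelated" label that acts as the filter; the prompt wording and JSON reply format are hypothetical; and `fake_llm` is a keyword rule standing in for a real model call such as GPT-5 or GPT-5-mini. The framework runs filtering and classification as separate steps; they are merged here for brevity.

```python
import json
from typing import Callable

# Allowed relation labels; "unrelated" doubles as the filtering decision.
LABELS = {"synonym", "subtype", "supertype", "unrelated"}


def build_prompt(target: str, candidate: str) -> str:
    """Hypothetical prompt asking for a single relation label as JSON."""
    return (
        f"Target concept: {target}\n"
        f"Candidate concept: {candidate}\n"
        'Reply with JSON {"label": L}, where L is one of: '
        + ", ".join(sorted(LABELS))
    )


def classify(target: str, candidate: str, llm: Callable[[str], str]) -> str:
    """Label one candidate; malformed or unknown replies count as 'unrelated'."""
    try:
        label = json.loads(llm(build_prompt(target, candidate))).get("label")
    except (json.JSONDecodeError, AttributeError):
        return "unrelated"
    return label if isinstance(label, str) and label in LABELS else "unrelated"


def curate(target: str, candidates: dict[str, str],
           llm: Callable[[str], str]) -> dict[str, str]:
    """Return {candidate_id: label} for candidates judged related to the target."""
    labels = {cid: classify(target, name, llm) for cid, name in candidates.items()}
    return {cid: lab for cid, lab in labels.items() if lab != "unrelated"}


def fake_llm(prompt: str) -> str:
    """Keyword rule standing in for a real LLM call (demonstration only)."""
    candidate_line = prompt.splitlines()[1]
    if "type 2" in candidate_line:
        return '{"label": "subtype"}'
    if "hypertension" in candidate_line:
        return '{"label": "unrelated"}'
    return "not valid json"


candidates = {"X2": "type 2 diabetes mellitus", "X5": "hypertension",
              "X9": "unlabelled concept"}
print(curate("diabetes mellitus", candidates, fake_llm))  # {'X2': 'subtype'}
```

Treating any malformed or out-of-vocabulary reply as "unrelated" keeps the output restricted to the allowed labels, at the cost of possibly dropping valid candidates when the model misformats its answer.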

Key Points

  • CUICurate integrates a UMLS knowledge graph with LLMs for automated clinical concept curation
  • The framework produces larger and more complete concept sets while matching human precision
  • CUICurate is scalable, reproducible, and computationally inexpensive

Merits

Improved Completeness

Compared with the manual benchmarks, CUICurate produced substantially larger and more complete concept sets while matching human precision.
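Producing larger sets while matching precision amounts to higher recall at the same precision. A minimal sketch, assuming each concept set is compared with a gold-standard set of CUIs by simple set overlap, so that precision = |P ∩ G| / |P| and recall = |P ∩ G| / |G| for a predicted set P and gold set G (the abstract does not state the exact metrics used). The identifiers and counts below are invented purely to show the arithmetic and are not the paper's data.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of a curated concept set against a gold standard."""
    true_positives = len(predicted & gold)
    # Convention: an empty predicted (or gold) set scores 0.0 instead of dividing by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy sets (not the paper's data): 8 gold-standard concepts.
gold = {f"G{i}" for i in range(8)}
manual = {"G0", "G1", "G2", "N0"}                       # small set, 3 of 4 correct
automated = {f"G{i}" for i in range(6)} | {"N0", "N1"}  # larger set, 6 of 8 correct

print(precision_recall(manual, gold))     # (0.75, 0.375)
print(precision_recall(automated, gold))  # (0.75, 0.75)
```

Both toy sets are 75% correct, but the larger one recovers twice as many gold-standard concepts.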

Scalability and Reproducibility

Outputs were stable across repeated runs and inexpensive to compute, which supports both scaling the framework to many target concepts and reproducing its results.
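Run-to-run stability can be quantified in several ways, and the abstract does not say which measure was used. As one illustrative choice (an assumption, not necessarily the paper's metric), the sketch below scores stability as the mean pairwise Jaccard similarity between the concept sets returned by repeated runs on the same target concept; the run outputs are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two concept sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(runs: list[set[str]]) -> float:
    """Average Jaccard similarity over all pairs of runs (1.0 = perfectly stable)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs of three repeated runs for one target concept.
runs = [{"X1", "X2", "X3"}, {"X1", "X2", "X3"}, {"X1", "X2"}]
print(round(mean_pairwise_jaccard(runs), 3))  # 0.778
```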

Demerits

Reliance on LLMs

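The framework depends on proprietary LLMs (GPT-5 and GPT-5-mini in the evaluation) for filtering and classification. Errors in their judgements propagate directly into the curated concept sets, and changes to model access or versions could affect the accuracy and reproducibility of CUICurate outputs.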

Limited Domain Knowledge

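Because candidate concepts are retrieved from a UMLS knowledge graph, CUICurate can only capture concepts and relations that UMLS represents. Its evaluation on five clinical concepts also leaves open how well it generalizes to other clinical domains or contexts.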

Expert Commentary

CUICurate represents a significant advance in automated clinical concept curation, combining a UMLS knowledge graph with LLMs to produce high-quality concept sets. However, its reliance on LLMs raises concerns about the quality of model judgements and the long-term availability of specific models, and its dependence on UMLS may restrict its applicability to particular clinical domains or contexts. Addressing these limitations calls for further research on pipeline design and implementation, on keeping the UMLS knowledge graph up to date, and on more robust LLMs. Policy changes may also be needed to accommodate the growing demand for clinical data and analytics.

Recommendations

  • Develop and refine CUICurate for specific clinical domains or contexts, addressing its limited domain knowledge and reliance on UMLS.
  • Investigate the use of alternative LLMs or knowledge graphs to mitigate the risks associated with CUICurate's reliance on UMLS and LLMs.
