
EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning

Zhyar Rzgar K Rostam, Gábor Kertész · March 2, 2026

#cs.CL #cs.AI

arXiv:2602.21216v1 (cross-listed)

Abstract: The EQ-5D (EuroQol 5-Dimensions) is a standardized instrument for the evaluation of health-related quality of life. In health economics, systematic literature reviews (SLRs) depend on the correct identification of publications that use the EQ-5D, but manual screening of large volumes of scientific literature is time-consuming, error-prone, and inconsistent. In this study, we investigate fine-tuning of general-purpose (BERT) and domain-specific (SciBERT, BioBERT) pre-trained language models (PLMs), enriched with biomedical entity information extracted through scispaCy models for each statement, to improve EQ-5D detection from abstracts. We conduct nine experimental setups, including combining three scispaCy models with three PLMs, and evaluate their performance at both the sentence and study levels. Furthermore, we explore a Multiple Instance Learning (MIL) approach with attention pooling to aggregate sentence-level information into study-level predictions, where each abstract is represented as a bag of enriched sentences (by scispaCy). The findings indicate consistent improvements in F1-scores (reaching 0.82) and nearly perfect recall at the study-level, significantly exceeding classical bag-of-words baselines and recently reported PLM baselines. These results show that entity enrichment significantly improves domain adaptation and model generalization, enabling more accurate automated screening in systematic reviews.

Executive Summary

This study investigates whether fine-tuned pre-trained language models (PLMs), enriched with biomedical entity information, improve the detection of EQ-5D use in scientific abstracts. The EQ-5D is a standardized instrument for evaluating health-related quality of life. The authors report that entity enrichment improves domain adaptation and model generalization, enabling more accurate automated screening in systematic reviews. F1-scores improve consistently (reaching 0.82) and study-level recall is nearly perfect, exceeding classical bag-of-words baselines and recently reported PLM baselines.

Key Points

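▸ Fine-tuning of general-purpose (BERT) and domain-specific (SciBERT, BioBERT) pre-trained language models (PLMs) for EQ-5D detection
▸ Enrichment of each sentence with biomedical entity information extracted by scispaCy models to improve model performance
▸ Multiple Instance Learning (MIL) approach with attention pooling for aggregating sentence-level information into study-level predictions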
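
The MIL aggregation in the last point can be sketched in a few lines. The abstract does not give the exact pooling formula, so the version below assumes a common attention-pooling form: each sentence vector h gets a score w · tanh(V h), the scores are softmax-normalized over the bag, and the study-level vector is the weighted sum of sentence vectors. The names `attention_pool`, `V`, and `w` are hypothetical; in practice the parameters would be learned and the sentence vectors would come from the fine-tuned PLM.

```python
import math


def attention_pool(sentence_vecs, V, w):
    """Pool a bag of sentence vectors into one study-level vector.

    sentence_vecs: list of d-dimensional sentence embeddings (the "bag")
    V: m x d projection matrix (assumed to be learned elsewhere)
    w: m-dimensional scoring vector (assumed to be learned elsewhere)
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in sentence_vecs:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))

    # Numerically stable softmax over the sentences in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The weighted sum of sentence vectors is the study-level representation.
    dim = len(sentence_vecs[0])
    pooled = [sum(a * h[i] for a, h in zip(weights, sentence_vecs)) for i in range(dim)]
    return pooled, weights


# Toy bag: three 2-d "sentence embeddings" with made-up parameters.
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
pooled, weights = attention_pool(bag, V, w)
```

The attention weights sum to one, so they can also be read as a rough indication of which sentences drove the study-level decision. A classifier head on top of `pooled` would then produce the final prediction.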

Merits

Improved Accuracy

The study reports F1-scores reaching 0.82 and nearly perfect recall at the study level, exceeding classical bag-of-words baselines and recently reported PLM baselines for EQ-5D detection.
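
To make the study-level metrics concrete, here is a minimal sketch of how sentence-level outputs could be rolled up and scored. The aggregation rule (a study is positive if any sentence crosses a threshold) is an assumption for illustration, not the paper's stated method, and all probabilities and labels below are made up.

```python
def study_prediction(sentence_probs, threshold=0.5):
    """Flag a study as positive if any sentence crosses the threshold (assumed rule)."""
    return int(any(p >= threshold for p in sentence_probs))


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = uses EQ-5D)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up sentence probabilities for five abstracts, with made-up gold labels.
studies = [[0.1, 0.9], [0.7], [0.6, 0.2], [0.1, 0.3], [0.2, 0.8, 0.4]]
gold = [1, 1, 0, 0, 1]
preds = [study_prediction(s) for s in studies]
precision, recall, f1 = precision_recall_f1(gold, preds)
```

In this toy data every relevant study is found (recall of 1.0) at the cost of one false positive. In screening for systematic reviews, that trade-off is usually acceptable, because a missed study is harder to recover than an extra one to check by hand.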

Entity Enrichment

The incorporation of biomedical entity information significantly improves domain adaptation and model generalization.
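
As a rough illustration of what entity enrichment might look like, the sketch below appends detected entity mentions to each sentence before it is passed to a PLM. The enrichment format is not specified in the abstract, so the ` | entities: ` marker is an assumption, and `extract_entities` is a hypothetical lexicon lookup standing in for a scispaCy model so that the example runs without external dependencies.

```python
def extract_entities(sentence, lexicon):
    """Hypothetical stand-in for an NER model: return lexicon terms found in the sentence."""
    lowered = sentence.lower()
    return [term for term in lexicon if term.lower() in lowered]


def enrich(sentence, entities, marker=" | entities: "):
    """Append entity mentions to the sentence; the marker format is an assumption."""
    if not entities:
        return sentence
    return sentence + marker + "; ".join(entities)


lexicon = ["EQ-5D", "quality of life"]  # toy lexicon, not taken from the paper
sentence = "Health-related quality of life was measured with the EQ-5D."
enriched = enrich(sentence, extract_entities(sentence, lexicon))
```

With a real scispaCy pipeline, the entity list would instead come from the `ents` attribute of the processed document, and the enriched string would be tokenized and fed to BERT, SciBERT, or BioBERT for fine-tuning.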

Demerits

Limited Generalizability

The evaluation covers only EQ-5D detection in abstracts, so the findings may not generalize to other instruments, domains, or tasks. Further research is needed to validate the results.

Expert Commentary

This study demonstrates the potential of pre-trained language models and biomedical entity information to improve the accuracy and efficiency of systematic literature reviews. The results have significant implications for health economics research and decision-making, and highlight the importance of continued innovation in natural language processing and machine learning methods. However, further research is needed to validate the findings and explore their generalizability to other domains and tasks.

Recommendations

✓ Further research to validate the findings and explore their generalizability
✓ Investigation of the potential applications of the study's methods in other domains and tasks

Sources

arXiv - cs.AI
