Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
arXiv:2602.21374v1. Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs) -- Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it -- for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it showed the weakest results. Larger models (7B--8B parameters) consistently outperformed smaller counterparts in sensitivity and MCC. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance, though at the cost of slightly lower specificity and precision. Feature-level results showed reliable extraction of physiological symptoms across most models, whereas psychological complaints, administrative requests, and complex somatic features remained challenging. These findings establish a practical, privacy-preserving blueprint for deploying open-source SLMs in multilingual clinical NLP settings with limited infrastructure and annotation resources, and highlight the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.
Executive Summary
This study evaluates a two-step pipeline for extracting clinical information from medical transcripts in Persian, a low-resource language. The pipeline combines a Persian-to-English translation model with five small language models, achieving promising results for binary extraction of 13 clinical features. The best-performing model, Qwen2.5-7B-Instruct, achieved a median macro-F1 score of 0.899 and a Matthews Correlation Coefficient of 0.797. The study highlights the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.
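The abstract does not disclose the paper's actual prompts, so the second (extraction) step of the pipeline can only be sketched. Below is a minimal illustration of what a few-shot binary-extraction prompt and its output parsing might look like; the feature names, prompt wording, and JSON answer format are all hypothetical, not taken from the paper.

```python
import json

# Hypothetical subset of the 13 binary clinical features; the paper's
# exact feature list is not given in the abstract.
FEATURES = ["pain", "nausea", "anxiety", "medication_request"]

def build_extraction_prompt(transcript_en: str) -> str:
    """Few-shot style prompt asking an SLM for binary feature flags as JSON."""
    keys = ", ".join(f'"{f}": 0 or 1' for f in FEATURES)
    return (
        "You are a clinical information extractor.\n"
        f"Given the call transcript below, answer with JSON: {{{keys}}}.\n\n"
        "Example transcript: 'Patient reports severe pain, asks for a morphine refill.'\n"
        'Example answer: {"pain": 1, "nausea": 0, "anxiety": 0, "medication_request": 1}\n\n'
        f"Transcript: '{transcript_en}'\nAnswer:"
    )

def parse_flags(model_output: str) -> dict:
    """Parse the model's JSON reply into booleans; missing keys count as absent (0)."""
    raw = json.loads(model_output)
    return {f: bool(raw.get(f, 0)) for f in FEATURES}

prompt = build_extraction_prompt("Caller mentions constant nausea after chemo.")
# A plausible (invented) model reply for the transcript above:
reply = '{"pain": 0, "nausea": 1, "anxiety": 0, "medication_request": 0}'
print(parse_flags(reply))
```

Treating missing keys as negative predictions matters here: the abstract notes that translation to English reduced "missing outputs", which suggests incomplete model replies were a real failure mode in the untranslated setting.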
Key Points
- ▸ The study proposes a two-step pipeline for clinical information extraction in low-resource languages
- ▸ The pipeline combines a Persian-to-English translation model with small language models
- ▸ The best-performing model achieved a median macro-F1 score of 0.899 and a Matthews Correlation Coefficient of 0.797
Merits
Effective Use of Small Language Models
The study demonstrates the potential of small language models for clinical information extraction in low-resource languages, which can be particularly useful in settings with limited infrastructure and annotation resources.
Improvement in Sensitivity and Robustness
A bilingual analysis with the Aya-expanse-8B translation model showed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance.
Demerits
Class Imbalance and Limited Feature Extraction
The study acknowledges the challenge of class imbalance and the unreliable extraction of certain clinical features, such as psychological complaints, administrative requests, and complex somatic features.
Trade-off between Sensitivity and Specificity
Translating transcripts to English improved sensitivity but slightly reduced specificity and precision, so the choice of input language involves a genuine trade-off that requires further optimization.
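The trade-off above is easiest to see by computing the reported metrics from a confusion matrix. The toy counts below are invented for illustration (they are not the paper's numbers); the formulas are the standard definitions of sensitivity, specificity, F1, and MCC used in the study.

```python
import math

def sensitivity(tp, fn):
    # Recall on the positive class: fraction of true positives recovered.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Fraction of true negatives correctly rejected.
    return tn / (tn + fp)

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mcc(tp, tn, fp, fn):
    # Matthews Correlation Coefficient: stays informative under class imbalance.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical confusion matrix for one clinical feature:
tp, tn, fp, fn = 40, 45, 5, 10
print(f"sensitivity={sensitivity(tp, fn):.3f}",
      f"specificity={specificity(tn, fp):.3f}",
      f"F1={f1(tp, fp, fn):.3f}",
      f"MCC={mcc(tp, tn, fp, fn):.3f}")
```

Shifting a model's decision threshold (or, here, its input language) typically moves counts between FN and FP, raising sensitivity while lowering specificity or precision; MCC summarizes all four cells, which is why the study pairs it with macro-F1 for imbalanced features.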
Expert Commentary
This study makes a significant contribution to the field of healthcare NLP, demonstrating the potential of small language models for clinical information extraction in low-resource languages. The use of a two-step pipeline and the evaluation of multiple models provide valuable insights into how model scale and input-language strategy jointly affect performance. However, the study also highlights the challenges of class imbalance and the unreliable extraction of certain clinical features, which require further research and development. Overall, the study provides a comprehensive and well-reasoned approach to addressing the complexities of clinical information extraction in multilingual settings.
Recommendations
- ✓ Further research is needed to address class imbalance and improve the extraction of challenging clinical features, such as psychological complaints and complex somatic features.
- ✓ Building more inclusive and equitable healthcare systems requires investment in healthcare NLP research, particularly for low-resource languages, to address healthcare disparities and improve health outcomes.