
Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches

Abstract (arXiv:2602.15869v1): Large language models (LLMs) have shown strong performance on clinical de-identification, the task of identifying sensitive identifiers to protect privacy. However, previous work has not examined their generalizability across formats, cultures, and genders. In this work, we systematically evaluate fine-tuned transformer models (BERT, ClinicalBERT, ModernBERT), small LLMs (Llama 1-8B, Qwen 1.5-7B), and large LLMs (Llama-70B, Qwen-72B) on de-identification. We show that smaller models achieve comparable performance while substantially reducing inference cost, making them more practical for deployment. Moreover, we demonstrate that smaller models can be fine-tuned with limited data to outperform larger models in de-identifying identifiers drawn from Mandarin, Hindi, Spanish, French, Bengali, and regional variations of English, in addition to gendered names. To improve robustness in multi-cultural contexts, we introduce and publicly release BERT-MultiCulture-DEID, a set of de-identification models based on BERT, ClinicalBERT, and ModernBERT, fine-tuned on MIMIC with identifiers from multiple language variants. Our findings provide the first comprehensive quantification of the efficiency-generalizability trade-off in de-identification and establish practical pathways for fair and efficient clinical de-identification. Details on accessing the models are available at: https://doi.org/10.5281/zenodo.18342291

Executive Summary

This article evaluates a range of models for clinical de-identification, a crucial task in protecting patient privacy, from fine-tuned transformer encoders (BERT, ClinicalBERT, ModernBERT) to small and large LLMs. The authors demonstrate that smaller models can match the performance of larger models while significantly reducing inference cost, making them more practical for deployment. The study also introduces BERT-MultiCulture-DEID, a set of models fine-tuned on MIMIC with identifiers from multiple language variants, improving robustness in multi-cultural contexts. The findings quantify the efficiency-generalizability trade-off in de-identification and offer practical pathways for fair and efficient clinical de-identification. The authors release their models publicly, enabling further research and adoption. The results have significant implications for healthcare institutions and policymakers seeking to balance de-identification efficiency with generalizability and cultural sensitivity.
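To make the task concrete, the sketch below illustrates what clinical de-identification produces: identifier spans in a note replaced by category placeholders. This is a minimal, hand-written illustration of the task only, not the authors' method; the regex patterns, category labels, and sample note are hypothetical, and the systems evaluated in the paper learn identifier spans via fine-tuned token classification rather than fixed patterns.

```python
import re

# Hypothetical identifier categories and patterns, for illustration only.
# Real de-identification models (e.g., the fine-tuned BERT variants in the
# paper) learn these spans from annotated data instead of regexes.
PATTERNS = {
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def deidentify(note: str) -> str:
    """Replace each matched identifier span with a [CATEGORY] placeholder."""
    for label, pattern in PATTERNS.items():
        note = pattern.sub(f"[{label}]", note)
    return note

note = "Patient seen on 04/12/2023, MRN: 0042, callback 555-123-4567."
print(deidentify(note))
# → Patient seen on [DATE], [MRN], callback [PHONE].
```

A learned model generalizes where fixed patterns fail, which is exactly where the paper's cross-cultural evaluation matters: name formats and identifier conventions vary across the Mandarin, Hindi, Spanish, French, and Bengali contexts it tests.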

Key Points

  • Smaller LLMs achieve comparable performance to larger models while reducing inference cost.
  • BERT-MultiCulture-DEID is introduced as a robust model for multi-cultural contexts.
  • The study provides a comprehensive quantification of the efficiency-generalizability trade-off in de-identification.

Merits

Strength

The study provides a comprehensive evaluation of various LLMs for clinical de-identification, covering multiple formats, cultures, and genders. The authors demonstrate the potential of smaller models for practical deployment, and introduce a new model for improving robustness in multi-cultural contexts.

Demerits

Limitation

The evaluation is limited to a single dataset (MIMIC) and a fixed set of models, so the results may not generalize to other datasets or architectures. Additionally, the authors do not provide a detailed analysis of the cultural and linguistic nuances that may affect de-identification performance.

Methodological Limitation

The study does not report a detailed, like-for-like comparison of inference costs between smaller and larger models, which is crucial for assessing practical deployment.

Expert Commentary

The study's findings are significant for clinical de-identification, a critical task in protecting patient privacy. The authors' evaluation of models across size classes provides a comprehensive view of the efficiency-generalizability trade-off. However, the study's limitations, such as the reliance on a single dataset and a fixed model set, highlight the need for further research. Additionally, the introduction of BERT-MultiCulture-DEID demonstrates the potential of fine-tuned encoder models to improve robustness in multi-cultural contexts. The implications for healthcare institutions and policymakers are substantial, and the findings will likely influence the development of clinical de-identification models and regulations.

Recommendations

  • Future studies should evaluate the performance of LLMs on diverse datasets and models to ensure generalizability.
  • Healthcare institutions should consider adopting smaller LLMs for clinical de-identification, balancing efficiency with cultural sensitivity and generalizability.
