Noise reduction in BERT NER models for clinical entity extraction
arXiv:2603.00022v1 Announce Type: new Abstract: Precision is of utmost importance in the realm of clinical entity extraction from clinical notes and reports. Encoder Models fine-tuned for Named Entity Recognition (NER) are an efficient choice for this purpose, as they don't hallucinate. We pre-trained an in-house BERT over clinical data and then fine-tuned it for NER. These models performed well on recall but could not close upon the high precision range, needed for clinical models. To address this challenge, we developed a Noise Removal model that refines the output of NER. The NER model assigns token-level entity tags along with probability scores for each token. Our Noise Removal (NR) model then analyzes these probability sequences and classifies predictions as either weak or strong. A naïve approach might involve filtering predictions based on low probability values; however, this method is unreliable. Owing to the characteristics of the SoftMax function, Transformer-based architectures often assign disproportionately high confidence scores even to uncertain or weak predictions, making simple thresholding ineffective. To address this issue, we adopted a supervised modeling strategy in which the NR model leverages advanced features such as the Probability Density Map (PDM). The PDM captures the Semantic-Pull effect observed within Transformer embeddings, an effect that manifests in the probability distributions of NER class predictions across token sequences. This approach enables the model to classify predictions as weak or strong with significantly improved accuracy. With these NR models we were able to reduce False Positives across various clinical NER models by 50% to 90%.
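The abstract's point about SoftMax overconfidence is easy to verify in isolation. A minimal sketch (plain Python, unrelated to the paper's actual model) shows how softmax compresses even moderately ambiguous logits toward a high top-class probability, which is why a single fixed threshold separates weak from strong predictions poorly:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A genuinely uncertain token: the logit gap is modest, yet the top
# class still clears a typical 0.6 confidence threshold (~0.65 here).
uncertain = softmax([2.0, 0.5, 0.2, 0.1])

# A confident token: a large logit gap pushes the top class to ~0.999.
confident = softmax([8.0, 0.5, 0.2, 0.1])

print(max(uncertain), max(confident))
```

Because weak and strong predictions both land in the upper probability range, filtering by a raw probability cutoff discards true positives or keeps false ones, which motivates the supervised NR model instead.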
Executive Summary
The article presents a novel solution to mitigate false positives in BERT-based NER models for clinical entity extraction by introducing a Noise Removal (NR) model. Traditional fine-tuned BERT models, while effective in recall, falter in achieving the precision required for clinical applications. The authors address this gap by deploying a supervised NR model that leverages the Probability Density Map (PDM) to classify NER predictions as weak or strong, circumventing ineffective thresholding due to the SoftMax function’s bias toward high confidence scores. The NR model demonstrates measurable success, reducing false positives by 50% to 90% across multiple clinical NER variants. This innovation offers a targeted, data-driven intervention that aligns with the critical need for precision in clinical data processing.
Key Points
- ▸ Development of a supervised NR model to refine NER outputs
- ▸ Use of Probability Density Map (PDM) to capture semantic-pull effects in embeddings
- ▸ Achievement of significant reduction (50%–90%) in false positives
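The abstract does not specify how the Probability Density Map is constructed, so the following is only an illustrative stand-in: a feature extractor over the NER model's per-token probability sequence, of the kind a supervised weak/strong classifier (the NR model) could consume. The function name and feature choices are assumptions, not the paper's method:

```python
def span_features(probs, start, end, context=2):
    """Illustrative features for one predicted entity span, meant to
    feed a downstream weak-vs-strong classifier.

    probs      : per-token top-class probabilities from the NER model
    start, end : token indices of the predicted span (end exclusive)
    context    : neighbouring tokens to summarise on each side

    These are stand-ins for the paper's PDM features, whose exact
    construction is not given in the abstract.
    """
    span = probs[start:end]
    neigh = probs[max(0, start - context):start] + probs[end:end + context]
    mean = sum(span) / len(span)
    return [
        mean,                                           # average span confidence
        min(span),                                      # weakest token in span
        max(span) - min(span),                          # within-span spread
        (sum(neigh) / len(neigh)) if neigh else mean,   # neighbourhood level
    ]

# A confident 3-token span surrounded by low-confidence context.
probs = [0.55, 0.62, 0.97, 0.99, 0.98, 0.60, 0.58]
print(span_features(probs, 2, 5))
```

Summarising the contrast between a span and its neighbourhood is one plausible way the probability distribution "across token sequences" mentioned in the abstract could be captured as classifier input.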
Merits
Precision Improvement
The NR model effectively reduces false positives, enhancing model reliability in clinical contexts where accuracy is paramount.
Demerits
Complexity of Implementation
The use of advanced features like PDM may increase computational overhead and require additional training resources, potentially limiting scalability.
Expert Commentary
This work is a thoughtful and technically sound contribution to clinical AI. The recognition that the SoftMax function tends to assign inflated confidence scores to uncertain predictions is both insightful and well documented in transformer-based architectures. The authors' use of the PDM as a proxy for semantic-pull effects, a nuanced phenomenon often overlooked in standard NER critiques, shows genuine analytical depth. Importantly, the reported empirical result, a 50%–90% reduction in false positives, is substantial and practically significant. While the scalability concerns raised above are valid, the trade-off between computational cost and precision gains is justified in clinical settings, where false positives carry tangible risks. This is more than an incremental improvement: it is a meaningful step toward responsible, precision-oriented NER deployment in clinical environments, and a solid contribution to the broader discourse on AI reliability in health care.
Recommendations
- ✓ Integrate NR models into existing clinical NER pipelines as a post-processing step, particularly in high-stakes environments such as emergency care or diagnostic reporting.
- ✓ Explore open-source implementation of PDM-based NR frameworks to promote reproducibility and adoption across academic and commercial AI platforms.
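Integrating NR as a post-processing step, as recommended above, amounts to filtering NER output through the trained classifier. A minimal sketch, with a hypothetical span shape and a placeholder in place of the real NR model:

```python
def apply_noise_removal(spans, nr_is_strong):
    """Post-processing step: keep only spans the NR model judges strong.

    spans        : list of dicts with 'text', 'label', and per-token
                   'probs' (a hypothetical NER output shape, not an
                   interface from the paper)
    nr_is_strong : callable probs -> bool, standing in for the trained
                   supervised NR model
    """
    return [s for s in spans if nr_is_strong(s["probs"])]

spans = [
    {"text": "metformin", "label": "DRUG", "probs": [0.99, 0.98]},
    {"text": "daily",     "label": "DRUG", "probs": [0.61, 0.55]},
]

# The lambda below is a trivial placeholder so the sketch runs; in
# practice nr_is_strong would wrap the PDM-based classifier, since the
# paper argues plain thresholding is unreliable.
kept = apply_noise_removal(spans, lambda p: min(p) > 0.9)
print([s["text"] for s in kept])  # -> ['metformin']
```

Keeping the NR model behind a single callable keeps the pipeline change small: the NER stage is untouched, and the filter can be swapped or retrained independently.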