Depression Detection at the Point of Care: Automated Analysis of Linguistic Signals from Routine Primary Care Encounters

arXiv:2604.06193v1 Abstract: Depression is underdiagnosed in primary care, yet timely identification remains critical. Recorded clinical encounters, increasingly common with digital scribing technologies, present an opportunity to detect depression from naturalistic dialogue. We investigated automated depression detection from 1,108 audio-recorded primary care encounters in the Establishing Focus study, with depression defined by PHQ-9 (n=253 depressed, n=855 non-depressed). We compared three supervised approaches, Sentence-BERT + Logistic Regression (LR), LIWC+LR and ModernBERT, against a zero-shot GPT-OSS. GPT-OSS achieved the strongest performance (AUPRC=0.510, AUROC=0.774), with LIWC+LR competitive among supervised models (AUPRC=0.500, AUROC=0.742). Combined dyadic transcripts outperformed single-speaker configurations, with providers linguistically mirroring patients in depression encounters, an additive signal not captured by either speaker alone. Meaningful detection is achievable from the first 128 patient tokens (AUPRC=0.356, AUROC=0.675), supporting in-the-moment clinical decision support. These findings argue for passively collected clinical audio as a low-burden complement to existing screening workflows.

Executive Summary

This article explores automated depression detection using linguistic signals from audio-recorded primary care encounters, a novel application of digital scribing technologies. Analyzing 1,108 encounters, the study compares supervised models (Sentence-BERT, LIWC, ModernBERT) against a zero-shot GPT-OSS model, defining depression via PHQ-9. GPT-OSS demonstrated superior performance (AUPRC=0.510, AUROC=0.774), with LIWC+LR as the strongest supervised contender. Critically, dyadic transcripts outperformed single-speaker analysis, highlighting linguistic mirroring as a key signal. Early detection potential from just 128 patient tokens supports real-time clinical decision support, positioning passive audio analysis as a promising, low-burden adjunct to existing depression screening protocols in primary care.

Key Points

  • Automated depression detection from naturalistic primary care audio encounters is feasible and offers a low-burden screening complement.
  • GPT-OSS (zero-shot) outperformed supervised models, suggesting the power of large language models for this task.
  • Dyadic (patient-provider) linguistic analysis is superior to single-speaker analysis, revealing 'linguistic mirroring' as a significant signal for depression.
  • Meaningful detection is possible from early patient speech (first 128 tokens), supporting real-time clinical decision support.
  • The study leverages routine clinical data (audio recordings) increasingly common with digital scribing, enhancing ecological validity.
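The early-detection setting in the fourth point amounts to truncating patient speech to its first 128 tokens. A minimal sketch of that setup is below; the speaker labels and whitespace tokenization are illustrative stand-ins, since the abstract does not specify the paper's tokenizer:

```python
def first_patient_tokens(turns, k=128):
    """Collect the first k whitespace tokens of patient speech,
    in encounter order, from (speaker, utterance) turns."""
    tokens = []
    for speaker, text in turns:
        if speaker != "patient":
            continue
        tokens.extend(text.split())
        if len(tokens) >= k:
            break
    return tokens[:k]

# Hypothetical transcript snippet for illustration.
transcript = [
    ("provider", "How have you been sleeping lately?"),
    ("patient", "Honestly, not great. I wake up around three and just lie there."),
    ("provider", "Every night?"),
    ("patient", "Most nights, and I don't feel like doing much during the day."),
]
snippet = first_patient_tokens(transcript, k=128)
print(len(snippet))  # 24 here; a real encounter would be cut at 128
```

In practice the 128-token prefix accumulates within the first minute or two of an encounter, which is what makes in-the-moment decision support plausible.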

Merits

Ecological Validity

Utilizes naturalistic, routine primary care encounter audio, reflecting real-world clinical settings.

Novelty of Dyadic Analysis

Demonstrates the significant, additive value of analyzing patient-provider linguistic interactions, especially 'mirroring'.
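The three input configurations the study compares (patient-only, provider-only, combined dyadic) can be sketched as follows; the speaker-tag format in the dyadic view is an assumption for illustration, not the paper's actual preprocessing:

```python
def build_inputs(turns):
    """Assemble the three transcript configurations from
    (speaker, utterance) turns: each single speaker alone,
    plus the combined dyadic transcript with speaker tags."""
    patient = " ".join(text for spk, text in turns if spk == "patient")
    provider = " ".join(text for spk, text in turns if spk == "provider")
    # Speaker tags below are illustrative, not taken from the paper.
    dyadic = " ".join(f"[{spk.upper()}] {text}" for spk, text in turns)
    return {"patient": patient, "provider": provider, "dyadic": dyadic}

turns = [
    ("provider", "What brings you in today?"),
    ("patient", "I've just felt worn down for weeks."),
]
views = build_inputs(turns)
print(views["dyadic"])
# [PROVIDER] What brings you in today? [PATIENT] I've just felt worn down for weeks.
```

Only the dyadic view preserves turn order across speakers, which is what lets a model pick up on provider-patient mirroring that neither single-speaker view can express.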

Real-time Potential

Highlights the feasibility of early detection from limited linguistic input, enabling in-the-moment clinical support.

Comparative Rigor

Compares multiple state-of-the-art supervised and zero-shot NLP models, providing a robust performance benchmark.

Low-Burden Approach

Proposes a passive data collection method that integrates seamlessly into existing workflows without additional patient or provider effort.

Demerits

Modest Performance Metrics

While promising, an AUPRC of 0.510 and AUROC of 0.774 leave room for improvement before clinical deployment. AUPRC is the more informative metric here because it reflects the cohort's class imbalance (roughly 23% of encounters were depressed), whereas AUROC can look favorable even on imbalanced data.
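To put the reported numbers in context: a no-skill classifier has an expected AUPRC equal to the positive-class prevalence, while its AUROC is 0.5 regardless of imbalance. A quick check with the study's counts (253 depressed of 1,108):

```python
# No-skill baselines under the study's class balance.
positives, total = 253, 1108
prevalence = positives / total   # expected AUPRC of a random classifier
no_skill_auroc = 0.5             # AUROC baseline, independent of imbalance

lift = 0.510 / prevalence        # reported best AUPRC vs. chance
print(f"prevalence = {prevalence:.3f}, AUPRC lift over chance = {lift:.2f}x")
# prevalence = 0.228, AUPRC lift over chance = 2.23x
```

So the reported AUPRC of 0.510, while modest in absolute terms, is more than twice the chance-level baseline for this cohort.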

Generalizability Concerns

Study conducted within a single cohort ('Establishing Focus'), potentially limiting external validity across diverse primary care populations or healthcare systems.

Black Box Nature of LLMs

GPT-OSS, while the strongest performer, is effectively a black box: it is unclear which linguistic features drive its predictions, which complicates clinical interpretation and trust.

Ethical and Privacy Considerations

Deployment of such technology raises significant questions regarding patient consent, data security, and potential for algorithmic bias in sensitive health data.

Lack of Clinical Outcome Data

The study focuses on detection; it does not evaluate the impact of this detection on patient management, treatment initiation, or clinical outcomes.

Expert Commentary

This study offers a compelling glimpse into the future of mental health screening, leveraging the burgeoning field of conversational AI and the increasing ubiquity of digital clinical documentation. The finding that dyadic linguistic signals, particularly 'mirroring,' are more predictive than individual speaker analysis is genuinely insightful, underscoring the relational dynamics inherent in how depression presents. While the AUPRC of 0.510 suggests that this is an assistive tool rather than a standalone diagnostic, its potential for 'in-the-moment' clinical decision support is considerable. The ethical questions surrounding consent, data governance, and algorithmic bias, especially across diverse linguistic and cultural contexts, will be paramount. Future work must rigorously address these socio-technical challenges to ensure equitable and responsible deployment, and pair them with clinical validation studies demonstrating improved patient outcomes, so the field moves beyond mere detection to meaningful intervention.

Recommendations

  • Conduct large-scale, multi-site validation studies across diverse patient populations and primary care settings to assess generalizability and identify potential biases.
  • Explore methods to enhance model interpretability, particularly for LLMs, to provide clinicians with actionable insights beyond a simple risk score.
  • Develop robust ethical frameworks, clear consent protocols, and privacy-preserving techniques to address the sensitive nature of passively collected health data.
  • Investigate the integration of this technology into existing EHRs and clinical decision support systems, focusing on user-friendliness and workflow optimization for providers.
  • Perform clinical utility trials to evaluate the impact of this detection method on subsequent clinical actions, treatment initiation, and ultimately, patient mental health outcomes.

Sources

Original: arXiv - cs.CL