Huntington Disease Automatic Speech Recognition with Biomarker Supervision

arXiv:2603.11168v1

Abstract: Automatic speech recognition (ASR) for pathological speech remains underexplored, especially for Huntington's disease (HD), where irregular timing, unstable phonation, and articulatory distortion challenge current models. We present a systematic HD-ASR study using a high-fidelity clinical speech corpus not previously used for end-to-end ASR training. We compare multiple ASR families under a unified evaluation, analyzing WER as well as substitution, deletion, and insertion patterns. HD speech induces architecture-specific error regimes, with Parakeet-TDT outperforming encoder-decoder and CTC baselines. HD-specific adaptation reduces WER from 6.99% to 4.95%. We also propose a method for using biomarker-based auxiliary supervision and analyze how error behavior is reshaped in severity-dependent ways rather than uniformly improving WER. We open-source all code and models.

Charles L. Wang, Cady Chen, Ziwei Gong, Julia Hirschberg

Executive Summary

This article summarizes a study of automatic speech recognition (ASR) for speakers with Huntington's disease (HD). The study uses a high-fidelity clinical speech corpus that had not previously been used for end-to-end ASR training. The authors compare several ASR model families under a unified evaluation. They report word error rate (WER) together with substitution, deletion, and insertion patterns, and Parakeet-TDT outperforms the encoder-decoder and CTC baselines. HD-specific adaptation lowers WER from 6.99% to 4.95%. The authors also propose biomarker-based auxiliary supervision. Rather than uniformly improving WER, it reshapes error behavior in severity-dependent ways. The work contributes toward more accurate ASR for pathological speech, with possible, still untested, applications in clinical assessment. All code and models are open-sourced.
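
The headline adaptation result can be restated as an absolute and a relative reduction. The sketch below is a minimal illustration; `wer_reduction` is a hypothetical helper and is not part of the authors' released code.

```python
def wer_reduction(baseline: float, adapted: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = baseline - adapted
    relative = 100.0 * absolute / baseline
    return absolute, relative


# WER values (in %) quoted in the abstract: 6.99 before adaptation, 4.95 after
abs_pts, rel_pct = wer_reduction(6.99, 4.95)
print(f"{abs_pts:.2f} points absolute, {rel_pct:.1f}% relative")  # 2.04 points absolute, 29.2% relative
```

So the adaptation removes roughly three in ten of the baseline system's word errors.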

Key Points

  • HD speech poses significant challenges to current ASR models due to irregular timing, unstable phonation, and articulatory distortion
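  • The study compares several ASR families under one evaluation: Parakeet-TDT (a token-and-duration transducer), encoder-decoder models, and CTC baselines. Parakeet-TDT performs best
  • HD-specific adaptation lowers WER from 6.99% to 4.95%, while biomarker-based auxiliary supervision reshapes error behavior in severity-dependent ways rather than uniformly improving WER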

Merits

Comprehensive evaluation

The study compares multiple ASR families under a unified evaluation. It breaks their errors down into substitutions, deletions, and insertions, which shows that HD speech induces architecture-specific error regimes rather than a single failure mode.
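
A substitution/deletion/insertion breakdown normally comes from a minimum-edit-distance alignment between the reference and hypothesis word sequences. The sketch below is a generic textbook Levenshtein implementation. It assumes whitespace tokenization and applies no text normalization. This summary does not describe the paper's actual scoring tool or normalization rules. Tie-breaking during the backtrace can also attribute some errors differently from other scorers. `align_errors` and `ErrorCounts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int
    deletions: int
    insertions: int
    ref_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, with N the number of reference words
        return (self.substitutions + self.deletions + self.insertions) / self.ref_words


def align_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Count substitutions, deletions and insertions via Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(m + 1):
        dp[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(dp[i - 1][j - 1] + mismatch,  # match or substitution
                           dp[i - 1][j] + 1,             # deletion
                           dp[i][j - 1] + 1)             # insertion
    # Backtrace from the bottom-right cell, attributing each edit to S, D or I
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(ref[i - 1] != hyp[j - 1])
            if dp[i][j] == dp[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(subs, dels, ins, n)


counts = align_errors("the cat sat on the mat", "the cat sit on mat")
print(counts)  # ErrorCounts(substitutions=1, deletions=1, insertions=0, ref_words=6)
print(f"WER = {counts.wer:.3f}")  # WER = 0.333
```

In this example, "sat" recognized as "sit" counts as one substitution and the dropped "the" as one deletion. Two systems with the same WER can therefore differ sharply in how their errors are distributed across the three types.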

Novel approach

The proposed biomarker-based auxiliary supervision offers a new way to adapt ASR to HD speech. It could plausibly carry over to other pathological speech recognition tasks.
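
The abstract does not say how the biomarker signal enters training. One common way to implement auxiliary supervision is multi-task learning, which adds a weighted regression loss on a biomarker target to the ASR loss. The sketch below shows only the shape of such an objective, in plain Python with no autograd. The mean pooling, the linear `biomarker_head`, the squared-error term, and `aux_weight` are all assumptions made for illustration, not the authors' design. `asr_loss` stands in for whatever sequence loss the recognizer uses, such as a CTC or transducer loss.

```python
from typing import Sequence

# HYPOTHETICAL sketch: the paper's actual biomarker head, pooling and loss are not specified.


def mean_pool(frames: Sequence[Sequence[float]]) -> list[float]:
    """Average per-frame encoder vectors over time into one utterance vector."""
    dim = len(frames[0])
    return [sum(frame[k] for frame in frames) / len(frames) for k in range(dim)]


def biomarker_head(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear regression head predicting a single scalar biomarker."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


def joint_loss(asr_loss: float, frames: Sequence[Sequence[float]], biomarker_target: float,
               weights: Sequence[float], bias: float, aux_weight: float = 0.1) -> float:
    """Total loss = ASR loss + aux_weight * squared biomarker prediction error."""
    predicted = biomarker_head(mean_pool(frames), weights, bias)
    aux_loss = (predicted - biomarker_target) ** 2
    return asr_loss + aux_weight * aux_loss


# Toy values: pooled vector [2.0, 3.0], prediction 2.5, target 1.5, so aux_loss = 1.0
total = joint_loss(asr_loss=2.0, frames=[[1.0, 2.0], [3.0, 4.0]],
                   biomarker_target=1.5, weights=[0.5, 0.5], bias=0.0)
print(round(total, 6))  # 2.1
```

In a real training setup, the gradient of the auxiliary term would flow back into the shared encoder. That is how a biomarker target could shape the acoustic representation. `aux_weight` would then set the trade-off between the two objectives.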

Demerits

Limited dataset

The study relies on a single clinical speech corpus, which may not be representative of the entire HD patient population, potentially limiting the generalizability of the findings.

Expert Commentary

The biomarker-based auxiliary supervision is a notable contribution to ASR for pathological speech. Its effect varies with disease severity rather than lowering WER uniformly. That variation reveals how recognition errors relate to severity, which a single aggregate WER would hide. Further work is needed to explore the approach fully and to address the study's limitations, notably its reliance on a single corpus. If the findings hold on broader data, they could inform more accurate ASR systems for HD patients and possibly for people with other neurological disorders.
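
Checking whether a change helps uniformly or in a severity-dependent way amounts to scoring errors separately for each severity group. The minimal sketch below assumes that each utterance already has three things: an error count (for instance, from an edit-distance alignment), a reference word count, and a severity label. The labels and toy numbers are invented for illustration and are not results from the paper. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def wer_by_severity(results: Iterable[Tuple[str, int, int]]) -> dict[str, float]:
    """Pooled WER per severity group: total errors / total reference words.

    Each item is (severity_label, num_errors, num_reference_words).
    """
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for severity, n_errors, n_ref_words in results:
        errors[severity] += n_errors
        words[severity] += n_ref_words
    return {sev: errors[sev] / words[sev] for sev in words}


def severity_deltas(baseline: dict[str, float], adapted: dict[str, float]) -> dict[str, float]:
    """Per-group WER change between two systems (negative = improvement)."""
    return {sev: adapted[sev] - baseline[sev] for sev in baseline}


# Toy numbers for illustration only (not from the paper)
base = wer_by_severity([("mild", 2, 40), ("mild", 1, 20), ("severe", 6, 30)])
aux = wer_by_severity([("mild", 1, 40), ("mild", 1, 20), ("severe", 7, 30)])
print(base)
print(severity_deltas(base, aux))
```

Pooling errors and words within a group, rather than averaging per-utterance WER, keeps short utterances from dominating the group score. In the toy data, the mild group improves while the severe group gets worse. A single corpus-level WER would average that pattern away.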

Recommendations

  • Future studies should investigate the use of biomarker-based auxiliary supervision in other pathological speech recognition tasks
  • The development of more diverse and representative datasets is crucial for improving the generalizability and accuracy of ASR systems for HD patients
