RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks
arXiv:2603.02368v1
Abstract: We introduce RO-N3WS, a benchmark Romanian speech dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. RO-N3WS comprises over 126 hours of transcribed audio collected from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcast speech. This diversity enables robust training and fine-tuning across stylistically distinct domains. We evaluate several state-of-the-art ASR systems (Whisper, Wav2Vec 2.0) in both zero-shot and fine-tuned settings, and conduct controlled comparisons using synthetic data generated with expressive TTS models. Our results show that even limited fine-tuning on real speech from RO-N3WS yields substantial WER improvements over zero-shot baselines. We will release all models, scripts, and data splits to support reproducible research in multilingual ASR, domain adaptation, and lightweight deployment.
Executive Summary
The article introduces RO-N3WS, a Romanian speech benchmark dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. The dataset comprises over 126 hours of transcribed audio drawn from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcasts, which supports training and fine-tuning across stylistically distinct domains. The authors evaluate Whisper and Wav2Vec 2.0 in zero-shot and fine-tuned settings and report that even limited fine-tuning on real RO-N3WS speech yields substantial word error rate (WER) improvements over zero-shot baselines.
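Since the headline result is stated in terms of WER, a minimal sketch of how that metric is conventionally computed may help: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names and the toy Romanian sentences below are illustrative assumptions, not taken from the paper, and the authors' actual evaluation scripts may normalize text differently (casing, punctuation, diacritics).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline WER removed by the improved system."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy, made-up transcripts (not from RO-N3WS).
    reference = "el a citit stirile de seara"
    zero_shot = "el a citi stirile seara"    # hypothetical zero-shot output
    fine_tuned = "el a citit stirile seara"  # hypothetical fine-tuned output
    base = word_error_rate(reference, zero_shot)
    tuned = word_error_rate(reference, fine_tuned)
    print(f"zero-shot WER:  {base:.3f}")
    print(f"fine-tuned WER: {tuned:.3f}")
    print(f"relative reduction: {relative_wer_reduction(base, tuned):.1%}")
```

Reporting the relative reduction alongside absolute WER makes improvements easier to compare across domains whose zero-shot baselines differ widely, which is likely to matter for a multi-domain benchmark such as this one.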
Key Points
- ▸ Introduction of RO-N3WS, a diverse Romanian speech dataset
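- ▸ Evaluation of state-of-the-art ASR systems (Whisper, Wav2Vec 2.0) in zero-shot and fine-tuned settings, with controlled comparisons against synthetic data from expressive TTS models
- ▸ Substantial WER improvements over zero-shot baselines with even limited fine-tuning on real RO-N3WS speech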
Merits
Diverse Dataset
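The dataset spans five stylistically distinct domains (broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcasts), which supports robust training and fine-tuning as well as evaluation under domain shift.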
Reproducibility
The authors state that they will release all models, scripts, and data splits, which would support reproducible research in multilingual ASR, domain adaptation, and lightweight deployment.
Demerits
Limited Scope
The dataset covers only Romanian, so its direct applicability to other languages is limited.
Dependence on Fine-Tuning
The reported WER improvements depend on fine-tuning on RO-N3WS, which may not be feasible in every deployment scenario.
Expert Commentary
The introduction of RO-N3WS is a significant contribution to ASR research, particularly for low-resource and out-of-distribution conditions. The dataset's domain diversity and the substantial WER improvements reported in the article underline the value of fine-tuning on real, stylistically varied speech. However, the restriction to Romanian and the dependence on fine-tuning are notable limitations. Further research is needed to explore the applicability of RO-N3WS to other languages and to develop more efficient fine-tuning methods.
Recommendations
- ✓ Explore the applicability of RO-N3WS to other languages
- ✓ Develop more efficient fine-tuning methods to reduce the dependence on extensive fine-tuning