Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR
arXiv:2603.16184v1. Abstract: We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B exclusively on publicly available speech corpora, using a balanced sampling strategy that equalizes the number of training utterances per language and deliberately omits language-tag conditioning so that the model learns to identify languages implicitly from audio. On 12 benchmarks spanning the four target languages, Polyglot-Lion-1.7B achieves an average error rate of 14.85, competitive with MERaLiON-2-10B-ASR (14.32), a model 6x larger, while incurring a training cost of $81 on a single RTX PRO 6000 GPU compared to $18,862 for the 128-GPU baseline. Inference throughput is approximately 20x faster than MERaLiON, at 0.10 s/sample versus 2.02 s/sample. These results demonstrate that linguistically balanced fine-tuning of moderate-scale pretrained models can yield deployment-ready multilingual ASR at a fraction of the cost of larger specialist systems.
Executive Summary
This article presents Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. By fine-tuning Qwen3-ASR models exclusively on publicly available speech corpora with a balanced sampling strategy, the researchers achieve competitive accuracy at a small fraction of the training cost and inference time of larger systems. The average error rate of 14.85 is close to that of the much larger MERaLiON-2-10B-ASR (14.32), while the training cost of $81 is roughly 1/233 of the baseline's $18,862. These results are a meaningful step toward deployment-ready multilingual ASR built from moderate-scale pretrained models.
Key Points
- ▸ Polyglot-Lion is a family of compact multilingual ASR models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay.
- ▸ The models achieve competitive results by fine-tuning Qwen3-ASR with a balanced sampling strategy that equalizes the number of training utterances per language (see the sketch after this list).
- ▸ Training cost ($81 vs. $18,862) and per-sample inference time (0.10 s vs. 2.02 s) are far lower than for the larger MERaLiON-2-10B-ASR baseline.
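The abstract describes the sampling strategy only at a high level. Below is a minimal sketch of one way to implement per-language balancing; the record layout and the downsample-to-the-minimum choice are assumptions, since the paper may equalize counts differently (e.g., by oversampling smaller languages):

```python
import random
from collections import defaultdict

# Minimal sketch of the balanced sampling idea from the abstract: equalize
# the number of training utterances per language. The record layout and
# field names here are assumptions, not the paper's actual pipeline.

def balance_by_language(utterances, seed=0):
    """Downsample every language's pool to the size of the smallest pool."""
    pools = defaultdict(list)
    for utt in utterances:
        pools[utt["lang"]].append(utt)
    n = min(len(pool) for pool in pools.values())
    rng = random.Random(seed)
    balanced = [u for pool in pools.values() for u in rng.sample(pool, n)]
    rng.shuffle(balanced)
    return balanced

# Toy corpus with the paper's four target languages, deliberately skewed.
corpus = (
    [{"lang": "en", "audio": f"en_{i}.wav"} for i in range(1000)]
    + [{"lang": "zh", "audio": f"zh_{i}.wav"} for i in range(800)]
    + [{"lang": "ms", "audio": f"ms_{i}.wav"} for i in range(500)]
    + [{"lang": "ta", "audio": f"ta_{i}.wav"} for i in range(300)]
)
train = balance_by_language(corpus)
assert len(train) == 4 * 300  # 300 utterances per language
```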
Merits
Significant Reduction in Training Cost
The authors report a training cost of $81 on a single RTX PRO 6000 GPU, roughly 1/233 of the $18,862 quoted for the 128-GPU baseline, making the approach highly cost-effective for building deployment-ready multilingual ASR systems.
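The ratio follows directly from the two dollar figures in the abstract; a one-line check:

```python
# Cost figures quoted in the abstract (USD).
polyglot_cost, meralion_cost = 81, 18_862
print(f"{meralion_cost / polyglot_cost:.0f}x cheaper")  # -> 233x cheaper
```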
Improved Inference Throughput
At 0.10 s/sample versus 2.02 s/sample, Polyglot-Lion's inference is approximately 20x faster than MERaLiON-2-10B-ASR's, making it suitable for real-time applications.
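Likewise, the speedup is implied directly by the two per-sample latencies:

```python
# Per-sample latencies quoted in the abstract (seconds).
print(f"{2.02 / 0.10:.1f}x faster")  # -> 20.2x faster, matching the ~20x claim
```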
Competitive Results
The average error rate of 14.85 across the 12 benchmarks is within 0.53 points of MERaLiON-2-10B-ASR's 14.32, even though the baseline is roughly 6x larger, demonstrating the effectiveness of Polyglot-Lion in multilingual ASR tasks.
Demerits
Limited Dataset Diversity
The models are fine-tuned only on publicly available speech corpora, which may limit the diversity of the training data and the generalizability of the results to unseen domains and speaker populations.
Language Tag Conditioning Omission
The authors deliberately omit language-tag conditioning, so the model must infer the language implicitly from audio; this could degrade performance on acoustically ambiguous or code-switched utterances where an explicit tag would help (see the sketch below).
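To make the design choice concrete, the sketch below contrasts tag-conditioned and tag-free decoder targets; the <|lang|> token format is hypothetical and not Qwen3-ASR's actual scheme:

```python
# Illustrative only: how a training target might look with and without
# language-tag conditioning. The token format is hypothetical.

def make_target(transcript: str, lang: str | None = None) -> str:
    """Build the decoder target string for one utterance."""
    if lang is not None:
        # Tag-conditioned: the model is told the language up front.
        return f"<|{lang}|> {transcript}"
    # Tag-free (Polyglot-Lion's choice): the model must identify the
    # language implicitly from the audio alone.
    return transcript

print(make_target("selamat pagi", lang="ms"))  # <|ms|> selamat pagi
print(make_target("selamat pagi"))             # selamat pagi
```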
Expert Commentary
The authors' recipe of fine-tuning Qwen3-ASR models with a linguistically balanced sampling strategy is a notable result for multilingual ASR: matching a 10B-parameter specialist to within about half a point of average error rate, at a tiny fraction of the training cost and inference time, makes deployment-ready multilingual ASR far more accessible. The deliberate omission of language-tag conditioning is an interesting design choice, testing whether language identification can be learned implicitly from audio rather than supplied as an input. The limited diversity of the public training corpora is a real concern, but the recipe is cheap enough that it can be re-run as more varied data becomes available.
Recommendations
- ✓ Future research should investigate the use of Polyglot-Lion in more diverse linguistic landscapes and evaluate its performance in real-world applications.
- ✓ The authors should explore alternative approaches to language identification, such as language-tag conditioning or multimodal inputs, to improve the model's robustness and generalizability.