Left Behind: Cross-Lingual Transfer as a Bridge for Low-Resource Languages in Large Language Models
arXiv:2603.21036v1 Announce Type: new. Abstract: We investigate how large language models perform on low-resource languages by benchmarking eight LLMs across five experimental conditions in English, Kazakh, and Mongolian. Using 50 hand-crafted questions spanning factual, reasoning, technical, and culturally grounded categories, we evaluate 2,000 responses on accuracy, fluency, and completeness. We find a consistent performance gap of 13.8-16.7 percentage points between English and low-resource language conditions, with models maintaining surface-level fluency while producing significantly less accurate content. Cross-lingual transfer (prompting models to reason in English before translating back) yields selective gains for bilingual architectures (+2.2pp to +4.3pp) but provides no benefit to English-dominant models. Our results demonstrate that current LLMs systematically underserve low-resource language communities, and that effective mitigation strategies are architecture-dependent rather than universal.
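The 2,000-response figure follows directly from the experimental design described in the abstract (8 models, 5 conditions, 50 questions); a quick sanity check:

```python
# Response count implied by the experimental design in the abstract:
# 8 LLMs evaluated across 5 conditions on 50 hand-crafted questions.
models = 8
conditions = 5
questions = 50
total_responses = models * conditions * questions
print(total_responses)  # 2000
```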
Executive Summary
This article examines the performance of large language models (LLMs) on two low-resource languages, Kazakh and Mongolian, and quantifies the gap between them and English. The researchers benchmark eight LLMs across five experimental conditions, using 50 hand-crafted questions spanning factual, reasoning, technical, and culturally grounded categories, and score 2,000 responses on accuracy, fluency, and completeness. The study finds that cross-lingual transfer, which prompts models to reason in English before translating back, yields selective gains for bilingual architectures but fails to benefit English-dominant models. The findings underscore the need for mitigation strategies that account for model architecture, since current LLMs systematically underserve low-resource language communities.
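The three scoring dimensions named above can be aggregated per language condition. A minimal sketch, assuming a 0-1 score per dimension and equal treatment of the dimensions (the paper's actual rubric scale and weighting are not reproduced here):

```python
from statistics import mean

# Hypothetical per-response rubric scores on the three dimensions the
# study uses (accuracy, fluency, completeness). The 0-1 scale is an
# assumption for illustration, not the paper's actual scoring scheme.
responses = [
    {"accuracy": 1.0, "fluency": 0.9, "completeness": 0.8},
    {"accuracy": 0.0, "fluency": 0.9, "completeness": 0.5},
]

def dimension_means(scored):
    # Average each dimension independently across all scored responses,
    # so fluency can stay high even while accuracy collapses -- the
    # dissociation the paper reports for low-resource languages.
    dims = ("accuracy", "fluency", "completeness")
    return {d: mean(r[d] for r in scored) for d in dims}

print(dimension_means(responses))
```

Keeping the dimensions separate, rather than collapsing them into one score, is what makes the paper's key observation visible: surface fluency can mask a large accuracy deficit.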
Key Points
- ▸ LLMs exhibit a consistent performance gap of 13.8-16.7 percentage points between English and low-resource language conditions.
- ▸ Cross-lingual transfer strategies provide selective gains for bilingual architectures but offer no benefit to English-dominant models.
- ▸ Current LLMs systematically underserve low-resource language communities, highlighting the need for more effective mitigation strategies.
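The cross-lingual transfer strategy in the second point can be sketched as a two-step prompting pipeline. This is an illustration of the general technique, not the paper's released code; `llm` stands in for any text-in/text-out model call, and the prompt wording is an assumption:

```python
def cross_lingual_answer(llm, question: str, target_lang: str) -> str:
    """Answer a target-language question via English reasoning.

    `llm` is any callable mapping a prompt string to a response string
    (e.g. a wrapper around a chat-completion API). The prompts below are
    illustrative, not the paper's exact prompts.
    """
    # Step 1: elicit step-by-step reasoning and an answer in English.
    english_answer = llm(
        f"The following question is in {target_lang}. "
        f"Reason step by step and answer in English:\n{question}"
    )
    # Step 2: translate the English answer back into the target language.
    return llm(
        f"Translate this answer into {target_lang}, "
        f"preserving its meaning:\n{english_answer}"
    )
```

Per the paper's findings, a pipeline like this helps only bilingual architectures; for English-dominant models the detour through English adds no measurable accuracy.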
Merits
Comprehensive evaluation framework
The 50 hand-crafted questions span factual, reasoning, technical, and culturally grounded categories, and each of the 2,000 responses is scored on three dimensions: accuracy, fluency, and completeness.
In-depth analysis of cross-lingual transfer strategies
The researchers conduct a detailed examination of cross-lingual transfer strategies, including their selective gains for bilingual architectures and limitations for English-dominant models.
Demerits
Limited generalizability
The findings may not generalize beyond Kazakh, Mongolian, and the eight models tested, limiting the broader applicability of the research.
Methodological reliance on manual question crafting
The study's reliance on manually crafted questions may introduce bias and limit the scalability of the evaluation framework.
Expert Commentary
This study offers a timely, thought-provoking examination of LLM performance on low-resource languages. Its evaluation framework and analysis of cross-lingual transfer yield valuable insights into the limitations of current models, in particular the finding that mitigation is architecture-dependent rather than universal. That said, the reliance on manually crafted questions and the focus on only two languages constrain how far the conclusions extend. Even so, the results make a pressing case for mitigation strategies that address how current LLMs systematically underserve low-resource language communities, a central concern for language-model bias and fairness.
Recommendations
- ✓ Develop more effective language models that can serve low-resource language communities by incorporating diverse training data and evaluation frameworks.
- ✓ Prioritize the development of mitigation strategies that address the systematic underservice of low-resource language communities, such as bilingual architectures and culturally sensitive training data.
Sources
Original: arXiv - cs.CL