Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language
arXiv:2602.15378v1

Abstract: Can large language models converse in languages virtually absent from their training data? We investigate this question through a case study on Tulu, a Dravidian language with over 2 million speakers but minimal digital presence. Rather than fine-tuning an LLM, we examine whether structured prompts alone can elicit basic conversational ability. We systematically tackle the challenges posed by the absence of training data for Tulu by combining explicit grammar documentation, negative constraints to suppress high-probability tokens from related languages, romanization standardization, and quality-controlled synthetic data generation via self-play. Evaluated on a manually curated held-out set across three LLMs (Gemini 2.0 Flash, GPT-4o, Llama 3.1 70B) and validated by native speakers, our approach reduces vocabulary contamination from 80% to 5% while achieving 85% grammatical accuracy. Cross-model analysis reveals that negative constraints provide consistent improvements (12--18 percentage points), while the effect of grammar documentation varies by model architecture (8--22 points).
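The abstract names the prompt components the authors combine: explicit grammar documentation, negative constraints against related-language vocabulary, and romanization standardization. A minimal sketch of how such components might be assembled into a single system prompt; the function name, rule strings, and placeholder banned words below are illustrative assumptions, not taken from the paper:

```python
def build_structured_prompt(grammar_notes, banned_words, romanization_map):
    """Assemble a structured system prompt from three components:
    grammar documentation, negative lexical constraints, and
    romanization conventions. All inputs are plain strings."""
    lines = ["You are a conversational assistant. Reply only in romanized Tulu."]
    lines.append("Grammar rules:")
    lines += [f"- {rule}" for rule in grammar_notes]
    lines.append("Never use these words; they belong to related languages, not Tulu:")
    lines += [f"- {word}" for word in banned_words]
    lines.append("Romanization conventions:")
    lines += [f"- write {sound} as '{symbol}'" for sound, symbol in romanization_map.items()]
    return "\n".join(lines)

# Illustrative usage with placeholder rules (not real Tulu linguistic data):
prompt = build_structured_prompt(
    grammar_notes=["subject-object-verb word order"],
    banned_words=["<related-language-loanword>"],
    romanization_map={"the retroflex lateral": "L"},
)
```

The resulting string would be passed as the system message to any of the three evaluated models; the paper does not publish its exact prompt text, so this only shows the general shape.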
Executive Summary
This article presents a novel approach to enabling large language models (LLMs) to converse in low-resource languages such as Tulu. Using structured prompts alone, the authors reduce vocabulary contamination from 80% to 5% and achieve 85% grammatical accuracy across three different LLMs. The study highlights the effectiveness of negative constraints and grammar documentation in compensating for the absence of training data. The findings suggest that structured prompting can be a viable alternative to fine-tuning, opening new avenues for language technologies in resource-constrained environments and for language preservation. Validation by native speakers provides a robust assessment of the approach's efficacy.
Key Points
- ▸ Structured prompting can enable LLMs to converse in low-resource languages like Tulu.
- ▸ Negative constraints and grammar documentation are effective in reducing vocabulary contamination and improving grammatical accuracy.
- ▸ The approach achieves significant results across three different LLMs, with 85% grammatical accuracy and reduced vocabulary contamination from 80% to 5%.
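The contamination figures above (80% reduced to 5%) imply a simple token-level measure. A plausible formulation, assuming contamination is computed as the fraction of response tokens found in a lexicon of related-language words; the paper does not specify its exact metric, so this is a sketch:

```python
def vocabulary_contamination(response_tokens, banned_lexicon):
    """Fraction of response tokens that appear in a lexicon of
    related-language (non-Tulu) words. Returns a value in [0, 1].
    Comparison is case-insensitive; the lexicon is assumed lowercase."""
    if not response_tokens:
        return 0.0
    hits = sum(1 for token in response_tokens if token.lower() in banned_lexicon)
    return hits / len(response_tokens)
```

Under this definition, a score of 0.05 would match the paper's reported 5% contamination; native-speaker review would still be needed to catch loanwords missing from the lexicon.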
Merits
Strength in Addressing Low-Resource Languages
The study's innovative approach addresses a significant challenge in natural language processing, enabling LLMs to converse in languages with minimal digital presence.
Robust Validation by Native Speakers
The validation of the approach by native speakers provides a robust assessment of its efficacy, ensuring that the results are meaningful and applicable in real-world scenarios.
Demerits
Limited Generalizability to Other Languages
The study's findings may not be directly generalizable to other languages, as the specific challenges and characteristics of Tulu may not be representative of other low-resource languages.
Dependence on High-Quality Linguistic Resources
The approach relies on explicit grammar documentation and quality-controlled synthetic data generation, which may not be readily available for all languages, limiting its scalability.
Expert Commentary
The study's structured-prompting approach to conversing in low-resource languages like Tulu is a significant contribution to natural language processing. By combining explicit grammar documentation, negative constraints, and romanization standardization, the authors demonstrate a compelling alternative to fine-tuning. The findings have far-reaching implications for language preservation and development, particularly in resource-constrained environments. However, the approach's limitations, such as its dependence on high-quality linguistic resources and uncertain generalizability to other languages, must be weighed carefully. Future work should address these challenges and explore the approach's scalability across languages.
Recommendations
- ✓ Investigate the applicability of the structured prompting approach to other low-resource languages and evaluate its effectiveness in real-world scenarios.
- ✓ Develop and refine the grammar documentation and synthetic data generation processes to improve the quality and availability of training data for low-resource languages.