When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms

arXiv:2602.12921v1. Abstract: Figurative language understanding remains a significant challenge for Large Language Models (LLMs), especially for low-resource languages. To address this, we introduce a new idiom dataset, a large-scale, culturally-grounded corpus of 10,361 Bengali idioms. Each idiom is annotated under a comprehensive 19-field schema, established and refined through a deliberative expert consensus process, that captures its semantic, syntactic, cultural, and religious dimensions, providing a rich, structured resource for computational linguistics. To establish a robust benchmark for Bangla figurative language understanding, we evaluate 30 state-of-the-art multilingual and instruction-tuned LLMs on the task of inferring figurative meaning. Our results reveal a critical performance gap, with no model surpassing 50% accuracy, a stark contrast to significantly higher human performance (83.4%). This underscores the limitations of existing models in cross-lin

Adib Sakhawat, Shamim Ara Parveen, Md Ruhul Amin, Shamim Al Mahmud, Md Saiful Islam, Tahera Khatun · March 7, 2026

#cs.CL

Executive Summary

The article 'When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms' addresses the challenge of figurative language understanding in low-resource languages, particularly Bengali. The authors introduce a comprehensive dataset of 10,361 Bengali idioms, annotated under a 19-field schema that captures semantic, syntactic, cultural, and religious dimensions. Evaluating 30 state-of-the-art multilingual and instruction-tuned LLMs, the study reveals a significant performance gap, with no model exceeding 50% accuracy in inferring figurative meaning, compared to human performance of 83.4%. This highlights the limitations of current models in cross-linguistic and cultural reasoning. The dataset and benchmark provide a foundational resource for advancing figurative language understanding in low-resource languages.

Key Points

▸ Introduction of a large-scale, culturally-grounded corpus of 10,361 Bengali idioms.
▸ Comprehensive 19-field annotation schema capturing semantic, syntactic, cultural, and religious dimensions.
▸ Evaluation of 30 state-of-the-art multilingual and instruction-tuned LLMs on figurative language understanding.
▸ Significant performance gap identified, with no model surpassing 50% accuracy.
▸ Human performance benchmark set at 83.4%, underscoring the limitations of current models.
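
The accuracy figures in the points above can be made concrete with a minimal sketch of how such a gap might be scored. This is not the authors' evaluation code: the summary does not say whether the task is multiple-choice or free-form, so the exact-match scoring, the function names `accuracy` and `gap_to_human`, and the toy labels are all assumptions for illustration. Only the 83.4% human figure comes from the abstract.

```python
from typing import Sequence

# Human accuracy reported in the abstract (83.4%).
HUMAN_ACCURACY = 0.834


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold label.

    Exact-match scoring is an assumption; the paper may score differently.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def gap_to_human(model_accuracy: float,
                 human_accuracy: float = HUMAN_ACCURACY) -> float:
    """Gap between human and model accuracy, in percentage points."""
    return (human_accuracy - model_accuracy) * 100


# Hypothetical toy labels, not data from the paper.
gold = ["B", "A", "D", "C"]
predictions = ["B", "C", "D", "A"]

model_acc = accuracy(predictions, gold)
gap = gap_to_human(model_acc)
print(f"accuracy={model_acc:.2f}, gap={gap:.1f} points")
```

Reporting the gap in percentage points keeps model scores directly comparable with the single human figure the abstract provides.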

Merits

Comprehensive Dataset

The introduction of a large-scale, annotated dataset of Bengali idioms is a significant contribution to the field of computational linguistics, particularly for low-resource languages.

Robust Benchmark

The establishment of a robust benchmark for evaluating figurative language understanding provides a valuable resource for future research and development in multilingual LLMs.

Cultural Grounding

The inclusion of cultural and religious dimensions in the annotation schema enhances the cultural grounding of the dataset, making it more representative and useful for real-world applications.

Demerits

Limited Model Performance

The study reveals a significant performance gap in current LLMs: no model surpasses 50% accuracy, against 83.4% for humans. This points to a need for further advances in cross-linguistic and cultural reasoning.

Human Performance Benchmark

The human performance benchmark (83.4%) is valuable, but a single accuracy figure may not fully capture how much human interpretations of an idiom vary, which could limit how directly it can be compared with model scores.

Dataset Scope

The dataset is focused solely on Bengali idioms, which may limit its applicability to other low-resource languages and dialects.

Expert Commentary

The article presents a rigorous and well-structured analysis of the challenges in figurative language understanding for low-resource languages. Its main contribution is a corpus of 10,361 Bengali idioms annotated under a detailed 19-field schema. The evaluation of 30 LLMs reveals a critical performance gap, with no model surpassing 50% accuracy against 83.4% for humans, which underscores the limitations of current models in cross-linguistic and cultural reasoning. The cultural grounding of the dataset is particularly noteworthy, as it makes the resource more relevant to real-world use. However, the focus on a single language may limit broader applicability, and the study documents the performance gap without closing it. Overall, the work offers a foundational resource for advancing figurative language understanding in low-resource languages and sets a benchmark for future research in this area.

Recommendations

✓ Expand the dataset to include idioms from other low-resource languages to enhance its applicability and broaden the scope of research.
✓ Develop advanced techniques for cross-linguistic and cultural reasoning to improve the performance of LLMs in understanding figurative language.

Sources

arXiv - cs.CL

When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincaré discs
