When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms

arXiv:2602.12921v1. Abstract: Figurative language understanding remains a significant challenge for Large Language Models (LLMs), especially for low-resource languages. To address this, we introduce a new idiom dataset, a large-scale, culturally-grounded corpus of 10,361 Bengali idioms. Each idiom is annotated under a comprehensive 19-field schema, established and refined through a deliberative expert consensus process, that captures its semantic, syntactic, cultural, and religious dimensions, providing a rich, structured resource for computational linguistics. To establish a robust benchmark for Bangla figurative language understanding, we evaluate 30 state-of-the-art multilingual and instruction-tuned LLMs on the task of inferring figurative meaning. Our results reveal a critical performance gap, with no model surpassing 50% accuracy, a stark contrast to significantly higher human performance (83.4%). This underscores the limitations of existing models in cross-linguistic and cultural reasoning. By releasing the new idiom dataset and benchmark, we provide foundational infrastructure for advancing figurative language understanding and cultural grounding in LLMs for Bengali and other low-resource languages.

Executive Summary

The article 'When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms' addresses the challenge of figurative language understanding in low-resource languages, particularly Bengali. The authors introduce a comprehensive dataset of 10,361 Bengali idioms, annotated under a 19-field schema that captures semantic, syntactic, cultural, and religious dimensions. Evaluating 30 state-of-the-art multilingual and instruction-tuned LLMs, the study reveals a significant performance gap, with no model exceeding 50% accuracy in inferring figurative meaning, compared to human performance of 83.4%. This highlights the limitations of current models in cross-linguistic and cultural reasoning. The dataset and benchmark provide a foundational resource for advancing figurative language understanding in low-resource languages.

Key Points

  • Introduction of a large-scale, culturally grounded corpus of 10,361 Bengali idioms.
  • Comprehensive 19-field annotation schema capturing semantic, syntactic, cultural, and religious dimensions.
  • Evaluation of 30 state-of-the-art multilingual and instruction-tuned LLMs on figurative language understanding.
  • Significant performance gap identified, with no model surpassing 50% accuracy.
  • Human performance benchmark set at 83.4%, underscoring the limitations of current models.

Merits

Comprehensive Dataset

The introduction of a large-scale, annotated dataset of Bengali idioms is a significant contribution to the field of computational linguistics, particularly for low-resource languages.

Robust Benchmark

The establishment of a robust benchmark for evaluating figurative language understanding provides a valuable resource for future research and development in multilingual LLMs.

Cultural Grounding

The inclusion of cultural and religious dimensions in the annotation schema enhances the cultural grounding of the dataset, making it more representative and useful for real-world applications.
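
As a rough illustration of what a structured, multi-dimensional annotation record could look like, here is a minimal Python sketch. It is not the paper's schema: the source says only that there are 19 fields covering semantic, syntactic, cultural, and religious dimensions and does not list them, so every field name below, the `IdiomRecord` class, and the `empty_required_fields` helper are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class IdiomRecord:
    """Hypothetical record; field names are illustrative, not the paper's schema."""
    idiom: str                                  # surface form of the idiom
    figurative_meaning: str                     # semantic dimension
    syntactic_category: str                     # syntactic dimension
    cultural_context: str                       # cultural dimension
    religious_reference: Optional[str] = None   # religious dimension, None if not applicable


def empty_required_fields(record: IdiomRecord) -> List[str]:
    """Return the names of string fields that were left blank by the annotator."""
    blank = []
    for f in fields(record):
        value = getattr(record, f.name)
        # None marks "not applicable"; only blank strings count as missing.
        if isinstance(value, str) and not value.strip():
            blank.append(f.name)
    return blank


# Placeholder values only; no real dataset entries are reproduced here.
record = IdiomRecord(
    idiom="<idiom text>",
    figurative_meaning="<figurative gloss>",
    syntactic_category="<category>",
    cultural_context="",
)
print(empty_required_fields(record))
```

Keeping each dimension as its own field, rather than as free text, is what lets a check like this flag incomplete annotations automatically.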

Demerits

Limited Model Performance

The study reveals a significant performance gap in current LLMs, with no model surpassing 50% accuracy, indicating a need for further advancements in cross-linguistic and cultural reasoning.
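
To make the size of the gap concrete, the following Python sketch scores a set of model answers and compares the result with the reported human figure. The scoring rule is an assumption: the source does not say how correctness is judged, so normalized exact match is used here purely for illustration, and the function names and placeholder labels are hypothetical. Only the 83.4% human score and the 50% ceiling come from the source.

```python
HUMAN_ACCURACY = 0.834  # human score reported in the abstract


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not counted as errors."""
    return " ".join(text.lower().split())


def accuracy(predictions, gold):
    """Fraction of predictions matching the gold answer (assumed exact-match scoring)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return correct / len(gold)


def gap_to_human(model_accuracy: float) -> float:
    """Gap between the human score and a model score, in percentage points."""
    return (HUMAN_ACCURACY - model_accuracy) * 100


# Placeholder labels stand in for figurative meanings; they are not dataset entries.
gold = ["meaning a", "meaning b", "meaning c", "meaning d"]
predictions = ["Meaning A", "meaning  b", "meaning x", "meaning y"]

model_accuracy = accuracy(predictions, gold)
print(model_accuracy, round(gap_to_human(model_accuracy), 1))
```

Because no evaluated model surpasses 50% accuracy, the gap to the human score is at least 83.4 − 50 = 33.4 percentage points for every model in the benchmark.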

Human Performance Benchmark

The human benchmark is valuable, but the abstract reports it as a single figure (83.4%) and does not describe how it was obtained. A single number may not capture the variability of human interpretation of idioms, which limits how directly it can be compared with model scores.

Dataset Scope

The dataset is focused solely on Bengali idioms, which may limit its applicability to other low-resource languages and dialects.

Expert Commentary

The article 'When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms' presents a rigorous, well-structured analysis of figurative language understanding in a low-resource language. Its main contribution is the dataset itself: 10,361 idioms annotated under a 19-field schema whose cultural and religious fields make the resource more relevant to real-world use. The evaluation of 30 LLMs, none of which exceeds 50% accuracy against a human score of 83.4%, shows how far current multilingual models remain from reliable cross-linguistic and cultural reasoning. The focus on a single language may limit broader applicability, but the study sets a benchmark for future research and offers a foundation for advancing figurative language understanding in other low-resource languages.

Recommendations

  • Expand the dataset to include idioms from other low-resource languages to enhance its applicability and broaden the scope of research.
  • Develop advanced techniques for cross-linguistic and cultural reasoning to improve the performance of LLMs in understanding figurative language.
