Academic

Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language

arXiv:2602.23940v1 Abstract: Transformer-based models such as BERT have significantly advanced Natural Language Processing (NLP) across many languages. However, Nepali, a low-resource language written in Devanagari script, remains relatively underexplored. This study benchmarks multilingual, Indic, Hindi, and Nepali BERT variants to evaluate their effectiveness in Nepali topic classification. Ten pre-trained models, including mBERT, XLM-R, MuRIL, DevBERT, HindiBERT, IndicBERT, and NepBERTa, were fine-tuned and tested on a balanced Nepali dataset containing 25,006 sentences across five conceptual domains, and performance was evaluated using accuracy, weighted precision, recall, F1-score, and AUROC. The results reveal that Indic models, particularly MuRIL-large, achieved the highest F1-score of 90.60%, outperforming multilingual and monolingual models. NepBERTa also performed competitively with an F1-score of 88.26%. Overall, these findings establish a robust baseline for future document-level classification and broader Nepali NLP applications.

Executive Summary

This study benchmarks BERT-based models for sentence-level topic classification in the Nepali language, a low-resource language with limited NLP research. The authors fine-tune and test ten pre-trained models, including multilingual and monolingual variants, on a balanced Nepali dataset containing 25,006 sentences across five conceptual domains. The results show that Indic models, particularly MuRIL-large, achieve the highest F1-score of 90.60%, outperforming other models. The study establishes a robust baseline for future document-level classification and broader Nepali NLP applications. The findings have significant implications for language development and access to information in Nepal, where many people rely on the Nepali language for communication and education. Overall, the study contributes to the advancement of Nepali NLP and highlights the potential of Indic models for low-resource languages.
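
The benchmarking procedure summarized above follows the standard recipe for fine-tuning pre-trained encoders on a labeled classification set. Below is a minimal sketch of that recipe, assuming the Hugging Face Transformers, Datasets, and scikit-learn libraries; the checkpoint name (google/muril-base-cased), CSV file names, column names, and hyperparameters are illustrative assumptions, not the authors' exact configuration. The compute_metrics function mirrors the reported evaluation set: accuracy, weighted precision, recall, F1, and one-vs-rest AUROC.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_auc_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "google/muril-base-cased"  # one of the benchmarked families; swap in other checkpoints to compare
NUM_LABELS = 5                          # five conceptual domains

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Hypothetical CSV layout: a "text" column of Nepali sentences and an integer "label" column (0-4).
dataset = load_dataset("csv", data_files={"train": "nepali_topics_train.csv",
                                          "test": "nepali_topics_test.csv"})

def tokenize(batch):
    # Sentence-level inputs, so a short max_length keeps fine-tuning cheap.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Mirrors the reported metric set: accuracy, weighted precision/recall/F1, and AUROC.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {
        "accuracy": accuracy_score(labels, preds),
        "weighted_precision": precision,
        "weighted_recall": recall,
        "weighted_f1": f1,
        "auroc": roc_auc_score(labels, probs, multi_class="ovr", average="weighted"),
    }

args = TrainingArguments(
    output_dir="muril-nepali-topics",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

Repeating this loop over the other benchmarked checkpoints (mBERT, XLM-R, NepBERTa, and so on) and comparing the weighted F1-scores is, in essence, the comparison the paper reports.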

Key Points

  • The study benchmarks BERT-based models for sentence-level topic classification in Nepali
  • Indic models, particularly MuRIL-large, achieve the highest F1-score of 90.60%
  • The study establishes a robust baseline for future document-level classification and broader Nepali NLP applications

Merits

Strength in Indic models

The study highlights the effectiveness of Indic models: MuRIL-large achieves the highest F1-score of 90.60%, demonstrating the potential of Indic pre-training for low-resource languages.

Contribution to Nepali NLP

The study contributes to the advancement of Nepali NLP, establishing a robust baseline for future document-level classification and broader Nepali NLP applications.

Demerits

Limited scope

The study is limited to sentence-level topic classification and may not generalize to other NLP tasks or applications.

Dependence on pre-trained models

The study relies on pre-trained models, which may not be suitable for all Nepali NLP applications or may require significant adaptation and fine-tuning.

Expert Commentary

This study provides a valuable contribution to the field of Nepali NLP, highlighting the potential of Indic models for low-resource languages. However, its limitations, such as the reliance on pre-trained models and the restriction to sentence-level classification, should not be overlooked. Future research should address these limitations and explore the applicability of Indic models to other NLP tasks, such as document-level classification. Furthermore, the findings underscore the value of continued investment in Indic pre-trained models and low-resource language NLP. Overall, the study demonstrates the potential of BERT-based models for Nepali NLP and sets a robust baseline for future research.

Recommendations

  • Future research should explore the applicability of Indic models to other Nepali NLP tasks, including document-level classification.
  • Continued investment in Indic pre-trained models and low-resource Nepali NLP resources is needed to extend these results beyond sentence-level classification.

Sources

  • arXiv:2602.23940v1