BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization
arXiv:2602.16843v1 Announce Type: new Abstract: Evaluating factual consistency is essential for reliable text summarization, particularly in high-stakes domains such as healthcare and news. However, most existing evaluation metrics overlook Bangla, a widely spoken yet under-resourced language, and often depend on reference summaries. We introduce BanglaSummEval, a reference-free, question-answering-based framework for evaluating factual consistency in Bangla summarization. The proposed method assesses both factual accuracy and content coverage through automatically generated questions and answers derived from the source document and the summary. A single multilingual instruction-tuned language model handles question generation, question answering, candidate answer extraction, and question importance weighting. This unified design reduces system complexity and computational cost. To capture semantic consistency beyond surface-level overlap, we use BERTScore-Recall for answer comparison. We validate BanglaSummEval on 300 human-written summaries from educational and medical domains, demonstrating strong correlation with expert human judgments (Pearson's $r = 0.694$, Spearman's $\rho = 0.763$). By providing interpretable, step-wise diagnostics alongside reliable evaluation scores, BanglaSummEval offers a practical and transparent solution for factual consistency evaluation in low-resource language settings.
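The abstract's step-wise pipeline can be sketched in code. The following is a minimal illustration, not the paper's implementation: the LLM-based question generation, question answering, and BERTScore-Recall answer comparison are assumed to exist upstream and are represented only by their outputs, and the hypothetical `weighted_consistency` helper shows just the importance-weighted aggregation of per-question answer similarities into a single score.

```python
def weighted_consistency(
    similarities: list[float],  # per-question answer similarity (e.g. BERTScore-Recall), in [0, 1]
    importances: list[float],   # per-question importance weights from the LLM
) -> float:
    """Importance-weighted average of answer similarities.

    Hypothetical aggregation step: each generated question contributes its
    answer-similarity score, weighted by how important the question is.
    """
    if len(similarities) != len(importances):
        raise ValueError("one importance weight per question is required")
    total = sum(importances)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(similarities, importances)) / total
```

For example, two questions with similarities 1.0 and 0.5 and weights 2.0 and 1.0 yield (2.0 + 0.5) / 3.0 ≈ 0.833, so a summary that answers the highly weighted question correctly is penalized less for missing a minor one.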
Executive Summary
The article introduces BanglaSummEval, a reference-free, question-answering-based framework for evaluating factual consistency in Bangla summarization. This framework assesses both factual accuracy and content coverage through automatically generated questions and answers derived from the source document and the summary. The proposed method demonstrates strong correlation with expert human judgments and offers a practical and transparent solution for factual consistency evaluation in low-resource language settings. However, the reliance on a single multilingual language model may limit its scalability and adaptability to different domains and languages.
Key Points
- ▸ BanglaSummEval is a reference-free, question-answering-based framework for evaluating factual consistency in Bangla summarization.
- ▸ The framework assesses both factual accuracy and content coverage through automatically generated questions and answers.
- ▸ The proposed method demonstrates strong correlation with expert human judgments (Pearson's r = 0.694, Spearman's ρ = 0.763).
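The reported agreement figures are standard correlation statistics: Pearson's r measures linear agreement between metric scores and human ratings, while Spearman's ρ is simply Pearson's r computed on ranks, so it captures monotone agreement. A small pure-Python illustration (the data below is invented for demonstration and is not the paper's evaluation data):

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's r: covariance normalized by the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x: list[float]) -> list[float]:
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho is Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

metric_scores = [0.2, 0.5, 0.9, 0.4]   # invented metric outputs
human_ratings = [1.0, 3.0, 5.0, 2.0]   # invented expert ratings
```

Here the metric ranks the four summaries in the same order as the humans, so ρ = 1.0 even though the relationship is not perfectly linear (r ≈ 0.994); values like the paper's r = 0.694 and ρ = 0.763 indicate strong but imperfect agreement.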
Merits
Strength in Factual Consistency Evaluation
BanglaSummEval's reference-free design evaluates factual consistency in Bangla summarization without human-annotated reference summaries, making it a valuable tool in low-resource settings where such references are scarce or expensive to produce.
Unified, Low-Cost Design
Using a single multilingual instruction-tuned language model for question generation, question answering, candidate answer extraction, and importance weighting reduces system complexity and computational cost, making the framework easier to deploy and maintain.
Demerits
Limitation in Scalability
The reliance on a single multilingual language model may limit the framework's ability to adapt to different domains and languages, potentially reducing its effectiveness in certain contexts.
Potential for Bias
The use of a pre-trained language model may introduce biases and inaccuracies, particularly if the model is not fine-tuned for the specific language or domain being evaluated.
Expert Commentary
BanglaSummEval is a significant contribution to the field of natural language processing, particularly in the area of factual consistency evaluation in low-resource languages. While the framework demonstrates strong correlation with expert human judgments, its reliance on a single multilingual language model may limit its scalability and adaptability. Nevertheless, the proposed method offers a practical and transparent solution for factual consistency evaluation in low-resource language settings. As the demand for summarization systems continues to grow, the development of evaluation metrics like BanglaSummEval will become increasingly important for ensuring the accuracy and reliability of these systems.
Recommendations
- ✓ Future research should focus on fine-tuning the language model for specific languages and domains to improve the framework's adaptability and effectiveness.
- ✓ The development of additional reference-free evaluation metrics should be explored to provide a more comprehensive understanding of summarization quality and accuracy.