BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation
arXiv:2602.23580v1 Announce Type: new

Abstract: In the field of educational assessment, automated scoring systems increasingly rely on deep learning and large language models (LLMs). However, these systems face significant risks of bias amplification, where model prediction gaps between student groups become larger than those observed in training data. This issue is especially severe for underrepresented groups such as English Language Learners (ELLs), as models may inherit and further magnify existing disparities in the data. We identify that this issue is closely tied to representation bias: the scarcity of minority (high-scoring ELL) samples makes models trained with empirical risk minimization favor majority (non-ELL) linguistic patterns. Consequently, models tend to under-predict even those ELL students who demonstrate comparable domain knowledge but use different linguistic patterns, thereby undermining the fairness of automated scoring outcomes. To mitigate this, we propose BRIDGE, a Bias-Reducing Inter-group Data GEneration framework designed for low-resource assessment settings. Instead of relying on the limited minority samples, BRIDGE synthesizes high-scoring ELL samples by "pasting" construct-relevant (i.e., rubric-aligned knowledge and evidence) content from abundant high-scoring non-ELL samples into authentic ELL linguistic patterns. We further introduce a discriminator model to ensure the quality of synthetic samples. Experiments on California Science Test (CAST) datasets demonstrate that BRIDGE effectively reduces prediction bias for high-scoring ELL students while maintaining overall scoring performance. Notably, our method achieves fairness gains comparable to using additional real human data, offering a cost-effective solution for ensuring equitable scoring in large-scale assessments.
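The abstract defines bias amplification as the model's inter-group prediction gap exceeding the gap already present in the training labels. A minimal sketch of that quantity, with illustrative function names and toy data (not from the paper):

```python
# Quantify bias amplification: how much wider is the model's non-ELL vs. ELL
# gap than the gap already present in the human-assigned labels?
# All names and numbers here are illustrative, not the paper's definitions.

def group_mean(scores, groups, target):
    vals = [s for s, g in zip(scores, groups) if g == target]
    return sum(vals) / len(vals)

def bias_amplification(labels, preds, groups):
    """Positive values mean the model widens the non-ELL vs. ELL score gap."""
    label_gap = group_mean(labels, groups, "non-ELL") - group_mean(labels, groups, "ELL")
    pred_gap = group_mean(preds, groups, "non-ELL") - group_mean(preds, groups, "ELL")
    return pred_gap - label_gap

# Toy example: human labels show a 0.5-point gap, model predictions a 1.0-point gap.
groups = ["non-ELL", "non-ELL", "ELL", "ELL"]
labels = [3.0, 4.0, 3.0, 3.0]   # human gap = 3.5 - 3.0 = 0.5
preds  = [3.5, 4.5, 3.0, 3.0]   # model gap = 4.0 - 3.0 = 1.0
print(bias_amplification(labels, preds, groups))  # → 0.5
```

A value of 0.5 here means the model added half a score point to the disparity beyond what the data contained, which is exactly the failure mode BRIDGE targets.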
Executive Summary
The article proposes a novel approach, BRIDGE, to mitigate bias amplification in automated scoring systems for English Language Learners (ELLs). By generating synthetic high-scoring ELL samples through inter-group data augmentation, BRIDGE reduces prediction bias while maintaining overall scoring performance. The framework synthesizes construct-relevant content from high-scoring non-ELL samples into authentic ELL linguistic patterns, ensuring fairness in large-scale assessments.
Key Points
- ▸ Bias amplification in automated scoring systems affects underrepresented groups like ELLs
- ▸ BRIDGE framework generates synthetic high-scoring ELL samples through inter-group data augmentation
- ▸ Experiments on California Science Test datasets demonstrate reduced prediction bias and maintained scoring performance
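The generation loop described above (paste construct-relevant content from high-scoring non-ELL responses into ELL linguistic patterns, then gate with a discriminator) can be sketched as follows. This is a hypothetical outline, not the paper's implementation: the extraction, restyling, and discriminator steps are stand-in stubs (in practice these would likely be LLM- or classifier-based), and all names are illustrative.

```python
# Hypothetical sketch of a BRIDGE-style inter-group augmentation loop.
# Each step is a placeholder for a learned component described in the paper.

def extract_construct_content(response):
    # Stand-in: in practice, extract rubric-aligned knowledge and evidence
    # spans from a high-scoring non-ELL response.
    return response["text"]

def render_in_ell_style(content, ell_exemplar):
    # Stand-in: in practice, rewrite `content` using the authentic
    # linguistic patterns of an ELL exemplar response.
    return content + " [restyled after ELL exemplar: " + ell_exemplar["text"] + "]"

def discriminator_accepts(candidate, min_len=10):
    # Stand-in quality gate: a trained discriminator would judge whether the
    # synthetic sample preserves the construct and reads authentically.
    return len(candidate) >= min_len

def bridge_augment(high_non_ell, ell_exemplars):
    """Synthesize high-scoring ELL-style samples from non-ELL donors."""
    synthetic = []
    for donor in high_non_ell:
        for exemplar in ell_exemplars:
            content = extract_construct_content(donor)
            candidate = render_in_ell_style(content, exemplar)
            if discriminator_accepts(candidate):
                # The donor's (high) score is carried over to the new sample.
                synthetic.append({"text": candidate, "score": donor["score"]})
    return synthetic
```

The key design point is that scores travel with the construct-relevant content (from the non-ELL donor), while surface linguistic form comes from real ELL writing, so the scoring model sees high scores paired with minority linguistic patterns it would otherwise rarely encounter.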
Merits
Effective Bias Reduction
BRIDGE successfully reduces prediction bias for high-scoring ELL students
Cost-Effective Solution
The method achieves fairness gains comparable to using additional real human data, offering a cost-effective solution
Demerits
Limited Generalizability
The framework has been validated only on California Science Test (CAST) datasets; its effectiveness in other assessment settings and subject domains remains untested
Expert Commentary
The proposed BRIDGE framework offers a promising solution to mitigate bias amplification in automated scoring systems. By leveraging inter-group data augmentation, BRIDGE addresses the scarcity of minority samples and promotes fairness in large-scale assessments. However, further research is needed to ensure the framework's generalizability and applicability to diverse assessment settings. The article's findings have significant implications for promoting equity in educational assessments and highlight the importance of considering fairness in AI-driven decision-making systems.
Recommendations
- ✓ Further research on the generalizability of the BRIDGE framework to diverse assessment settings
- ✓ Exploration of the framework's applicability to other underrepresented groups beyond ELLs