Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck
arXiv:2603.10351v1 Abstract: Large language models (LLMs) have become a standard for multilingual evaluation, yet they exhibit a severe, systematic translationese bias. In this paper, translationese bias is characterized as LLMs systematically favoring machine-translated text over human-authored references, particularly in low-resource languages. We attribute this bias to spurious correlations with (i) latent manifold alignment with English and (ii) cross-lingual predictability. To mitigate this bias, we propose DIBJudge, a robust fine-tuning framework that learns a minimally sufficient, judgment-critical representation via variational information compression, while explicitly isolating spurious factors into a dedicated bias branch. Furthermore, we incorporate a cross-covariance penalty that suppresses statistical dependence between the robust and bias representations, thereby encouraging effective disentanglement. Extensive evaluations on multilingual reward modeling benchmarks and a dedicated translationese bias evaluation suite demonstrate that DIBJudge consistently outperforms strong baselines and substantially mitigates translationese bias.
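The abstract does not spell out the "variational information compression" term; a standard variational information bottleneck realizes it by upper-bounding I(X; Z) with a KL divergence between a diagonal-Gaussian encoder posterior and a fixed prior. A minimal PyTorch sketch, assuming the encoder outputs mu and logvar (names hypothetical, not from the paper):

```python
import torch

def vib_compression_loss(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Analytic KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the
    batch; the usual variational upper bound on I(X; Z) that pressures the
    representation to be minimally sufficient."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1)
    return kl.mean()

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z ~ q(z|x) via the reparameterization trick so gradients
    flow through the stochastic robust representation."""
    std = (0.5 * logvar).exp()
    return mu + std * torch.randn_like(std)
```

In this reading, the reparameterized sample z feeds the judgment head, while the KL term pressures the robust branch to discard information that is not judgment-critical.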
Executive Summary
The article proposes DIBJudge, a novel framework for mitigating translationese bias in multilingual large language models (LLMs) used as judges. Translationese bias refers to the systematic favoring of machine-translated text over human-authored references, particularly in low-resource languages. DIBJudge uses variational information compression to retain only judgment-critical information and a cross-covariance penalty to disentangle that representation from spurious bias factors (a sketch of the penalty follows). Evaluations on multilingual reward modeling benchmarks show that DIBJudge outperforms strong baselines while substantially reducing translationese bias.
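The cross-covariance penalty mentioned above admits a simple realization: penalize the squared Frobenius norm of the batch cross-covariance between the two branches. The tensor names z_robust and z_bias below are assumptions about a two-branch encoder, not the paper's API:

```python
import torch

def cross_covariance_penalty(z_robust: torch.Tensor, z_bias: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius norm of the batch cross-covariance between the
    robust (judgment-critical) and bias (translationese) representations.
    Shapes: z_robust is (batch, d_r), z_bias is (batch, d_b)."""
    # Center each branch over the batch dimension.
    zr = z_robust - z_robust.mean(dim=0, keepdim=True)
    zb = z_bias - z_bias.mean(dim=0, keepdim=True)
    n = z_robust.size(0)
    # Cross-covariance matrix of shape (d_r, d_b).
    cov = zr.T @ zb / (n - 1)
    # Penalizing its squared Frobenius norm pushes every cross-covariance toward zero.
    return cov.pow(2).sum()
```

Driving this matrix toward zero removes linear dependence between the branches; since decorrelation does not guarantee full statistical independence, this should be read as one plausible instantiation rather than the authors' exact objective.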
Key Points
- ▸ Translationese bias in LLMs favors machine-translated text over human-authored references
- ▸ DIBJudge framework proposes a robust fine-tuning approach to mitigate translationese bias
- ▸ Variational information compression and a cross-covariance penalty disentangle judgment-critical representations from bias factors (a combined objective is sketched after this list)
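Putting the pieces together, one plausible training objective combines a judgment loss on the robust branch, a bias-prediction loss on the bias branch, the compression term, and the decorrelation penalty. The weights beta and lam, the bias-branch target (e.g., machine-translated vs. human-authored), and the function names reused from the sketches above are all assumptions, not the paper's specification:

```python
import torch
import torch.nn.functional as F

def dib_style_loss(judge_logits, judge_labels, bias_logits, bias_labels,
                   mu, logvar, z_robust, z_bias, beta=1e-3, lam=1.0):
    """One plausible composition of the losses sketched above; weighting
    and bias-branch supervision are assumptions, not the paper's spec."""
    task = F.cross_entropy(judge_logits, judge_labels)    # judgment head
    bias = F.cross_entropy(bias_logits, bias_labels)      # bias head (e.g., MT vs. human)
    compress = vib_compression_loss(mu, logvar)           # I(X; Z) upper bound
    decorr = cross_covariance_penalty(z_robust, z_bias)   # disentanglement
    return task + bias + beta * compress + lam * decorr
```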
Merits
Effective Mitigation of Translationese Bias
DIBJudge achieves a substantial reduction in translationese bias, improving the reliability of LLMs as multilingual judges.
Demerits
Limited Generalizability
The framework may not transfer directly to all language model architectures or evaluation tasks; broader adaptation and testing are still needed.
Expert Commentary
The proposed DIBJudge framework offers a promising approach to addressing the pervasive issue of translationese bias in multilingual LLM judges. By combining variational information compression with a cross-covariance penalty, the authors demonstrate a significant reduction in bias, with far-reaching implications for building more reliable and fair language technologies. However, further research is needed to establish how well the framework generalizes to diverse language models and evaluation tasks.
Recommendations
- ✓ Further evaluation of DIBJudge on a broader range of language models and tasks to assess its generalizability
- ✓ Investigation of potential applications of DIBJudge in other areas, such as machine translation and language understanding