
Optimizing Language Models for Crosslingual Knowledge Consistency

arXiv:2603.04678v1. Abstract: Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their reliability. In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function, which leads to an optimal policy with consistent crosslingual responses. We introduce Direct Consistency Optimization (DCO), a DPO-inspired method that requires no explicit reward model and is derived directly from the LLM itself. Comprehensive experiments show that DCO significantly improves crosslingual consistency across diverse LLMs and outperforms existing methods when training with samples of multiple languages, while complementing DPO when gold labels are available. Extra experiments demonstrate the effectiveness of DCO in bilingual settings, significant out-of-domain generalizability, and controllable alignment via direction hyperparameters. Taken together, these results establish DCO as a robust and efficient solution for improving knowledge consistency across languages in multilingual LLMs. All code, training scripts, and evaluation benchmarks are released at https://github.com/Betswish/ConsistencyRL.

Executive Summary

The paper introduces Direct Consistency Optimization (DCO), a method for improving crosslingual knowledge consistency in large language models. DCO casts the problem as reinforcement learning with a structured reward function and, in the style of DPO, derives the objective directly from the language model itself, eliminating the need for an explicit reward model. Experiments show that DCO improves crosslingual consistency across diverse LLMs, outperforms existing methods when training on multilingual samples, and generalizes well out of domain.

Key Points

  • Introduction of Direct Consistency Optimization (DCO) for crosslingual knowledge consistency
  • DCO's ability to derive reward functions directly from the language model
  • Comprehensive experiments demonstrating DCO's effectiveness across diverse language models and settings
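To make the "no explicit reward model" idea concrete, here is a minimal sketch of a DPO-style pairwise objective applied to consistency: the response that agrees across languages plays the role of the preferred answer, and the implicit reward is the log-ratio between the policy and a frozen reference model. The function name, signature, and scalar formulation are illustrative assumptions for exposition, not the paper's actual DCO implementation.

```python
import math

def dpo_style_consistency_loss(
    logp_consistent: float,      # policy log-prob of the crosslingually consistent response
    logp_inconsistent: float,    # policy log-prob of the inconsistent response
    ref_logp_consistent: float,  # same responses scored by the frozen reference model
    ref_logp_inconsistent: float,
    beta: float = 0.1,           # strength of the implicit KL regularizer, as in DPO
) -> float:
    """Pairwise logistic loss preferring the consistent response.

    The implicit reward is beta * log(pi / pi_ref), so the reward signal
    is derived from the LLM itself rather than a separate reward model.
    """
    margin = beta * (
        (logp_consistent - ref_logp_consistent)
        - (logp_inconsistent - ref_logp_inconsistent)
    )
    # -log(sigmoid(margin)), written in a numerically stable form
    return math.log1p(math.exp(-margin))
```

Minimizing this loss pushes the policy to raise the likelihood of the consistent response relative to the reference model; when both responses are equally likely the loss sits at log 2 and decreases as the consistent response gains probability mass.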

Merits

Improved Crosslingual Consistency

DCO significantly enhances crosslingual consistency, ensuring reliable responses across languages

Efficient and Robust

DCO is a robust and efficient solution, requiring no explicit reward model and outperforming existing methods

Demerits

Limited Evaluation

The evaluation is confined to the paper's chosen benchmarks and language pairs, leaving broader real-world multilingual deployments untested

Dependence on Language Model Quality

DCO's effectiveness may be dependent on the quality and accuracy of the underlying language model

Expert Commentary

The introduction of DCO represents a significant advancement in the field of natural language processing, particularly in the context of multilingual language models. By leveraging reinforcement learning and deriving reward functions directly from the language model, DCO offers a robust and efficient solution for improving crosslingual consistency. However, further research is necessary to fully explore the potential of DCO and address the limitations of the current evaluation. Nevertheless, the implications of DCO are far-reaching, with potential applications in a wide range of fields, from translation and question-answering systems to language policy and standards development.

Recommendations

  • Further evaluation of DCO across a broader range of languages and benchmarks than those reported in the paper
  • Investigation of the potential applications and implications of DCO in real-world language model deployments
