What Is Missing: Interpretable Ratings for Large Language Model Outputs

Nicholas Stranges, Yimin Yang

arXiv:2603.04429v1 Announce Type: new Abstract: Current Large Language Model (LLM) preference learning methods, such as Proximal Policy Optimization and Direct Preference Optimization, learn from direct rankings or numerical ratings of model outputs. These rankings are subjective, and a single numerical rating chosen directly by a judge is a poor proxy for the quality of natural language. We introduce the What Is Missing (WIM) rating system to produce rankings from natural-language feedback. WIM integrates into existing training pipelines, can be combined with other rating techniques, and can be used as input to any preference learning method without changing the learning algorithm. To compute a WIM rating, a human or LLM judge writes feedback describing what the model output is missing; we embed the output and the feedback with a sentence embedding model and compute the cosine similarity between the resulting vectors. We empirically observe that, compared to discrete numerical ratings, WIM yields fewer ties and larger rating deltas, which improves the availability of a learning signal in pairwise preference data. We use "interpretable" in the following limited sense: for each scalar rating, we can inspect the judge's missing-information text that produced it, enabling qualitative debugging of the preference labels.
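The rating computation described in the abstract (embed the output and the judge's feedback, then take cosine similarity) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `embed` function below is a toy bag-of-words placeholder standing in for whatever sentence embedding model the authors use, and `wim_rating` simply follows the cosine-similarity recipe the abstract states.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashed bag-of-words embedder. In practice this would be a
    real sentence embedding model; it is a placeholder here."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def wim_rating(output: str, feedback: str) -> float:
    """Cosine similarity between the model output and the judge's
    'what is missing' feedback, per the abstract's description."""
    a, b = embed(output), embed(feedback)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

Because the embeddings here are nonnegative count vectors, the resulting rating lies in [0, 1]; a real embedding model could produce values in [-1, 1].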

Executive Summary

The article introduces the What Is Missing (WIM) rating system, a novel approach to evaluating Large Language Model (LLM) outputs. WIM generates rankings from natural-language feedback, providing a more nuanced and interpretable alternative to traditional numerical ratings. By leveraging sentence embedding models and cosine similarity, WIM produces fewer ties and larger rating deltas, enhancing the learning signal in pairwise preference data. This innovation enables qualitative debugging of preference labels, offering a significant improvement over existing methods.

Key Points

  • Introduction of the What Is Missing (WIM) rating system
  • WIM generates rankings from natural-language feedback
  • Improved learning signal in pairwise preference data
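The "fewer ties, larger rating deltas" point matters because pairwise preference methods need a clear winner in each pair. As a hedged sketch (the function name and dictionary keys are illustrative, not from the paper), scalar WIM ratings could be turned into preference pairs like this:

```python
def to_preference_pair(prompt, out_a, out_b, rating_a, rating_b, eps=1e-6):
    """Turn two rated outputs into a (chosen, rejected) pair for a
    pairwise preference learner. Near-equal ratings are a tie and
    yield no pair, i.e. no learning signal for that comparison."""
    if abs(rating_a - rating_b) < eps:
        return None  # tie: discarded
    chosen, rejected = (out_a, out_b) if rating_a > rating_b else (out_b, out_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Under this framing, a rating scheme that produces fewer ties discards fewer comparisons, which is the improved availability of learning signal the abstract claims.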

Merits

Enhanced Interpretability

WIM allows for qualitative debugging of preference labels: each scalar rating can be traced back to the judge's missing-information text that produced it

Flexibility and Compatibility

WIM can be integrated into existing training pipelines and combined with other rating techniques

Demerits

Limited Generalizability

The effectiveness of WIM may depend on the quality and consistency of the natural-language feedback provided

Computational Overhead

The use of sentence embedding models and cosine similarity may introduce additional computational complexity

Expert Commentary

The introduction of the WIM rating system marks a significant step forward in the development of more interpretable and reliable LLMs. By providing a nuanced and transparent approach to evaluating model outputs, WIM has the potential to improve the performance and trustworthiness of AI systems. However, further research is needed to fully realize the benefits of WIM and address potential limitations, such as the quality and consistency of natural-language feedback. As the field continues to evolve, it is essential to prioritize transparency, explainability, and accountability in AI development and deployment.

Recommendations

  • Further research on the effectiveness and limitations of WIM in various applications and domains
  • Development of guidelines and standards for the use of WIM and other interpretable AI methods
