
Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity's Last Exam


Michael Majurski, Cynthia Matuszek

arXiv:2603.04454v1 Announce Type: new Abstract: How carefully and unambiguously a question is phrased has a profound impact on the quality of the response, for Language Models (LMs) as well as people. While model capabilities continue to advance, the interplay between grounding context and query formulation remains under-explored. This work investigates how the quality of background grounding information in a model's context window affects accuracy. We find that combining well-grounded dynamic context construction (i.e., RAG) with query rewriting reduces question ambiguity, resulting in significant accuracy gains. Given a user question with associated answer-free grounding context, rewriting the question to reduce ambiguity produces benchmark improvements without changing the answer itself, even compared to prepending that context before the question. Using \texttt{gpt-oss-20b} to rewrite a subset of Humanity's Last Exam using answer-free grounding context improves \texttt{gpt-5-mini} accuracy from 0.14 to 0.37. We demonstrate that this accuracy improvement cannot be fully recovered just through prompting at inference time; rather, distinct rewriting and answering phases are required. Code and data are available at https://github.com/mmajurski/lm-rewrite-uplift

Executive Summary

This article presents a novel approach to query disambiguation using answer-free context to improve the performance of language models. The authors propose combining dynamic context construction with query rewriting to reduce question ambiguity, resulting in significant accuracy gains. The method is demonstrated on Humanity's Last Exam, showing a substantial improvement in accuracy when the \texttt{gpt-oss-20b} model rewrites questions before \texttt{gpt-5-mini} answers them. The findings highlight the importance of context grounding in query formulation and suggest that distinct rewriting and answering phases are necessary to achieve optimal performance. The study's code and data are publicly available, making it a valuable contribution to the field of language modeling.

Key Points

  • Query disambiguation is crucial for improving the quality of responses from language models.
  • Combining dynamic context construction and query rewriting reduces question ambiguity and improves accuracy.
  • Answer-free context is an effective tool for query disambiguation, outperforming prepending context before the question.
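The two-phase pipeline these points describe can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (which is available in their repository): the prompt wording, the `lm_complete` helper, and the `rewrite_then_answer` function are assumptions for illustration, with `lm_complete` stubbed so the control flow runs end to end; the model names come from the abstract.

```python
# Sketch of the rewrite-then-answer pipeline: a smaller model first
# disambiguates the question using answer-free grounding context, then a
# separate answering model sees only the rewritten question.

REWRITE_PROMPT = (
    "Using only the background context below (which does not contain the "
    "answer), rewrite the question to remove ambiguity. Do not answer it.\n\n"
    "Context:\n{context}\n\nQuestion:\n{question}\n\nRewritten question:"
)

ANSWER_PROMPT = "Answer the following question concisely:\n\n{question}"


def lm_complete(prompt: str, model: str) -> str:
    """Placeholder for a real LM API call; a real system would query `model`."""
    return f"[{model} output for prompt of {len(prompt)} chars]"


def rewrite_then_answer(question: str, context: str,
                        rewriter: str = "gpt-oss-20b",
                        answerer: str = "gpt-5-mini") -> str:
    # Phase 1: rewrite the question against the answer-free context.
    rewritten = lm_complete(
        REWRITE_PROMPT.format(context=context, question=question), rewriter)
    # Phase 2: answer the rewritten question alone -- per the abstract,
    # this outperforms simply prepending the raw context to the question.
    return lm_complete(ANSWER_PROMPT.format(question=rewritten), answerer)
```

The key design point the paper stresses is the separation: rewriting and answering are distinct phases with distinct prompts, and the gain cannot be fully recovered by a single prompt at inference time.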

Merits

Strength in Methodology

The authors' use of Humanity's Last Exam as a benchmark dataset adds credibility to their findings and allows for direct comparison with existing models. The public availability of the study's code and data facilitates reproducibility and encourages further research.

Significant Accuracy Gains

The reported improvement in \texttt{gpt-5-mini} accuracy, from 0.14 to 0.37 on a subset of Humanity's Last Exam with questions rewritten by \texttt{gpt-oss-20b}, is substantial and demonstrates the effectiveness of the proposed approach.

Demerits

Limitation in Generalizability

The study's focus on a specific dataset and model may limit the generalizability of the findings to other contexts and language models. Further research is needed to explore the applicability of the proposed approach.

Lack of Theoretical Foundation

The article does not provide a detailed theoretical explanation of why answer-free context is effective for query disambiguation. A deeper understanding of the underlying mechanisms could enhance the study's contributions.

Expert Commentary

While the article presents a promising approach to query disambiguation, its limited generalizability and lack of theoretical foundation mean that further research is needed to fully understand the mechanisms underlying the method. Nevertheless, the findings demonstrate the potential of answer-free context to improve language model performance, and the public availability of the code and data makes the work a valuable contribution to the field. The study's implications are primarily practical: the proposed approach could be applied across a variety of NLP tasks to improve accuracy and efficiency. Its policy implications are more speculative, and further research is needed to explore the potential impact on language model development and deployment.

Recommendations

  • Future studies should investigate the generalizability of the proposed approach to other datasets, language models, and NLP tasks.
  • A deeper theoretical understanding of why answer-free context is effective for query disambiguation is necessary to enhance the study's contributions and facilitate the development of more effective language models.
