Document Optimization for Black-Box Retrieval via Reinforcement Learning
arXiv:2604.05087v1 Announce Type: new Abstract: Document expansion is a classical technique for improving retrieval quality, and is attractive since it shifts computation offline, avoiding additional query-time processing. However, when applied to modern retrievers, it has been shown to degrade performance, often introducing noise that obfuscates the discriminative signal. We recast document expansion as a document optimization problem: a language model or a vision language model is fine-tuned to transform documents into representations that better align with the expected query distribution under a target retriever, using GRPO with the retriever's ranking improvements as rewards. This approach requires only black-box access to retrieval ranks, and is applicable across single-vector, multi-vector, and lexical retrievers. We evaluate our approach on code retrieval and visual document retrieval (VDR) tasks. We find that learned document transformations yield retrieval gains and in many settings enable smaller, more efficient retrievers to outperform larger ones. For example, applying document optimization to the OpenAI text-embedding-3-small model improves nDCG@5 on code (58.7 to 66.8) and VDR (53.3 to 57.6), even slightly surpassing the 6.5x more expensive OpenAI text-embedding-3-large model (66.3 on code; 57.0 on VDR). When retriever weights are accessible, document optimization is often competitive with fine-tuning, and in most settings their combination performs best, improving Jina-ColBERT-V2 from 55.8 to 63.3 on VDR and from 48.6 to 61.8 on code retrieval.
Executive Summary
The paper proposes a reinforcement learning approach to document optimization for black-box retrieval systems: rather than relying on traditional expansion techniques, documents are transformed to better align with the retriever's expected query distribution. By fine-tuning language models with GRPO, using the retriever's ranking improvements as rewards, the method improves retrieval performance across diverse retriever architectures, including single-vector, multi-vector, and lexical models. Empirical results show substantial gains on code retrieval and visual document retrieval (VDR), in many settings enabling smaller, more efficient retrievers to outperform larger ones. Because the method needs only black-box access to retrieval ranks, not retriever weights, it is broadly applicable and cost-effective.
Key Points
- ▸ Reformulates document expansion as an optimization problem using reinforcement learning (GRPO) to align documents with expected query distributions under a target retriever.
- ▸ Leverages black-box access to retriever ranks, ensuring broad applicability across different retriever architectures without requiring internal model access.
- ▸ Demonstrates substantial performance gains in code retrieval and VDR, enabling smaller retrievers to surpass larger, more expensive models and even compete with fine-tuning approaches.
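The black-box reward the key points describe can be sketched concretely. The following is an illustrative reconstruction, not the authors' code: it assumes the reward for a rewritten document is the average reciprocal-rank improvement over that document's associated queries, using only the ranks returned by the target retriever.

```python
# Hypothetical sketch of a black-box, rank-based reward.
# `rank_before` / `rank_after` map each query to the document's 1-based
# rank under the target retriever before and after rewriting; only these
# ranks (no weights or scores) are needed.

def rank_reward(queries, rank_before, rank_after):
    """Average reciprocal-rank improvement across the document's queries."""
    gains = [1.0 / rank_after[q] - 1.0 / rank_before[q] for q in queries]
    return sum(gains) / len(gains)

# Example: the rewrite lifts the document from rank 4 to rank 1 for one
# query and leaves it at rank 2 for another.
before = {"q1": 4, "q2": 2}
after = {"q1": 1, "q2": 2}
reward = rank_reward(["q1", "q2"], before, after)  # (0.75 + 0.0) / 2 = 0.375
```

Reciprocal rank is one plausible choice; the paper's exact reward shaping (e.g. nDCG-style gains) may differ, but any function of ranks alone preserves the black-box property.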
Merits
Methodological Innovation
The paper reframes document expansion as a reinforcement learning optimization problem, departing from traditional heuristic-based expansion and enabling adaptive, data-driven document transformations driven directly by retriever feedback.
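GRPO's distinguishing feature, relevant to why it pairs well with rank-based rewards, is its group-relative advantage: several candidate rewrites of the same document are sampled, and each reward is normalized against the group, so no learned value function is needed. A minimal sketch of that normalization step (illustrative names, not the authors' implementation):

```python
# Hedged sketch of GRPO's group-relative advantage computation.
# Each element of `rewards` scores one sampled rewrite of the same
# document (e.g. by ranking improvement under the target retriever).
import statistics

def group_advantages(rewards):
    """Normalize each sampled rewrite's reward against its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four candidate rewrites of one document, scored by rank improvement:
advs = group_advantages([0.5, 0.0, -0.25, 0.75])
```

The normalized advantages then weight the policy-gradient update on the rewriting model; rewrites that beat their group's average are reinforced, the rest are suppressed.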
Broad Applicability
The approach is universally applicable to single-vector, multi-vector, and lexical retrievers, requiring only black-box access to retrieval ranks, which enhances its practical utility and scalability.
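The reason the approach transfers across retriever families is that it only ever consumes rankings. Any retriever, single-vector, multi-vector, or lexical, can sit behind a single ranking interface; the toy lexical retriever below (an assumption for illustration, not from the paper) shows the shape of that interface.

```python
# Illustrative: the method needs only a ranking oracle, so any retriever
# type can be plugged in behind one structural interface.
from typing import Protocol

class RankOracle(Protocol):
    def rank_of(self, query: str, doc_id: str) -> int:
        """1-based rank of doc_id in this retriever's results for query."""
        ...

class StubRetriever:
    """Toy lexical retriever: ranks documents by term-overlap count."""
    def __init__(self, corpus):
        self.corpus = corpus  # {doc_id: text}

    def rank_of(self, query, doc_id):
        q = set(query.lower().split())
        scores = {d: len(q & set(t.lower().split()))
                  for d, t in self.corpus.items()}
        ordered = sorted(scores, key=lambda d: (-scores[d], d))
        return ordered.index(doc_id) + 1

corpus = {"py": "sort a list in python", "js": "parse a json file"}
oracle = StubRetriever(corpus)
rank = oracle.rank_of("python list sort", "py")  # overlap 3 vs 0 -> rank 1
```

Swapping in a dense or multi-vector retriever changes only the internals of `rank_of`; the optimization loop is unchanged, which is what makes the method retriever-agnostic.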
Empirical Robustness
The method achieves consistent performance improvements across diverse tasks (code retrieval and VDR), including scenarios where smaller models outperform larger ones, demonstrating both efficiency and effectiveness.
Demerits
Computational Overhead
The reinforcement learning fine-tuning process introduces additional computational costs during the offline optimization phase, which may limit accessibility for resource-constrained environments.
Dependency on Retriever Feedback
The approach relies on the quality and consistency of retriever feedback as rewards, which could introduce biases or suboptimal optimization if the retriever's ranking signals are noisy or inconsistent.
Task-Specific Evaluation
While evaluated on code retrieval and VDR, broader validation across additional domains (e.g., legal, medical, or multilingual retrieval) is needed to confirm generalizability and robustness.
Expert Commentary
This paper presents a significant advancement in the field of information retrieval by introducing a reinforcement learning-based approach to document optimization. The authors’ recasting of document expansion as an optimization problem addresses a critical gap in traditional retrieval techniques, which often struggle with noise and misalignment when expanding documents. The use of GRPO with retriever feedback as rewards is particularly innovative, as it leverages the retriever’s own ranking signals to guide the optimization process, ensuring that the transformations are directly aligned with the end goal of improved retrieval performance. The empirical results are compelling, demonstrating that smaller models can outperform larger ones when combined with document optimization, which has profound implications for the cost-effectiveness and scalability of retrieval systems. However, the reliance on black-box feedback introduces a dependency on the quality of the retriever’s ranking signals, which may not always be optimal or unbiased. Future work should explore the robustness of this approach across a wider range of retrieval tasks and domains, as well as the potential for integrating domain-specific knowledge into the optimization process to further enhance performance.
Recommendations
- ✓ Expand the evaluation to include additional domains (e.g., legal, medical, or multilingual retrieval) and larger, more diverse datasets to validate the generalizability of the approach.
- ✓ Investigate hybrid optimization strategies that combine reinforcement learning with supervised fine-tuning or other optimization techniques to further improve robustness and performance.
- ✓ Develop techniques to mitigate potential biases in retriever feedback and ensure that the document optimization process remains fair and equitable across different query distributions and document types.
Sources
Original: arXiv - cs.CL