SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
arXiv:2603.03293v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further cast RAG as an autonomous, multi-turn information-seeking process. However, existing methods often accumulate irrelevant or noisy documents and rely on sparse reinforcement learning signals. We propose SE-Search, a Self-Evolving Search agent that improves online search behavior through three components: memory purification, atomic query training, and dense rewards. SE-Search follows a Think-Search-Memorize strategy that retains salient evidence while filtering irrelevant content. Atomic query training promotes shorter and more diverse queries, improving evidence acquisition. Dense rewards provide fine-grained feedback that speeds training. Experiments on single-hop and multi-hop question answering benchmarks show that SE-Search-3B outperforms strong baselines, yielding a 10.8-point absolute improvement and a 33.8% relative gain over Search-R1. (The authors state that code and model weights will be made publicly available upon acceptance.)
Executive Summary
This paper proposes SE-Search, a Self-Evolving Search agent that improves online search behavior through three components: memory purification, atomic query training, and dense rewards. By retaining salient evidence and filtering irrelevant content, SE-Search outperforms strong baselines on single-hop and multi-hop question answering benchmarks, achieving a 10.8-point absolute improvement and a 33.8% relative gain over Search-R1. The approach targets a core weakness of retrieval-augmented generation in large language models: hallucinations and factual errors caused by noisy retrieved context.
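The abstract describes atomic query training only at a high level: the agent is trained to issue shorter, more diverse queries rather than one long compound query. A minimal sketch of the idea, assuming a multi-hop question is decomposed into single-fact sub-queries (the `atomic_queries` function and the hard-coded decomposition are illustrative stand-ins, not the paper's trained policy):

```python
# Hypothetical sketch: atomic queries break a multi-hop question into
# short, single-fact sub-queries issued one retrieval turn at a time.
# The lookup table stands in for what a trained policy would generate.

def atomic_queries(question: str) -> list[str]:
    """Toy decomposition standing in for the learned query policy."""
    table = {
        "Who directed the film that won Best Picture in 1994?": [
            "Best Picture winner 1994",   # hop 1: identify the film
            "Schindler's List director",  # hop 2: identify its director
        ],
    }
    return table.get(question, [question])  # fall back to the raw question

subqueries = atomic_queries(
    "Who directed the film that won Best Picture in 1994?"
)
print(len(subqueries))  # two short atomic queries instead of one long one
```

The design intuition is that each short query matches retrieval indexes better than the full compound question, so each turn is more likely to surface one useful piece of evidence.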
Key Points
- ▸ SE-Search is a Self-Evolving Search agent that improves online search behavior through memory purification, atomic query training, and dense rewards.
- ▸ The agent follows a Think-Search-Memorize strategy that retains salient evidence and filters irrelevant content.
- ▸ Experiments on single-hop and multi-hop question answering benchmarks show that SE-Search outperforms strong baselines, including a 10.8-point absolute improvement over Search-R1.
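The Think-Search-Memorize loop above can be sketched as follows. This is a minimal illustration, assuming a generic retriever and a keyword-overlap relevance filter; the paper's actual memory purification is a learned component, and every function here is a stand-in:

```python
# Minimal sketch of a Think-Search-Memorize turn with memory purification.
# All components are illustrative stand-ins, not the paper's released code.

def purify(memory: list[str], docs: list[str], query: str) -> list[str]:
    """Keep only snippets that share a token with the query (toy filter)."""
    keywords = set(query.lower().split())
    kept = [d for d in docs if keywords & set(d.lower().split())]
    return memory + kept

def think_search_memorize(question, retrieve, max_turns=3):
    memory: list[str] = []
    for _ in range(max_turns):
        query = question            # "think": a real agent would reformulate
        docs = retrieve(query)      # "search": fetch candidate documents
        memory = purify(memory, docs, query)  # "memorize": filter, then store
    return memory

# Toy retriever returning one relevant and one noisy document.
docs = ["Paris is the capital of France", "Unrelated sports news"]
mem = think_search_memorize("capital of France", lambda q: docs, max_turns=1)
print(mem)  # only the relevant snippet survives purification
```

The point of the loop structure is that the memory carried between turns contains only purified evidence, so later turns condition on salient context rather than an ever-growing pile of noisy retrievals.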
Merits
Improved Accuracy
SE-Search achieves a 10.8-point absolute improvement and a 33.8% relative gain over Search-R1 on single-hop and multi-hop question answering benchmarks, supporting the claim that purified evidence reduces hallucinations and factual errors in large language models.
Efficient Training
The dense-reward component provides fine-grained, per-step feedback rather than a single outcome signal, which speeds up training and makes SE-Search a more sample-efficient search agent.
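The contrast between sparse and dense rewards can be sketched numerically. This is an illustrative assumption about the reward shape (partial credit for each gold evidence snippet in memory plus a bonus for the final answer), not the paper's exact formulation:

```python
# Sketch contrasting a sparse outcome reward with a dense per-step reward.
# The reward shapes and 0.5/0.5 weighting are illustrative assumptions.

def sparse_reward(trajectory, gold_answer):
    """1.0 only if the final answer matches; no credit for partial progress."""
    return 1.0 if trajectory["answer"] == gold_answer else 0.0

def dense_reward(trajectory, gold_answer, gold_evidence):
    """Partial credit per gold evidence snippet found, plus an answer bonus."""
    hits = sum(1 for e in gold_evidence if e in trajectory["memory"])
    step_credit = hits / max(len(gold_evidence), 1)
    answer_credit = 1.0 if trajectory["answer"] == gold_answer else 0.0
    return 0.5 * step_credit + 0.5 * answer_credit

# A trajectory that found half the evidence but answered incorrectly.
traj = {"memory": ["evidence A"], "answer": "wrong"}
print(sparse_reward(traj, "right"))                              # 0.0
print(dense_reward(traj, "right", ["evidence A", "evidence B"])) # 0.25
```

Under the sparse scheme this trajectory receives zero signal despite real progress; the dense scheme rewards the evidence it did collect, which is the mechanism by which fine-grained feedback can accelerate reinforcement learning.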
Demerits
Limited Evaluation
The paper evaluates SE-Search only on question answering benchmarks, so the results may not generalize to other search applications or domains.
Dependence on Data Quality
The performance of SE-Search may be sensitive to the quality of the training data, which can impact its effectiveness in real-world scenarios.
Expert Commentary
The paper proposes a novel approach to improving online search behavior through a Self-Evolving Search agent. The Think-Search-Memorize strategy and the dense-reward component are well-motivated responses to the two failure modes the authors identify: noisy accumulated context and sparse reinforcement learning signals. However, the evaluation would benefit from broader coverage, including diverse benchmarks and datasets beyond question answering. The work also raises broader questions about the accountability and transparency of autonomous search agents that future research should address.
Recommendations
- ✓ Future research should evaluate SE-Search on a broader range of benchmarks and datasets to assess its generalizability and robustness.
- ✓ The development of SE-Search highlights the need for more transparent and accountable search agents, which should be a focus of future research in the field of artificial intelligence.