APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
arXiv:2603.13853v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient for accurate reasoning and problem solving. To enhance search capabilities for complex tasks, most existing works integrate multi-round iterative retrieval with reasoning processes via end-to-end training. While these approaches significantly improve problem-solving performance, they still face challenges in task reasoning and model training, especially ambiguous retrieval execution paths and sparse rewards in the end-to-end reinforcement learning (RL) process, leading to inaccurate retrieval results and performance degradation. To address these issues, in this paper, we propose APEX-Searcher, a novel Agentic Planning and Execution framework to augment LLM search capabilities. Specifically, we introduce a two-stage agentic framework that decouples the retrieval process into planning and execution: it first employs RL with decomposition-specific rewards to optimize strategic planning; built on the sub-task decomposition, it then applies supervised fine-tuning on high-quality multi-hop trajectories to equip the model with robust iterative sub-task execution capabilities. Extensive experiments demonstrate that our proposed framework achieves significant improvements in both multi-hop RAG and task planning performance across multiple benchmarks.
Executive Summary
The article presents APEX-Searcher, a framework that aims to improve LLMs' search capabilities through an agentic planning and execution architecture. Conventional retrieval-augmented generation (RAG) systems often handle complex multi-hop queries poorly because a single round of retrieval rarely surfaces all the evidence needed. The authors therefore propose a two-stage agentic framework that decouples retrieval into a planning phase and an execution phase. The first stage uses reinforcement learning with decomposition-specific rewards to optimize strategic planning; the second applies supervised fine-tuning on high-quality multi-hop trajectories to improve iterative sub-task execution. The authors report significant improvements in both multi-hop RAG and task planning across multiple benchmarks. The work targets two known problems in LLM-based search systems trained end to end with RL: ambiguous retrieval execution paths and sparse rewards.
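The plan-then-execute flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy corpus, the hard-coded planner, the word-overlap retriever, and the "last word of the evidence" answer heuristic stand in for the trained planner, a real search index, and the fine-tuned executor, none of which are specified in the abstract.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy corpus; a real system would query a search index.
CORPUS = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
]


@dataclass
class StepResult:
    sub_question: str
    evidence: str
    answer: str


def tokens(text: str) -> set:
    """Lowercased alphabetic tokens, punctuation dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_planner(question: str) -> List[str]:
    """Stand-in for the planning stage: returns a fixed decomposition."""
    return [
        "Where was Marie Curie born?",
        "Which country is that city the capital of?",
    ]


def toy_retrieve(query: str) -> str:
    """Return the document with the largest word overlap with the query."""
    q = tokens(query)
    return max(CORPUS, key=lambda doc: len(q & tokens(doc)))


def toy_execute(sub_question: str, history: List[StepResult]) -> StepResult:
    """Stand-in for the execution stage: one retrieval per sub-question."""
    # Earlier answers are appended so later hops can build on earlier ones.
    query = " ".join([sub_question] + [step.answer for step in history])
    evidence = toy_retrieve(query)
    # Toy heuristic: treat the last word of the evidence as the answer.
    answer = re.findall(r"[A-Za-z]+", evidence)[-1]
    return StepResult(sub_question, evidence, answer)


def run_pipeline(question: str) -> List[StepResult]:
    """Plan first, then execute each sub-question in order."""
    history: List[StepResult] = []
    for sub_question in toy_planner(question):
        history.append(toy_execute(sub_question, history))
    return history


steps = run_pipeline("In which country was Marie Curie born?")
print([step.answer for step in steps])  # ['Warsaw', 'Poland']
```

The point of the sketch is the control flow: the plan is fixed before any retrieval happens, and each execution step sees only its sub-question plus the answers of earlier steps.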
Key Points
- ▸ Introduction of a two-stage agentic framework decoupling retrieval into planning and execution
- ▸ Use of RL with decomposition-specific rewards for strategic planning optimization
- ▸ Application of supervised fine-tuning on high-quality multi-hop trajectories to enhance iterative execution
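The abstract does not define the decomposition-specific rewards, so the following Python sketch shows only one plausible shape such a reward could take. It is an assumption, not the authors' formulation: the function names, the token-level F1 matching against a reference plan, and the `length_penalty` parameter are all hypothetical.

```python
import re
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(pred: str, ref: str) -> float:
    """Token-level F1 between two strings (0.0 if either is empty)."""
    p, r = _tokens(pred), _tokens(ref)
    if not p or not r:
        return 0.0
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def decomposition_reward(
    pred_plan: List[str],
    ref_plan: List[str],
    length_penalty: float = 0.1,  # hypothetical hyperparameter
) -> float:
    """Hypothetical plan-level reward in [0, 1].

    Coverage: each reference sub-question is matched to its best
    predicted sub-question by token F1, and the scores are averaged.
    Penalty: plans with too many or too few steps lose a fixed
    amount per step of difference.
    """
    if not pred_plan or not ref_plan:
        return 0.0
    coverage = sum(
        max(token_f1(p, r) for p in pred_plan) for r in ref_plan
    ) / len(ref_plan)
    penalty = length_penalty * abs(len(pred_plan) - len(ref_plan))
    return max(0.0, coverage - penalty)


reference = ["Where was Marie Curie born?", "Which country is Warsaw in?"]
print(decomposition_reward(reference, reference))
print(decomposition_reward(reference + ["Who was Pierre Curie?"], reference))
```

A reward of this kind scores the plan itself rather than the final answer, which is one way a planner could receive a training signal without waiting for a full retrieval episode to finish.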
Merits
Innovative Framework Design
APEX-Searcher introduces a structured, decoupled approach to retrieval that addresses specific shortcomings in current RAG systems by separating planning from execution, offering a more modular and targeted solution.
Empirical Validation
The authors report extensive experiments across multiple benchmarks, with significant gains in both multi-hop RAG and task planning performance.
Demerits
Implementation Complexity
The dual-stage architecture may introduce increased complexity in deployment and integration, particularly for practitioners accustomed to end-to-end models without modular intermediaries.
Scalability Concerns
While effective on benchmark datasets, the reliance on high-quality pre-annotated multi-hop trajectories for fine-tuning may limit scalability in real-world applications where such annotated data is scarce or costly to produce.
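One common way to reduce the annotation burden is to generate candidate trajectories automatically and keep only those whose final answer matches a known gold answer. The Python sketch below shows that filtering step under stated assumptions: the `Trajectory` fields, the `max_steps` cutoff, and the normalization rules are hypothetical, and the source does not say how the authors built their trajectory set.

```python
import re
import string
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    question: str
    steps: List[str]       # sub-questions or search actions, in order
    final_answer: str


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def filter_trajectories(
    candidates: List[Trajectory],
    gold_answers: Dict[str, str],
    max_steps: int = 6,  # hypothetical cutoff for overly long trajectories
) -> List[Trajectory]:
    """Keep trajectories that end in the gold answer within max_steps."""
    kept = []
    for traj in candidates:
        gold = gold_answers.get(traj.question)
        if gold is None:
            continue  # no label available, cannot verify
        if not traj.steps or len(traj.steps) > max_steps:
            continue
        if normalize(traj.final_answer) == normalize(gold):
            kept.append(traj)
    return kept


question = "In which country was Marie Curie born?"
candidates = [
    Trajectory(question, ["birthplace?", "country of city?"], "Poland."),
    Trajectory(question, ["birthplace?"], "France"),
]
kept = filter_trajectories(candidates, {question: "poland"})
print(len(kept))  # 1
```

Answer-based filtering only needs question and answer pairs, not step-level annotations, but it cannot detect trajectories that reach the right answer through faulty intermediate steps.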
Expert Commentary
APEX-Searcher marks a notable conceptual step in the evolution of retrieval-augmented generation systems. Recognizing that end-to-end RL struggles with ambiguous retrieval paths and sparse rewards, the authors move to a modular architecture built on a simple principle: plan before executing. This mirrors classical AI planning paradigms and offers a degree of interpretability and controllability that opaque end-to-end LLM systems often lack. The decomposition-specific rewards in the planning phase are in line with work on hierarchical reinforcement learning and task decomposition, which suggests a convergence of academic theory and applied engineering. The fine-tuning stage is data-intensive, but it is a pragmatic choice: it relies on existing high-quality trajectories to make sub-task execution robust. The framework may also pave the way for hybrid architectures that combine agentic control with end-to-end learning, potentially enabling adaptive, context-aware retrieval systems that evolve with user needs. Overall, the work helps bridge the gap between theoretical rigor and practical applicability in LLM-based search.
Recommendations
- ✓ Consider APEX-Searcher for enterprise and academic LLM deployments where multi-hop reasoning is critical, particularly in legal, scientific, or technical domains.
- ✓ Invest in annotated multi-hop trajectory datasets or develop semi-automated labeling tools to mitigate scalability concerns and enable broader adoption of agentic planning frameworks.