
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution

Kun Chen, Qingchao Kong, Zhao Feifei, Wenji Mao

arXiv:2603.13853v1. Abstract: Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient for accurate reasoning and problem solving. To enhance search capabilities for complex tasks, most existing works integrate multi-round iterative retrieval with reasoning processes via end-to-end training. While these approaches significantly improve problem-solving performance, they still face challenges in task reasoning and model training, especially ambiguous retrieval execution paths and sparse rewards in the end-to-end reinforcement learning (RL) process, leading to inaccurate retrieval results and performance degradation. To address these issues, we propose APEX-Searcher, a novel Agentic Planning and Execution framework to augment LLM search capabilities. Specifically, we introduce a two-stage agentic framework that decouples the retrieval process into planning and execution. It first employs RL with decomposition-specific rewards to optimize strategic planning; building on the sub-task decomposition, it then applies supervised fine-tuning on high-quality multi-hop trajectories to equip the model with robust iterative sub-task execution capabilities. Extensive experiments demonstrate that our proposed framework achieves significant improvements in both multi-hop RAG and task planning performance across multiple benchmarks.

Executive Summary

The article introduces APEX-Searcher, a framework that enhances LLMs' search capabilities through an agentic planning and execution architecture. Conventional retrieval-augmented generation (RAG) systems struggle with complex multi-hop queries because a single round of retrieval rarely surfaces all the evidence needed. The authors therefore propose a two-stage agentic framework that decouples retrieval into planning and execution phases. The first stage uses reinforcement learning with decomposition-specific rewards to optimize strategic planning; the second stage applies supervised fine-tuning on high-quality multi-hop trajectories to improve iterative sub-task execution. The authors report significant improvements in both multi-hop RAG and task planning across multiple benchmarks. The design targets two specific weaknesses of end-to-end RL for search: ambiguous retrieval execution paths and sparse rewards.

Key Points

  • Introduction of a two-stage agentic framework decoupling retrieval into planning and execution
  • Use of RL with decomposition-specific rewards for strategic planning optimization
  • Application of supervised fine-tuning on high-quality multi-hop trajectories to enhance iterative execution
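
The decoupling described in these points can be illustrated with a minimal Python sketch. It is not the authors' implementation: the planner, the answer function, the in-memory knowledge base, and the "#k" placeholder convention are all hypothetical stand-ins chosen for illustration. In the paper the planner would be the RL-trained model and the executor the fine-tuned model with a real retriever.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "corpus": maps a fully resolved sub-question to its answer.
TOY_KB: Dict[str, str] = {
    "Which country is the Eiffel Tower in?": "France",
    "What is the capital of France?": "Paris",
}


def toy_planner(question: str) -> List[str]:
    """Stage 1 stand-in: return an ordered list of sub-questions.

    "#k" refers to the answer of step k (1-based). A trained planner would
    generate this list; here it is hard-coded for a single example question.
    """
    return ["Which country is the Eiffel Tower in?", "What is the capital of #1?"]


def toy_answer(sub_question: str) -> str:
    """Stage 2 stand-in: retrieve and answer one sub-question from the toy KB."""
    return TOY_KB.get(sub_question, "unknown")


def execute_plan(
    plan: List[str], answer_fn: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Run sub-questions in order, feeding earlier answers into later ones."""
    answers: List[str] = []
    trace: List[Tuple[str, str]] = []
    for sub_q in plan:
        resolved = sub_q
        # Substitute higher indices first so "#10" is not mistaken for "#1".
        for k in range(len(answers), 0, -1):
            resolved = resolved.replace(f"#{k}", answers[k - 1])
        answer = answer_fn(resolved)
        answers.append(answer)
        trace.append((resolved, answer))
    return trace


if __name__ == "__main__":
    question = "What is the capital of the country where the Eiffel Tower is located?"
    for step, (sub_q, ans) in enumerate(execute_plan(toy_planner(question), toy_answer), 1):
        print(f"step {step}: {sub_q} -> {ans}")
```

Because the plan is an explicit list, each sub-question and its retrieved answer can be inspected or corrected independently, which is the practical benefit of separating planning from execution.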

Merits

Innovative Framework Design

APEX-Searcher introduces a structured, decoupled approach to retrieval that addresses specific shortcomings in current RAG systems by separating planning from execution, offering a more modular and targeted solution.

Empirical Validation

The authors report extensive experiments across multiple benchmarks, with significant gains in both multi-hop RAG and task planning. The abstract does not name the benchmarks or give figures, so the size of the gains cannot be assessed from it alone.

Demerits

Implementation Complexity

The dual-stage architecture may introduce increased complexity in deployment and integration, particularly for practitioners accustomed to end-to-end models without modular intermediaries.

Scalability Concerns

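While the framework is reported to be effective on benchmark datasets, the execution stage depends on high-quality multi-hop trajectories for fine-tuning. The abstract does not say how these trajectories are obtained; if they require manual annotation or expensive filtering, scalability may be limited in real-world domains where such data is scarce or costly to produce.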

Expert Commentary

APEX-Searcher represents a notable conceptual step in the evolution of retrieval-augmented generation systems. Recognizing that end-to-end RL handles ambiguous retrieval paths and sparse rewards poorly, the authors move toward a modular architecture built on a simple principle: plan before executing. This mirrors classical AI planning paradigms and offers a degree of interpretability and controllability that opaque end-to-end LLM systems often lack, since the plan is an explicit intermediate output that can be inspected. The use of decomposition-specific rewards in the planning phase is also consistent with work on hierarchical reinforcement learning and task decomposition, where intermediate structure is rewarded directly rather than only through the final outcome. The fine-tuning stage, while data-intensive, is a pragmatic trade-off: it accepts the cost of obtaining high-quality trajectories in exchange for more robust sub-task execution. The framework may also pave the way for hybrid architectures that combine agentic control with end-to-end learning, potentially enabling adaptive, context-aware retrieval systems. Overall, the work narrows the gap between theoretical rigor and practical applicability in LLM-based search.
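
To make the contrast between a sparse end-to-end reward and a decomposition-level reward concrete, the following Python sketch compares the two on a toy example. The abstract does not specify the paper's actual reward, so everything here is an assumption made for illustration: the token-F1 matching, the length penalty, and the function names `sparse_reward` and `decomposition_reward` are hypothetical.

```python
from collections import Counter
from typing import List


def sparse_reward(predicted_answer: str, gold_answer: str) -> float:
    """Outcome-only reward: 1.0 for an exact (case-insensitive) match, else 0.0."""
    return 1.0 if predicted_answer.strip().lower() == gold_answer.strip().lower() else 0.0


def token_f1(pred: str, ref: str) -> float:
    """Token-level F1 between two strings, using multiset overlap."""
    p, r = pred.lower().split(), ref.lower().split()
    if not p or not r:
        return 0.0
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def decomposition_reward(pred_plan: List[str], ref_plan: List[str]) -> float:
    """Hypothetical plan-level reward (an assumption, not the paper's formula).

    Coverage: for each reference sub-question, the best F1 against any
    predicted sub-question, averaged. A length penalty discourages plans
    with too few or too many steps.
    """
    if not pred_plan or not ref_plan:
        return 0.0
    coverage = sum(max(token_f1(p, r) for p in pred_plan) for r in ref_plan) / len(ref_plan)
    length_penalty = min(len(pred_plan), len(ref_plan)) / max(len(pred_plan), len(ref_plan))
    return coverage * length_penalty


if __name__ == "__main__":
    ref_plan = [
        "which country is the eiffel tower in",
        "what is the capital of that country",
    ]
    partial_plan = ["which country is the eiffel tower in"]
    # A wrong final answer earns nothing under the sparse reward...
    print(sparse_reward("Lyon", "Paris"))
    # ...while a partially correct plan still receives a graded signal.
    print(decomposition_reward(partial_plan, ref_plan))
```

The point of the sketch is only the shape of the signal: the outcome-only reward gives no gradient of quality between a nearly correct and a useless attempt, whereas a plan-level reward can credit a partially correct decomposition.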

Recommendations

  • Evaluate APEX-Searcher for enterprise and academic LLM deployments where multi-hop reasoning is critical, particularly in legal, scientific, or technical domains, and validate it on in-domain data before adoption.
  • Invest in annotated multi-hop trajectory datasets or develop semi-automated labeling tools to mitigate scalability concerns and enable broader adoption of agentic planning frameworks.
