EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection
arXiv:2603.04900v1 Announce Type: new Abstract: LLM-based agents depend on effective tool-use policies to solve complex tasks, yet optimizing these policies remains challenging due to delayed supervision and the difficulty of credit assignment in long-horizon trajectories. Existing optimization approaches tend to be either monolithic, which are prone to entangling behaviors, or single-aspect, which ignore cross-module error propagation. To address these limitations, we propose EvoTool, a self-evolving framework that optimizes a modular tool-use policy via a gradient-free evolutionary paradigm. EvoTool decomposes the agent's tool-use policy into four modules (Planner, Selector, Caller, and Synthesizer) and iteratively improves them in a self-improving loop through three novel mechanisms. Trajectory-Grounded Blame Attribution uses diagnostic traces to localize failures to a specific module. Feedback-Guided Targeted Mutation then edits only that module via natural-language critique. Diversity-Aware Population Selection preserves complementary candidates to ensure solution diversity. Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while achieving superior efficiency and transferability. The code will be released once the paper is accepted.
Executive Summary
EvoTool is a self-evolving framework that optimizes a modular tool-use policy in LLM agents through a gradient-free evolutionary paradigm, addressing the challenges of delayed supervision and credit assignment in long-horizon trajectories. It decomposes the agent's tool-use policy into four modules (Planner, Selector, Caller, and Synthesizer) and improves them iteratively via three novel mechanisms: Trajectory-Grounded Blame Attribution, Feedback-Guided Targeted Mutation, and Diversity-Aware Population Selection. Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while achieving superior efficiency and transferability. The code will be released once the paper is accepted.
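The loop described above (blame the failing module, mutate only that module, select a diverse population) can be sketched in miniature. This is a purely illustrative Python sketch, not the authors' implementation: module quality is reduced to a single score per module, blame attribution is simplified to "weakest module is at fault", and mutation to a small score bump, since the paper's code has not yet been released. All function names and numeric choices here are assumptions.

```python
import random

MODULES = ["planner", "selector", "caller", "synthesizer"]

def fitness(policy):
    # Stand-in for benchmark accuracy: mean quality across the four modules.
    return sum(policy[m] for m in MODULES) / len(MODULES)

def blame(policy):
    # Trajectory-grounded blame attribution, reduced to a toy rule:
    # the weakest module is held responsible for the failure.
    return min(MODULES, key=lambda m: policy[m])

def mutate(policy, rng):
    # Feedback-guided targeted mutation: edit ONLY the blamed module,
    # leaving the other three untouched.
    child = dict(policy)
    m = blame(policy)
    child[m] = min(1.0, child[m] + rng.uniform(0.05, 0.15))
    return child

def select(population, k):
    # Diversity-aware selection: first keep at most one high-fitness
    # candidate per weakest-module "signature" to preserve complementary
    # failure profiles, then top up remaining slots by fitness.
    ranked = sorted(population, key=fitness, reverse=True)
    kept, seen = [], set()
    for p in ranked:
        sig = blame(p)
        if sig not in seen and len(kept) < k:
            kept.append(p)
            seen.add(sig)
    for p in ranked:
        if len(kept) >= k:
            break
        if p not in kept:
            kept.append(p)
    return kept

def evolve(seed_policy, generations=20, pop_size=4, seed=0):
    # Gradient-free outer loop: mutate every candidate, then select.
    rng = random.Random(seed)
    population = [dict(seed_policy)]
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        population = select(population + children, pop_size)
    return max(population, key=fitness)

seed_policy = {"planner": 0.5, "selector": 0.4, "caller": 0.6, "synthesizer": 0.3}
best = evolve(seed_policy)
```

Because mutation always improves the blamed module and selection never discards the best candidate, fitness is non-decreasing across generations; in the real system, the analogous guarantee would depend on how reliably blame attribution localizes failures.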
Key Points
- ▸ EvoTool optimizes a modular tool-use policy, decomposed into Planner, Selector, Caller, and Synthesizer modules, using a gradient-free evolutionary paradigm.
- ▸ The framework addresses delayed supervision and the difficulty of credit assignment in long-horizon trajectories.
- ▸ Three novel mechanisms drive the self-improving loop: Trajectory-Grounded Blame Attribution localizes failures to a specific module, Feedback-Guided Targeted Mutation edits only that module via natural-language critique, and Diversity-Aware Population Selection preserves complementary candidates.
Merits
Strength in Addressing Critical Challenges
EvoTool directly targets delayed supervision and credit assignment in long-horizon trajectories, two central obstacles in optimizing LLM agents.
Superior Performance and Efficiency
Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while also delivering better efficiency and transferability.
Demerits
Limited Contextualization
The article focuses narrowly on optimizing tool-use policies in LLM agents, with little discussion of broader implications or potential applications beyond that domain.
Dependence on Strong Baselines
The reported gains rest on comparisons against benchmark baselines, which may not reflect real-world deployment conditions, limiting the generalizability of the results.
Expert Commentary
EvoTool's approach to optimizing tool-use policies in LLM agents is a welcome development. By tackling delayed supervision and credit assignment head-on, it shows clear potential to improve both the performance and the efficiency of LLM agents. That said, the limited contextualization and the reliance on benchmark baselines are notable limitations that future work should address. The findings nevertheless have implications for building more effective LLM agents across a range of domains; validating the reported gains will require more extensive experiments against a broader set of baselines and in applied settings.
Recommendations
- ✓ Future research should investigate the application of EvoTool in real-world scenarios and its potential to improve the performance and efficiency of LLM agents in various domains.
- ✓ The noted limitations, particularly the limited contextualization and the reliance on benchmark baselines, should be addressed through further research and experimentation.