EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection
arXiv:2603.04900v1 Announce Type: new Abstract: LLM-based agents depend on effective tool-use policies to solve complex tasks, yet optimizing these policies remains challenging due to delayed supervision and the difficulty of credit assignment in long-horizon trajectories. Existing optimization approaches tend to be either monolithic, which are prone to entangling behaviors, or single-aspect, which ignore cross-module error propagation. To address these limitations, we propose EvoTool, a self-evolving framework that optimizes a modular tool-use policy via a gradient-free evolutionary paradigm. EvoTool decomposes the agent's tool-use policy into four modules (Planner, Selector, Caller, and Synthesizer) and iteratively improves them in a self-improving loop through three novel mechanisms. Trajectory-Grounded Blame Attribution uses diagnostic traces to localize failures to a specific module. Feedback-Guided Targeted Mutation then edits only that module via natural-language critique. Diversity-Aware Population Selection preserves complementary candidates to ensure solution diversity. Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while achieving superior efficiency and transferability. The code will be released once the paper is accepted.
Executive Summary
EvoTool is a self-evolving framework that optimizes a modular tool-use policy in LLM agents through a gradient-free evolutionary paradigm, addressing the challenges of delayed supervision and credit assignment in long-horizon trajectories. It decomposes the agent's tool-use policy into four modules (Planner, Selector, Caller, and Synthesizer) and improves them iteratively via three novel mechanisms: Trajectory-Grounded Blame Attribution, Feedback-Guided Targeted Mutation, and Diversity-Aware Population Selection. Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while achieving superior efficiency and transferability. The code will be released once the paper is accepted.
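The loop described above (blame the failing module, mutate only that module, select a diverse population) can be sketched in miniature. This is a purely illustrative Python sketch, not the authors' implementation: module quality is reduced to a single score per module, blame attribution is simplified to "weakest module is at fault", and mutation to a small score bump, since the paper's code has not yet been released. All function names and numeric choices here are assumptions.

```python
import random

MODULES = ["planner", "selector", "caller", "synthesizer"]

def fitness(policy):
    # Stand-in for benchmark accuracy: mean quality across the four modules.
    return sum(policy[m] for m in MODULES) / len(MODULES)

def blame(policy):
    # Trajectory-grounded blame attribution, reduced to a toy rule:
    # the weakest module is held responsible for the failure.
    return min(MODULES, key=lambda m: policy[m])

def mutate(policy, rng):
    # Feedback-guided targeted mutation: edit ONLY the blamed module,
    # leaving the other three untouched.
    child = dict(policy)
    m = blame(policy)
    child[m] = min(1.0, child[m] + rng.uniform(0.05, 0.15))
    return child

def select(population, k):
    # Diversity-aware selection: first keep at most one high-fitness
    # candidate per weakest-module "signature" to preserve complementary
    # failure profiles, then top up remaining slots by fitness.
    ranked = sorted(population, key=fitness, reverse=True)
    kept, seen = [], set()
    for p in ranked:
        sig = blame(p)
        if sig not in seen and len(kept) < k:
            kept.append(p)
            seen.add(sig)
    for p in ranked:
        if len(kept) >= k:
            break
        if p not in kept:
            kept.append(p)
    return kept

def evolve(seed_policy, generations=20, pop_size=4, seed=0):
    # Gradient-free outer loop: mutate every candidate, then select.
    rng = random.Random(seed)
    population = [dict(seed_policy)]
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        population = select(population + children, pop_size)
    return max(population, key=fitness)

seed_policy = {"planner": 0.5, "selector": 0.4, "caller": 0.6, "synthesizer": 0.3}
best = evolve(seed_policy)
```

Because mutation always improves the blamed module and selection never discards the best candidate, fitness is non-decreasing across generations; in the real system, the analogous guarantee would depend on how reliably blame attribution localizes failures.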
Key Points
- ▸ EvoTool optimizes a modular tool-use policy, decomposed into Planner, Selector, Caller, and Synthesizer modules, using a gradient-free evolutionary paradigm.
- ▸ The framework addresses delayed supervision and the difficulty of credit assignment in long-horizon trajectories.
- ▸ Three novel mechanisms drive the self-improving loop: Trajectory-Grounded Blame Attribution localizes failures to a specific module, Feedback-Guided Targeted Mutation edits only that module via natural-language critique, and Diversity-Aware Population Selection preserves complementary candidates.
Merits
Strength in Addressing Critical Challenges
EvoTool directly targets delayed supervision and credit assignment in long-horizon trajectories, two central obstacles in optimizing LLM agents.
Superior Performance and Efficiency
Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while also delivering better efficiency and transferability.
Demerits
Limited Contextualization
The article focuses narrowly on optimizing tool-use policies in LLM agents, with little discussion of broader implications or potential applications beyond that domain.
Dependence on Strong Baselines
The reported gains rest on comparisons against benchmark baselines, which may not reflect real-world deployment conditions, limiting the generalizability of the results.
Expert Commentary
EvoTool's approach to optimizing tool-use policies in LLM agents is a welcome development. By tackling delayed supervision and credit assignment head-on, it shows clear potential to improve both the performance and the efficiency of LLM agents. That said, the limited contextualization and the reliance on benchmark baselines are notable limitations that future work should address. The findings nevertheless have implications for building more effective LLM agents across a range of domains; validating the reported gains will require more extensive experiments against a broader set of baselines and in applied settings.
Recommendations
- ✓ Future research should investigate the application of EvoTool in real-world scenarios and its potential to improve the performance and efficiency of LLM agents in various domains.
- ✓ The noted limitations, particularly the limited contextualization and the reliance on benchmark baselines, should be addressed through further research and experimentation.