Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
arXiv:2602.21320v1 Announce Type: new Abstract: Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically under tightly controlled training setups. It often depends on carefully constructed task-solution pairs and substantial human supervision, which creates a fundamental obstacle to open-ended self-evolution toward superintelligent systems. In this paper, we propose the Tool-R0 framework for training general-purpose tool-calling agents from scratch with self-play RL, under a zero-data assumption. Initialized from the same base LLM, Tool-R0 co-evolves a Generator and a Solver with complementary rewards: one proposes targeted, challenging tasks at the other's competence frontier, and the other learns to solve them with real-world tool calls. This creates a self-evolving cycle that requires no pre-existing tasks or datasets. Evaluations on different tool-use benchmarks show that Tool-R0 yields a 92.5% relative improvement over the base model and surpasses fully supervised tool-calling baselines under the same setting. Our work further provides empirical insights into self-play LLM agents by analyzing co-evolution, curriculum dynamics, and scaling behavior.
Executive Summary
The article 'Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data' proposes a framework for training large language models (LLMs) to use tools and solve complex tasks via self-play reinforcement learning (RL). The Tool-R0 framework co-evolves a Generator and a Solver, allowing the model to learn from scratch without any pre-existing tasks or datasets. The authors report significant improvements on tool-use benchmarks, surpassing fully supervised tool-calling baselines under the same setting. This result has far-reaching implications for autonomous agents, particularly in domains where human supervision is scarce or impractical. The article also provides empirical insights into self-play LLM agents by analyzing co-evolution, curriculum dynamics, and scaling behavior.
Key Points
- ▸ Tool-R0 framework enables self-evolving LLM agents to learn tool-use from scratch with self-play RL
- ▸ Co-evolution of Generator and Solver allows for targeted challenging tasks and real-world tool calls
- ▸ Significant improvements on tool-use benchmarks, surpassing fully supervised tool-calling baselines
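The co-evolution dynamic behind these points can be illustrated with a toy simulation. In the sketch below, the scalar skill and difficulty variables, the sigmoid success model, and the update rates are illustrative assumptions for exposition only, not the paper's actual RL implementation: the Solver is rewarded for solving tasks, while the Generator's complementary reward peaks when its tasks sit at the Solver's competence frontier (roughly a 50% solve rate).

```python
import math
import random

def sigmoid(x):
    """Toy success model: solve probability falls off as task
    difficulty exceeds solver skill."""
    return 1.0 / (1.0 + math.exp(-x))

def run_self_play(rounds=60, tasks_per_round=200, seed=0):
    rng = random.Random(seed)
    solver_skill = 0.0   # scalar stand-in for the Solver's competence
    offset = 2.0         # Generator's difficulty margin above the Solver
    rates = []
    for _ in range(rounds):
        difficulty = solver_skill + offset
        # Solver attempts a batch of generated tasks.
        successes = sum(
            rng.random() < sigmoid(solver_skill - difficulty)
            for _ in range(tasks_per_round)
        )
        rate = successes / tasks_per_round
        # Solver reward: each success nudges its competence upward.
        solver_skill += 0.1 * rate
        # Generator reward: peaks at a ~50% solve rate, so the
        # difficulty margin is pushed toward the competence frontier.
        offset += 0.2 * (rate - 0.5)
        rates.append(rate)
    return solver_skill, offset, rates
```

Running this loop, the Solver's skill grows monotonically while the Generator's difficulty margin shrinks toward the frontier, so the solve rate drifts toward 50% — a minimal picture of the curriculum dynamics the abstract describes, under the stated toy assumptions.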
Merits
Strength
The article addresses a critical limitation of previous RL approaches, which often rely on carefully constructed task-solution pairs and human supervision.
Demerits
Limitation
The authors do not explore the potential risks and challenges associated with self-evolving LLM agents, such as the development of unpredictable or undesirable behaviors.
Expert Commentary
The Tool-R0 framework is a significant contribution to the field of artificial intelligence, as it addresses a long-standing challenge in RL research: the dependence on curated task-solution pairs and human supervision. However, the work would be strengthened by an explicit treatment of the risks of self-evolving agents, such as the emergence of unpredictable or undesirable behaviors. As we continue to develop more advanced AI systems, it is essential to carefully consider the potential consequences and implications of such technologies. The Tool-R0 framework has the potential to advance a wide range of domains, but it also requires careful evaluation and oversight to ensure that its benefits are realized while its risks are minimized.
Recommendations
- ✓ Future research should focus on exploring the potential risks and challenges associated with self-evolving LLM agents and developing strategies to mitigate them.
- ✓ The development of the Tool-R0 framework should be accompanied by a thorough evaluation of its potential applications, benefits, and limitations, as well as a careful consideration of the policy and regulatory implications.