AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
arXiv:2603.13348v1 Announce Type: new Abstract: Tool use represents a critical capability for AI agents, with recent advances focusing on leveraging reinforcement learning (RL) to scale up the explicit reasoning process to achieve better performance. However, current RL-based scaling approaches face two key challenges for tool use: (a) direct RL training often struggles to scale up thinking length sufficiently to solve complex problems, and (b) scaled-up models tend to overthink simpler problems, resulting in substantial token inefficiency. To address these challenges, we propose a novel training paradigm that first employs warm-up supervised fine-tuning to help models distinguish between simple and complex problems, followed by RL that enables models to automatically determine appropriate reasoning trajectories. Furthermore, to tackle the issue of automatic thinking-length scaling, we discover that entropy-based optimization objectives effectively maintain model diversity while successfully unlocking the model's scaling capabilities. Based on this insight, we introduce an entropy-based long-short reasoning fusion RL strategy. Our experiments on three benchmarks demonstrate that the model successfully achieves auto-scaling for efficient tool use, delivering a significant 9.8% accuracy improvement while reducing computational overhead by ~81%.
Executive Summary
The article proposes AutoTool, a novel training paradigm for automatically scaling tool-use capabilities with reinforcement learning (RL). It targets two failure modes of current RL-based scaling approaches: insufficient thinking length on complex problems and token-wasting overthinking on simple ones. The paradigm combines warm-up supervised fine-tuning with RL driven by entropy-based optimization objectives, achieving auto-scaling for efficient tool use with significant accuracy improvements and reduced computational overhead.
Key Points
- ▸ AutoTool addresses two key failures of current RL-based scaling: insufficient thinking length on hard problems and overthinking on easy ones
- ▸ Warm-up supervised fine-tuning helps models distinguish between simple and complex problems
- ▸ Entropy-based optimization objectives maintain model diversity and unlock scaling capabilities
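The abstract does not publish AutoTool's loss function, but the idea of an entropy-based optimization objective in RL is standard: add a bonus for policy entropy so the model keeps exploring diverse reasoning trajectories instead of collapsing early. The following is a minimal illustrative sketch of such an objective (the function name, NumPy setting, and `beta` coefficient are assumptions, not the paper's actual formulation):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_regularized_pg_loss(logits, actions, advantages, beta=0.01):
    """Illustrative policy-gradient loss with an entropy bonus.

    The entropy term (weighted by beta) penalizes low-entropy policies,
    discouraging collapse to a single reasoning trajectory -- the kind of
    diversity preservation the paper credits with unlocking length scaling.
    This is a generic sketch, not AutoTool's published objective.
    """
    probs = softmax(logits)                                   # (batch, n_actions)
    logp = np.log(probs[np.arange(len(actions)), actions])    # log-prob of taken actions
    pg_term = -(advantages * logp).mean()                     # standard REINFORCE loss
    entropy = -(probs * np.log(probs)).sum(axis=-1).mean()    # mean policy entropy
    return pg_term - beta * entropy                           # subtract: maximize entropy
```

Raising `beta` trades off reward maximization against diversity; in practice the coefficient is tuned or annealed during training.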
Merits
Effective Scaling
AutoTool improves accuracy by 9.8% while cutting computational overhead by roughly 81% across three benchmarks
Improved Model Diversity
Entropy-based optimization objectives maintain model diversity and prevent overthinking
Demerits
Complexity
The proposed paradigm may add complexity to the training process
Limited Generalizability
The approach may not generalize well to other domains or tasks
Expert Commentary
The article presents a significant contribution to the field of reinforcement learning, addressing key challenges in scaling up tool-use capabilities. The proposed AutoTool paradigm demonstrates promising results, achieving significant accuracy improvements and reduced computational overhead. However, further research is needed to fully understand the potential and limitations of this approach, particularly in terms of generalizability and potential applications. The use of entropy-based optimization objectives is a notable innovation, and its implications for model diversity and scaling capabilities warrant further exploration.
Recommendations
- ✓ Further research on the generalizability of AutoTool to other domains and tasks
- ✓ Investigation into the potential applications of AutoTool in areas like robotics and autonomous systems