
Budget-Aware Agentic Routing via Boundary-Guided Training


arXiv:2602.21227v1 Announce Type: cross

Abstract: As large language models (LLMs) evolve into autonomous agents that execute long-horizon workflows, invoking a high-capability model at every step becomes economically unsustainable. While model routing is effective for single-turn queries, agentic routing is a sequential, path-dependent problem: early mistakes compound, feedback often arrives only at the end of the episode, and deployments often demand strict per-task spending limits. We propose Budget-Aware Agentic Routing, which selects between a cheap and an expensive model at each step to optimize the cost-success frontier and to operate under strict per-task budgets. We further propose Boundary-Guided Training, which leverages two boundary policies (always-small vs. always-large) to build a difficulty taxonomy and to anchor learning under sparse rewards. Our approach warm-starts with boundary-guided SFT data synthesis via stratified sampling of cost-efficient trajectories, then applies Boundary-Guided Policy Optimization (BoPO), combining boundary-relative rewards with a reference-guided advantage to avoid degenerate cheap-failure solutions. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost while demonstrating generalization to strict inference-time budget constraints. Overall, our work establishes a foundational framework for agentic routing, shifting the paradigm from static model selection to dynamic, budget-aware sequential decision-making.
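The abstract does not spell out how the two boundary policies define the difficulty taxonomy or how the stratified sampling works. A minimal Python sketch, assuming each task is labeled only by whether the always-small and always-large policies succeed on it, might look like the following. The bucket names and the helpers `difficulty_bucket`, `stratified_sample`, and `cheapest_success` are hypothetical illustrations, not the authors' code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical bucket names: the abstract does not name the paper's categories.
def difficulty_bucket(small_ok: bool, large_ok: bool) -> str:
    """Label a task by the outcomes of the two boundary policies."""
    if small_ok and large_ok:
        return "easy"          # the cheap model alone suffices
    if large_ok:
        return "large_needed"  # only the always-large policy succeeds
    if small_ok:
        return "small_only"    # the cheap policy succeeds where the large one fails
    return "hard"              # neither boundary policy succeeds


def stratified_sample(tasks: List[Tuple[str, bool, bool]],
                      per_bucket: int, seed: int = 0) -> Dict[str, List[str]]:
    """Draw up to `per_bucket` task ids per bucket so no difficulty level dominates."""
    rng = random.Random(seed)
    buckets: Dict[str, List[str]] = defaultdict(list)
    for task_id, small_ok, large_ok in tasks:
        buckets[difficulty_bucket(small_ok, large_ok)].append(task_id)
    sample = {}
    for name, ids in sorted(buckets.items()):
        rng.shuffle(ids)
        sample[name] = ids[:per_bucket]
    return sample


def cheapest_success(trajectories: List[dict]) -> Optional[dict]:
    """Pick the lowest-cost successful trajectory for a task (a hypothetical SFT target)."""
    ok = [t for t in trajectories if t["success"]]
    return min(ok, key=lambda t: t["cost"]) if ok else None


# Toy usage with made-up tasks: (task_id, small_ok, large_ok).
tasks = [("t1", True, True), ("t2", False, True), ("t3", False, False), ("t4", True, True)]
print(stratified_sample(tasks, per_bucket=1))
```

Sampling per bucket rather than uniformly keeps rare but informative cases (for example, tasks only the large model solves) from being swamped by easy ones; whether the paper stratifies this way is an assumption.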

Executive Summary

This article proposes Budget-Aware Agentic Routing, an approach to optimizing the cost-success frontier of LLM agents that execute long-horizon workflows, where routing becomes a sequential, path-dependent problem. By selecting between a cheap and an expensive model at each step, the method aims to reduce cost while operating under strict per-task budgets. The authors introduce Boundary-Guided Training, which uses two boundary policies (always-small and always-large) to build a difficulty taxonomy and to anchor learning under sparse rewards. Experimental results show that the method matches strong routing baselines at substantially lower cost and generalizes to strict inference-time budget constraints. The authors present this as a foundational framework for agentic routing that shifts the paradigm from static model selection to dynamic, budget-aware sequential decision-making.
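To make the per-step decision concrete, here is a minimal sketch of a budget-constrained episode loop. It assumes fixed, known per-call costs for the cheap and expensive models, a hypothetical `router(step_index, remaining_budget)` callable that returns `"small"` or `"large"`, and a hypothetical `step(choice)` callable that runs one agent step. The hard-budget guard is one simple way to enforce a per-task limit and is not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not from the paper):
#   router(step_index, remaining_budget) -> "small" or "large"
#   step(model_choice) -> (done, success) after running one agent step
Router = Callable[[int, float], str]
Step = Callable[[str], Tuple[bool, bool]]


def run_episode(router: Router, step: Step, budget: float,
                small_cost: float, large_cost: float, max_steps: int = 20) -> Dict:
    """Run one episode, choosing a model per step without exceeding `budget`.
    Assumes each call has a fixed, known cost."""
    spent = 0.0
    choices: List[str] = []
    for t in range(max_steps):
        remaining = budget - spent
        if remaining < small_cost:
            break  # cannot afford even the cheap model: end the episode as a failure
        choice = router(t, remaining)
        if choice == "large" and large_cost > remaining:
            choice = "small"  # hard-budget guard: downgrade instead of overspending
        done, success = step(choice)
        spent += large_cost if choice == "large" else small_cost
        choices.append(choice)
        if done:
            return {"success": success, "spent": spent, "choices": choices}
    return {"success": False, "spent": spent, "choices": choices}


# Toy usage: a router that always asks for the large model, and an agent
# that finishes successfully on its third step.
calls = {"n": 0}

def toy_step(choice: str) -> Tuple[bool, bool]:
    calls["n"] += 1
    return calls["n"] >= 3, True

print(run_episode(lambda t, rem: "large", toy_step, budget=25.0,
                  small_cost=1.0, large_cost=10.0))
# {'success': True, 'spent': 21.0, 'choices': ['large', 'large', 'small']}
```

Downgrading rather than aborting keeps the episode alive when the budget tightens. Passing `remaining` to the router also lets a learned policy save budget for later steps instead of relying on the guard alone.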

Key Points

  • Proposes Budget-Aware Agentic Routing to optimize the cost-success frontier for sequential, multi-step agent workflows
  • Introduces Boundary-Guided Training to build a difficulty taxonomy and anchor learning under sparse rewards
  • Matches strong routing baselines at substantially lower cost and generalizes to strict inference-time budget constraints
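The abstract says BoPO combines boundary-relative rewards with a reference-guided advantage to avoid degenerate cheap-failure solutions, but it does not give the formulas. The sketch below is a hypothetical stand-in under stated assumptions: failures earn zero reward, successes earn a bonus for cost saved relative to the always-large boundary (normalized by the gap between the two boundary costs), and the advantage baseline includes a reference rollout's reward. The names `boundary_relative_reward`, `reference_guided_advantages`, and the weight `lam` are illustrative, not the paper's.

```python
from typing import List


def boundary_relative_reward(success: bool, cost: float,
                             c_small: float, c_large: float, lam: float = 0.5) -> float:
    """Hypothetical reward: no credit for failure, however cheap; successes earn
    a bonus for cost saved relative to the always-large boundary."""
    if not success:
        return 0.0
    gap = c_large - c_small
    if gap <= 0:
        return 1.0  # boundaries cost the same: no savings to reward
    savings = min(max((c_large - cost) / gap, 0.0), 1.0)  # clip to [0, 1]
    return 1.0 + lam * savings


def reference_guided_advantages(group_rewards: List[float],
                                reference_reward: float) -> List[float]:
    """Hypothetical advantage: the baseline averages the group's rewards together
    with a reference rollout's reward."""
    pool = group_rewards + [reference_reward]
    baseline = sum(pool) / len(pool)
    return [r - baseline for r in group_rewards]


print(boundary_relative_reward(True, cost=4.0, c_small=2.0, c_large=10.0))
# 1.375
print(reference_guided_advantages([0.0, 0.0, 0.0], reference_reward=1.0))
# [-0.25, -0.25, -0.25]
```

The second call shows why a reference can matter: if every sampled rollout is a cheap failure, a plain group-mean baseline gives all of them zero advantage and no learning signal, whereas including a successful reference rollout pushes them negative.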

Merits

Strength in Dynamic Decision-Making

The method makes dynamic, budget-aware routing decisions step by step, which suits agentic workflows where early mistakes compound and feedback often arrives only at the end of the episode.

Improved Efficiency

Budget-Aware Agentic Routing improves the efficiency frontier, matching strong routing baselines at substantially lower cost.
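"Improving the efficiency frontier" refers to the set of cost and success-rate trade-offs that no other configuration beats on both axes at once. A minimal sketch of computing that Pareto frontier from (name, cost, success rate) triples follows; the configuration names and numbers in the usage line are made-up placeholders, not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, cost, success_rate)


def efficiency_frontier(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point, sorted by cost.
    A point is dominated if another is no more expensive and no less successful,
    and strictly better on at least one of the two axes."""
    frontier = []
    for name, cost, succ in points:
        dominated = any(
            c2 <= cost and s2 >= succ and (c2 < cost or s2 > succ)
            for _, c2, s2 in points
        )
        if not dominated:
            frontier.append((name, cost, succ))
    return sorted(frontier, key=lambda p: p[1])


# Placeholder numbers for illustration only.
points = [("always_small", 1.0, 0.40), ("always_large", 10.0, 0.80),
          ("router_a", 4.0, 0.70), ("router_b", 5.0, 0.65)]
print([name for name, _, _ in efficiency_frontier(points)])
# ['always_small', 'router_a', 'always_large']
```

The two boundary policies typically anchor the ends of such a frontier, and a router improves it when it adds points that lie above the line between them.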

Generalization to Strict Budget Constraints

The method generalizes to strict inference-time budget constraints, which matters for real-world deployments that impose per-task spending limits.

Demerits

Limited Exploration of Alternative Methods

The article primarily focuses on Budget-Aware Agentic Routing and does not extensively explore alternative methods or their comparisons.

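Assumes Sufficient Boundary-Policy Data

The method relies on collecting outcomes from both boundary policies (always-small and always-large) across enough tasks to build its difficulty taxonomy and synthesize SFT data, and such data may not always be available in real-world applications.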

Expert Commentary

The article presents a novel and promising approach to agentic routing for LLM agents. The proposed Budget-Aware Agentic Routing method and Boundary-Guided Training technique match strong routing baselines at substantially lower cost and generalize to strict inference-time budget constraints. However, the method's reliance on boundary-policy rollouts to build its difficulty taxonomy and training data, together with its limited exploration of alternative methods, are notable limitations. Incorporating domain knowledge and transfer learning techniques could further improve the method's performance and adaptability. The approach has significant implications for both practical deployment and policy-making, making it a valuable contribution to the field.

Recommendations

  • Future research should investigate the application of Budget-Aware Agentic Routing to various real-world scenarios and evaluate its performance in comparison to alternative methods.
  • The authors should explore incorporating domain knowledge and transfer learning techniques to improve the method's performance and adaptability.
