ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference

arXiv:2602.23681v1 (Announce Type: new). Abstract: The paradigm of large language model (LLM) reasoning is shifting from parameter scaling to test-time compute scaling, yet many existing approaches still rely on uniform brute-force sampling (for example, fixed best-of-N or self-consistency) that is costly, hard to attribute, and can trigger overthinking with diminishing returns. We propose ODAR-Expert, an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. ODAR uses a difficulty estimator grounded in amortized active inference to dynamically route queries between a heuristic Fast Agent and a deliberative Slow Agent. We further introduce a free-energy-principled, risk-sensitive fusion mechanism that selects answers by minimizing a variational free energy objective, balancing log-likelihood with epistemic uncertainty (varentropy) as a principled alternative to ad hoc voting over heterogeneous candidates. Extensive evaluation across 23 benchmarks shows strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam (HLE), while improving the compute-accuracy frontier under compute-matched settings. We also validate reproducibility on a fully open-source stack (Llama 4 + DeepSeek), where ODAR surpasses homogeneous sampling strategies while reducing computational costs by 82%. Overall, our results suggest that thinking-optimal scaling requires adaptive resource allocation with free-energy-based decision-making rather than simply increasing test-time compute.

Executive Summary

This article proposes ODAR-Expert, an adaptive routing framework that optimizes the accuracy-efficiency trade-off in large language model (LLM) reasoning. A difficulty estimator grounded in amortized active inference dynamically routes queries between a heuristic Fast Agent and a deliberative Slow Agent, while a free-energy-based fusion mechanism selects among candidate answers by balancing log-likelihood against epistemic uncertainty (varentropy). The authors report strong and consistent gains across 23 benchmarks, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam (HLE), while improving the compute-accuracy frontier under compute-matched settings. The results suggest that thinking-optimal scaling depends on adaptive resource allocation with free-energy-based decision-making rather than simply increasing test-time compute.

Key Points

  • ODAR-Expert is a principled adaptive routing framework for LLM reasoning that optimizes the accuracy-efficiency trade-off via resource allocation.
  • The framework uses a difficulty estimator grounded in amortized active inference to dynamically route queries between a Fast Agent and a Slow Agent.
  • The authors propose a free-energy-principled, risk-sensitive fusion mechanism for selecting answers, balancing log-likelihood with epistemic uncertainty.
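The routing step in the points above can be sketched as a thresholded dispatch on an estimated difficulty score. The threshold value, the `p_hard` interface, and the agent names below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
def route_query(p_hard: float, tau: float = 0.5) -> str:
    """Dispatch a query by estimated difficulty (hypothetical threshold tau).

    p_hard is the difficulty estimator's probability that the query needs
    deliberative reasoning; scores at or above tau go to the Slow Agent,
    cheaper queries go to the heuristic Fast Agent.
    """
    return "slow_agent" if p_hard >= tau else "fast_agent"

# An easy query stays on the cheap path; a hard one gets deliberation.
print(route_query(0.1))  # fast_agent
print(route_query(0.9))  # slow_agent
```

The point of the sketch is that per-query dispatch replaces uniform best-of-N sampling: compute is spent only where the estimator predicts it will pay off.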

Merits

Strength

The study reports strong and consistent gains across 23 benchmarks, including 98.2% accuracy on MATH and 54.8% on HLE, while reducing computational costs by 82% on a fully open-source stack, supporting the claim that adaptive allocation beats homogeneous sampling.

Originality

The proposed framework is novel and distinct from existing approaches, which rely on uniform brute-force sampling or other heuristics.

Methodological soundness

The study employs a rigorous methodology, incorporating a free-energy-principled, risk-sensitive fusion mechanism and a difficulty estimator grounded in amortized active inference.
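As a rough illustration of the fusion idea, each candidate answer can be scored by a free-energy-style objective F = -log-likelihood + λ·varentropy, with the minimizer selected. The λ weight, the per-candidate statistics, and the function names below are hypothetical; the paper's exact variational objective may differ:

```python
import math

def varentropy(probs):
    """Variance of surprisal (-log p) under the distribution, used here as
    an epistemic-uncertainty measure. Zero for a uniform distribution."""
    support = [p for p in probs if p > 0.0]
    surprisal = [-math.log(p) for p in support]
    entropy = sum(p * s for p, s in zip(support, surprisal))
    return sum(p * (s - entropy) ** 2 for p, s in zip(support, surprisal))

def select_answer(candidates, lam=1.0):
    """Pick the candidate minimizing F = -loglik + lam * varentropy.

    candidates: list of (answer, avg_log_likelihood, token_probs) tuples.
    """
    def free_energy(cand):
        _, loglik, probs = cand
        return -loglik + lam * varentropy(probs)
    return min(candidates, key=free_energy)[0]

# Candidate "B" is more likely on average but far more uncertain (high
# varentropy), so the risk-sensitive objective prefers "A".
winner = select_answer([("A", -0.2, [0.5, 0.5]),
                        ("B", -0.1, [0.99, 0.01])])
print(winner)  # A
```

The design choice this illustrates: unlike plain majority voting, the objective can reject a high-likelihood candidate whose supporting distribution is epistemically unstable.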

Demerits

Limitation

The framework presumes that the difficulty estimator and the fusion mechanism are well calibrated; if the estimator misjudges difficulty, hard queries can be routed to the Fast Agent (hurting accuracy) or easy ones to the Slow Agent (wasting compute).

Scalability

Whether the framework scales to much larger language models or more complex, open-ended tasks is untested; new model families or domains may require retraining the difficulty estimator or re-tuning the fusion objective.

Expert Commentary

The article makes a substantive contribution to test-time compute scaling for LLM reasoning, replacing uniform brute-force sampling with principled, difficulty-aware resource allocation. The reported results are strong: consistent gains across 23 benchmarks, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam (HLE), plus an 82% cost reduction on a fully open-source stack. The main caveats are the reliance on an accurate difficulty estimator and fusion mechanism, and the untested scalability to larger models and more complex tasks. Even so, the findings have clear practical implications for deployments where inference budgets matter, and they are likely to influence how test-time compute is spent in future LLM systems.

Recommendations

  • Future studies should investigate the applicability of ODAR-Expert to very large language models or complex tasks, and explore modifications or adaptations to improve scalability.
  • Researchers should also examine the limitations of the difficulty estimator and the fusion mechanism, and develop more accurate and reliable methods for evaluating difficulty and making decisions.
