Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning
arXiv:2603.21162v1 Announce Type: new Abstract: Neural tree search is a powerful decision-making algorithm widely used in complex domains such as game playing and model-based reinforcement learning. Recent work has applied AlphaZero-style tree search to enhance the reasoning capabilities of Large Language Models (LLMs) during inference, but we find that this approach suffers from a scaling failure: on GSM8K and Game24, accuracy drops as the search budget increases. In this paper, we present ReSCALE, an adaptation of Gumbel AlphaZero MCTS that replaces Dirichlet noise and PUCT selection with Gumbel sampling and Sequential Halving, restoring monotonic scaling without changes to the model or its training. ReSCALE reaches 58.4% on GSM8K and 85.3% on Game24 at budgets where the baseline degrades. Ablations confirm that Sequential Halving is the primary driver of the improvement.
Executive Summary
This article presents ReSCALE, an adaptation of Gumbel AlphaZero MCTS that lets inference-time tree search for Large Language Models (LLMs) scale monotonically with the search budget. By replacing Dirichlet root noise and PUCT selection with Gumbel sampling and Sequential Halving, ReSCALE restores monotonic scaling without any changes to the model or its training, reaching 58.4% on GSM8K and 85.3% on Game24 at budgets where the baseline degrades. Ablation studies attribute the improvement primarily to Sequential Halving. The practical implication is a more budget-scalable reasoning framework for LLMs: practitioners can spend additional inference-time compute with the expectation of improved, rather than degraded, accuracy.
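The Gumbel sampling that ReSCALE uses at the root is, in the Gumbel AlphaZero lineage, the Gumbel-Top-k trick: adding i.i.d. Gumbel(0, 1) noise to the policy logits and keeping the k largest perturbed values samples k distinct actions from the policy's softmax distribution. The sketch below illustrates that trick in isolation; it is not the paper's implementation, and the function name and interface are illustrative assumptions.

```python
import math
import random

def gumbel_top_k(logits, k, rng=random):
    """Gumbel-Top-k trick: perturb each logit with independent
    Gumbel(0, 1) noise and return the indices of the k largest
    perturbed values. This samples k actions *without replacement*
    from the softmax distribution over the logits."""
    perturbed = []
    for i, logit in enumerate(logits):
        # Gumbel(0, 1) sample via inverse transform: -log(-log(U))
        g = -math.log(-math.log(rng.random()))
        perturbed.append((logit + g, i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]
```

Because the noise is added once at the root, the same k sampled actions can then be evaluated under a bandit scheme such as Sequential Halving, rather than being re-drawn on every simulation as Dirichlet-noise exploration effectively does.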
Key Points
- ▸ ReSCALE is an adaptation of Gumbel AlphaZero MCTS for LLMs that scales monotonically with the search budget.
- ▸ Sequential Halving is the primary driver of the improvement in ReSCALE.
- ▸ ReSCALE outperforms the baseline on GSM8K and Game24 at budgets where the baseline degrades.
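Since the ablations single out Sequential Halving as the key ingredient, it is worth recalling what the algorithm does: it splits a fixed simulation budget across elimination rounds, spends an equal share on each surviving candidate, and halves the candidate set each round by empirical mean. The sketch below is a minimal generic version of that bandit procedure, not the paper's MCTS integration; the function names and the exact budget-splitting rule are illustrative assumptions.

```python
import math

def sequential_halving(actions, budget, simulate):
    """Sequential Halving: split a fixed simulation budget across
    ceil(log2(k)) elimination rounds, halving the candidate set
    each round and keeping the arms with the best empirical mean."""
    candidates = list(actions)
    rounds = max(1, math.ceil(math.log2(len(candidates))))
    totals = {a: 0.0 for a in candidates}
    counts = {a: 0 for a in candidates}
    for _ in range(rounds):
        # spend an equal share of the budget on each surviving candidate
        per_arm = max(1, budget // (rounds * len(candidates)))
        for a in candidates:
            for _ in range(per_arm):
                totals[a] += simulate(a)
                counts[a] += 1
        # keep the better half, ranked by empirical mean reward
        candidates.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        candidates = candidates[: max(1, len(candidates) // 2)]
    return candidates[0]
```

Unlike PUCT, which interleaves exploration and exploitation at every step, this schedule commits the whole budget up front, which is one plausible reason it degrades more gracefully as the budget grows.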
Merits
Strength in Scaling
ReSCALE's ability to scale monotonically with the search budget without compromising accuracy is a significant merit.
Improvement in Accuracy
ReSCALE outperforms the baseline on two challenging tasks, GSM8K and Game24, demonstrating its efficacy on established reasoning benchmarks.
Demerits
Limited Evaluation
The evaluation of ReSCALE is limited to two tasks, GSM8K and Game24, and it remains to be seen how well the approach generalizes to other domains.
Dependence on Sequential Halving
The success of ReSCALE relies heavily on the Sequential Halving strategy, and its effectiveness may be task-dependent.
Expert Commentary
The research presented in this article is a useful contribution to inference-time reasoning for LLMs, offering a simple and effective fix for the scaling failure of AlphaZero-style tree search. The authors support ReSCALE with experiments and ablation studies, providing a solid foundation for future work. However, the evaluation covers only two benchmarks, and the method's reliance on Sequential Halving means its benefit may vary across tasks where that bandit strategy's assumptions do not hold. Even so, the core finding that more search budget can be made to reliably help, rather than hurt, is valuable, and practitioners deploying ReSCALE-based systems should weigh the added inference cost of tree search against the accuracy gains it delivers.
Recommendations
- ✓ Future research should aim to evaluate ReSCALE on a broader range of tasks and domains to demonstrate its generalizability.
- ✓ The authors should investigate the effectiveness of other strategies for replacing Dirichlet noise and PUCT selection to ensure that ReSCALE's success is not solely dependent on Sequential Halving.
Sources
Original: arXiv - cs.AI