Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning
arXiv:2603.21162v1 Announce Type: new Abstract: Neural tree search is a powerful decision-making algorithm widely used in complex domains such as game playing and model-based reinforcement learning. Recent work has applied AlphaZero-style tree search to enhance the reasoning capabilities of Large Language Models (LLMs) during inference, but we find that this approach suffers from a scaling failure: on GSM8K and Game24, accuracy drops as the search budget increases. In this paper, we present ReSCALE, an adaptation of Gumbel AlphaZero MCTS that replaces Dirichlet noise and PUCT selection with Gumbel sampling and Sequential Halving, restoring monotonic scaling without changes to the model or its training. ReSCALE reaches 58.4% on GSM8K and 85.3% on Game24 at budgets where the baseline degrades. Ablations confirm that Sequential Halving is the primary driver of the improvement.
Executive Summary
This article presents ReSCALE, an adaptation of Gumbel AlphaZero MCTS that lets inference-time tree search for Large Language Models (LLMs) scale monotonically with the search budget. By replacing Dirichlet root noise and PUCT selection with Gumbel sampling and Sequential Halving, ReSCALE restores monotonic scaling without any changes to the model or its training, reaching 58.4% on GSM8K and 85.3% on Game24 at budgets where the baseline degrades. Ablation studies attribute the improvement primarily to Sequential Halving. The practical implication is a more budget-scalable reasoning framework for LLMs: practitioners can spend additional inference-time compute with the expectation of improved, rather than degraded, accuracy.
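The Gumbel sampling that ReSCALE uses at the root is, in the Gumbel AlphaZero lineage, the Gumbel-Top-k trick: adding i.i.d. Gumbel(0, 1) noise to the policy logits and keeping the k largest perturbed values samples k distinct actions from the policy's softmax distribution. The sketch below illustrates that trick in isolation; it is not the paper's implementation, and the function name and interface are illustrative assumptions.

```python
import math
import random

def gumbel_top_k(logits, k, rng=random):
    """Gumbel-Top-k trick: perturb each logit with independent
    Gumbel(0, 1) noise and return the indices of the k largest
    perturbed values. This samples k actions *without replacement*
    from the softmax distribution over the logits."""
    perturbed = []
    for i, logit in enumerate(logits):
        # Gumbel(0, 1) sample via inverse transform: -log(-log(U))
        g = -math.log(-math.log(rng.random()))
        perturbed.append((logit + g, i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]
```

Because the noise is added once at the root, the same k sampled actions can then be evaluated under a bandit scheme such as Sequential Halving, rather than being re-drawn on every simulation as Dirichlet-noise exploration effectively does.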
Key Points
- ▸ ReSCALE is an adaptation of Gumbel AlphaZero MCTS for LLMs that scales monotonically with the search budget.
- ▸ Sequential Halving is the primary driver of the improvement in ReSCALE.
- ▸ ReSCALE outperforms the baseline on GSM8K and Game24 at budgets where the baseline degrades.
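Since the ablations single out Sequential Halving as the key ingredient, it is worth recalling what the algorithm does: it splits a fixed simulation budget across elimination rounds, spends an equal share on each surviving candidate, and halves the candidate set each round by empirical mean. The sketch below is a minimal generic version of that bandit procedure, not the paper's MCTS integration; the function names and the exact budget-splitting rule are illustrative assumptions.

```python
import math

def sequential_halving(actions, budget, simulate):
    """Sequential Halving: split a fixed simulation budget across
    ceil(log2(k)) elimination rounds, halving the candidate set
    each round and keeping the arms with the best empirical mean."""
    candidates = list(actions)
    rounds = max(1, math.ceil(math.log2(len(candidates))))
    totals = {a: 0.0 for a in candidates}
    counts = {a: 0 for a in candidates}
    for _ in range(rounds):
        # spend an equal share of the budget on each surviving candidate
        per_arm = max(1, budget // (rounds * len(candidates)))
        for a in candidates:
            for _ in range(per_arm):
                totals[a] += simulate(a)
                counts[a] += 1
        # keep the better half, ranked by empirical mean reward
        candidates.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        candidates = candidates[: max(1, len(candidates) // 2)]
    return candidates[0]
```

Unlike PUCT, which interleaves exploration and exploitation at every step, this schedule commits the whole budget up front, which is one plausible reason it degrades more gracefully as the budget grows.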
Merits
Strength in Scaling
ReSCALE's ability to scale monotonically with the search budget without compromising accuracy is a significant merit.
Improvement in Accuracy
ReSCALE outperforms the baseline on two challenging tasks, GSM8K and Game24, demonstrating its efficacy on established reasoning benchmarks.
Demerits
Limited Evaluation
The evaluation of ReSCALE is limited to two tasks, GSM8K and Game24, and it remains to be seen how well the approach generalizes to other domains.
Dependence on Sequential Halving
The success of ReSCALE relies heavily on the Sequential Halving strategy, and its effectiveness may be task-dependent.
Expert Commentary
The research presented in this article is a useful contribution to inference-time reasoning for LLMs, offering a simple and effective fix for the scaling failure of AlphaZero-style tree search. The authors support ReSCALE with experiments and ablation studies, providing a solid foundation for future work. However, the evaluation covers only two benchmarks, and the method's reliance on Sequential Halving means its benefit may vary across tasks where that bandit strategy's assumptions do not hold. Even so, the core finding that more search budget can be made to reliably help, rather than hurt, is valuable, and practitioners deploying ReSCALE-based systems should weigh the added inference cost of tree search against the accuracy gains it delivers.
Recommendations
- ✓ Future research should aim to evaluate ReSCALE on a broader range of tasks and domains to demonstrate its generalizability.
- ✓ The authors should investigate the effectiveness of other strategies for replacing Dirichlet noise and PUCT selection to ensure that ReSCALE's success is not solely dependent on Sequential Halving.
Sources
Original: arXiv - cs.AI