
Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding

Rahul Thomas, Teo Kitanovski, Micah Goldblum, Arka Pal · February 21, 2026

#cs.LG

arXiv:2602.16994v1. Abstract: Multi-path speculative decoding accelerates lossless sampling from a target model by using a cheaper draft model to generate a draft tree of tokens, and then applies a verification algorithm that accepts a subset of these. While prior work has proposed various verification algorithms for i.i.d. rollouts, their relative performance under matched settings remains unclear. In this work, we first present a systematic evaluation of verification strategies across model families, tasks, and sampling regimes, and find that Traversal Verification dominates consistently, with OT-based methods lagging far behind. Our analysis uncovers that this occurs because OT-based methods achieve high multi-token acceptance near the root of the draft tree, while multi-token gains are most impactful deeper in the draft tree, where draft and target distributions diverge. Based on this insight, we propose delayed tree expansion, which drafts a partial single path, delaying the i.i.d. branching point. We show that delayed tree expansion preserves the target distribution and improves on root-node i.i.d. rollouts. Further, we develop a dynamic neural selector that estimates the expected block efficiency of optimal-transport-based verification methods from draft and target features, enabling context-dependent expansion decisions. Our neural selector allows OT-based methods like SpecInfer to outperform Traversal Verification for the first time, achieving 5% higher average throughput across a wide range of models, datasets, and sampling settings.
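The "lossless" guarantee rests on a rejection-sampling verification step. As background (not the paper's multi-path algorithm), the sketch below shows the standard single-draft rule from speculative sampling: accept a drafted token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution, and otherwise resample from the normalized residual max(0, p - q). The function names are ours and hypothetical. `induced_distribution` computes exactly what the procedure outputs, which lets us check that it equals p.

```python
import random
from typing import List, Sequence


def residual(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q): where to resample after a rejection."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(r)
    if total == 0.0:
        # Only happens when p == q, in which case drafts are never rejected.
        raise ValueError("residual is empty: p and q coincide")
    return [ri / total for ri in r]


def verify_draft_token(p: Sequence[float], q: Sequence[float], x: int,
                       rng: random.Random) -> int:
    """Accept draft token x ~ q with prob. min(1, p[x]/q[x]); else resample."""
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    r = residual(p, q)
    return rng.choices(range(len(r)), weights=r)[0]


def induced_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Exact output distribution of: draw x ~ q, then run verify_draft_token."""
    # P(x drafted and accepted) = q(x) * min(1, p(x)/q(x)) = min(p(x), q(x))
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    reject = 1.0 - sum(accepted)
    if reject <= 1e-12:
        return accepted
    r = residual(p, q)
    return [a + reject * ri for a, ri in zip(accepted, r)]
```

For example, with p = [0.6, 0.3, 0.1] and q = [0.2, 0.5, 0.3], the accepted mass is [0.2, 0.3, 0.1], the rejection probability is 0.4, and the residual puts all its mass on token 0, so the output distribution is exactly p.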

Executive Summary

This article presents a new approach to multi-path speculative decoding, a technique for accelerating lossless sampling from a target model. The authors first run a systematic evaluation of verification strategies under matched settings and find that Traversal Verification consistently outperforms optimal-transport (OT)-based methods. Their analysis attributes the gap to where multi-token acceptance occurs: OT-based methods achieve it mostly near the root of the draft tree, whereas multi-token gains matter most deeper in the tree, where draft and target distributions diverge. Building on this insight, they introduce delayed tree expansion, which drafts a partial single path before branching into i.i.d. rollouts. This preserves the target distribution and improves on i.i.d. rollouts that branch at the root node. Finally, the authors develop a dynamic neural selector that estimates the expected block efficiency of OT-based verification methods from draft and target features, so that the expansion decision can depend on context. With this selector, OT-based methods such as SpecInfer outperform Traversal Verification for the first time, achieving 5% higher average throughput across a wide range of models, datasets, and sampling settings.
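Expected block efficiency, the quantity the neural selector estimates, is the expected number of tokens committed per target-model forward pass. The sketch below computes it for a single draft chain under a simplifying assumption (position i is accepted with a fixed probability alpha_i, given that its prefix was accepted) and plugs it into a hypothetical selector that picks the branching depth. The `boosted` rule, which lifts acceptance at one position to 1 - (1 - alpha)^k when k drafts are placed there, is an idealized independence assumption chosen only to keep the arithmetic transparent; it is not the paper's model, and the paper's selector is a learned network over draft and target features.

```python
from typing import List, Sequence


def block_efficiency(alphas: Sequence[float]) -> float:
    """Expected tokens per target forward pass for one draft chain.

    Toy model: position i is accepted with probability alphas[i] given that
    all earlier positions were accepted; the target model always contributes
    one extra token (the correction or bonus token).
    """
    total, prefix = 1.0, 1.0
    for a in alphas:
        prefix *= a  # probability that this position and all before it are accepted
        total += prefix
    return total


def boosted(alphas: Sequence[float], depth: int, k: int) -> List[float]:
    """Hypothetical: k drafts at `depth` (1-indexed) lift that position's
    acceptance to 1 - (1 - alpha)^k, leaving other positions unchanged."""
    out = list(alphas)
    out[depth - 1] = 1.0 - (1.0 - out[depth - 1]) ** k
    return out


def choose_branch_depth(alphas: Sequence[float], k: int) -> int:
    """Hypothetical selector: the depth whose boost maximizes toy efficiency.
    (The paper instead predicts efficiency with a neural network.)"""
    return max(range(1, len(alphas) + 1),
               key=lambda d: block_efficiency(boosted(alphas, d, k)))
```

With made-up acceptance rates [0.9, 0.7, 0.4] (falling with depth) and k = 3, the toy efficiency is about 2.98 when branching at the root, 3.13 at depth 2, and 3.02 at depth 3, so the toy selector delays branching to depth 2. The root already accepts most drafts, so extra drafts there add little.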

Key Points

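▸ Dynamic delayed tree expansion is introduced to improve multi-path speculative decoding
▸ In a matched evaluation of verification strategies, Traversal Verification consistently outperforms OT-based methods
▸ Delayed tree expansion preserves the target distribution and improves on root-node i.i.d. rollouts
▸ A neural selector for expansion decisions lets OT-based methods like SpecInfer outperform Traversal Verification, with 5% higher average throughput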
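To make "delaying the i.i.d. branching point" concrete, the sketch below builds only the shape of such a draft tree: a single path of `prefix_len` tokens, followed by `num_branches` independent rollouts of `branch_len` tokens from its tip. Setting `prefix_len = 0` recovers ordinary root-node i.i.d. rollouts. The function, the `Node` type, and the `sample_next` callback (a stand-in for the draft model) are hypothetical; the sketch does not merge identical sibling tokens or perform verification.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Node:
    token: int
    parent: int  # index of the parent node; -1 means the current context (root)


def draft_delayed_tree(sample_next: Callable[[Sequence[int]], int],
                       prefix_len: int, num_branches: int,
                       branch_len: int) -> List[Node]:
    """Draft a single path, then branch into i.i.d. rollouts from its tip."""
    nodes: List[Node] = []
    path: List[int] = []
    parent = -1
    # 1) Single-path phase: no branching yet.
    for _ in range(prefix_len):
        tok = sample_next(path)
        nodes.append(Node(tok, parent))
        path.append(tok)
        parent = len(nodes) - 1
    branch_point = parent
    # 2) Branching phase: each rollout is sampled independently given the prefix.
    for _ in range(num_branches):
        rollout = list(path)
        parent = branch_point
        for _ in range(branch_len):
            tok = sample_next(rollout)
            nodes.append(Node(tok, parent))
            rollout.append(tok)
            parent = len(nodes) - 1
    return nodes


if __name__ == "__main__":
    rng = random.Random(0)
    toy_draft = lambda ctx: rng.randrange(50)  # stand-in for a draft model
    tree = draft_delayed_tree(toy_draft, prefix_len=2, num_branches=3, branch_len=2)
    print(len(tree))  # 2 + 3 * 2 = 8 nodes
```

Node 1 (the tip of the single path) has three children here, one per rollout, whereas with `prefix_len=0` the three rollouts would all hang off the root.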

Merits

Strength in systematic evaluation

The authors compare verification strategies under matched settings across model families, tasks, and sampling regimes. Prior work had proposed many verification algorithms without such a comparison, so this evaluation clarifies their relative performance.

Demerits

Limitation in OT-based method performance

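With standard root-node i.i.d. rollouts, OT-based methods lag far behind Traversal Verification. Closing and reversing that gap requires both delayed tree expansion and a learned neural selector, which adds complexity to the decoding pipeline.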
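Why OT-style verification gains little near the root can be illustrated numerically. The sketch below implements recursive rejection over k i.i.d. drafts in the style of SpecInfer's multi-step speculative sampling, as we understand it: drafts are checked in order, and each rejection replaces the target distribution with the normalized residual max(0, p - q). It computes the exact acceptance probability and output distribution rather than sampling. The function names and the toy distributions are ours, not the paper's.

```python
from typing import List, Sequence, Tuple


def _step(r: Sequence[float], q: Sequence[float]) -> Tuple[List[float], float, List[float]]:
    """One rejection step of target r against a draft drawn from q.

    Returns (accepted mass per token, rejection probability, next residual target).
    """
    accept = [min(ri, qi) for ri, qi in zip(r, q)]  # q(x) * min(1, r(x)/q(x))
    leftover = [max(0.0, ri - qi) for ri, qi in zip(r, q)]
    reject = sum(leftover)  # equals 1 - sum(accept) because r sums to 1
    nxt = [v / reject for v in leftover] if reject > 0 else list(r)
    return accept, reject, nxt


def multi_draft_verify(p: Sequence[float], q: Sequence[float],
                       k: int) -> Tuple[float, List[float]]:
    """Exact (P(some draft accepted), output distribution) for k i.i.d. drafts ~ q."""
    out = [0.0] * len(p)
    r, reach = list(p), 1.0  # reach = P(all earlier drafts were rejected)
    for _ in range(k):
        accept, reject, r_next = _step(r, q)
        out = [o + reach * a for o, a in zip(out, accept)]
        reach *= reject
        r = r_next
    # If every draft is rejected, sample from the final residual.
    out = [o + reach * ri for o, ri in zip(out, r)]
    return 1.0 - reach, out
```

With made-up distributions, a second draft raises acceptance from 0.95 to 0.9725 for a close pair (p = [0.5, 0.3, 0.2], q = [0.45, 0.35, 0.2]) but from 0.40 to 0.64 for a divergent pair (p = [0.6, 0.4, 0.0], q = [0.2, 0.2, 0.6]), while the output distribution equals p in both cases. Extra drafts have little headroom where the draft already matches the target, which is the intuition behind delaying the branching point.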

Expert Commentary

The article makes a significant contribution to speculative decoding. Its systematic evaluation clarifies how verification strategies compare under matched settings, and its analysis explains the outcome: OT-based methods concentrate their multi-token acceptance near the root of the draft tree, where it helps least, rather than deeper in the tree, where draft and target distributions diverge. Delayed tree expansion and the dynamic neural selector turn this diagnosis into a practical fix, and together they let OT-based methods such as SpecInfer overtake Traversal Verification. The main caveat is that this reversal is credited to a learned selector, an extra component that must be trained and maintained alongside the draft and target models. In practice, the approach can be adopted by systems that already use multi-path speculative decoding to accelerate lossless sampling. More broadly, the findings may inform which verification algorithms practitioners in industry and academia choose for multi-path speculative decoding.

Recommendations

✓ Future research should aim to make OT-based methods competitive with Traversal Verification without relying on a learned selector.
✓ The authors' approach should be evaluated beyond the models, datasets, and sampling settings in the paper to demonstrate its generalizability.

Sources

arXiv - cs.LG


JCG, PC

HSOLLC Co., Ltd.