MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
arXiv:2602.16052v1
Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by verifying multiple drafted tokens in parallel. However, for Mixture-of-Experts (MoE) models, …
Bradley McDanel, Steven Li, Sruthikesh Surineni, Harshit Khaitan
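For readers unfamiliar with the mechanism the abstract refers to, below is a minimal sketch of the draft-and-verify loop that speculative decoding is built on. It uses a simplified greedy-match acceptance rule and toy stand-in models; every name and helper here is an illustrative assumption, not the paper's MoE-specific method or expert-budgeting scheme.

```python
import random

# Toy stand-ins for a cheap draft model and an expensive target model.
# Each maps a context (tuple of token ids) to a next-token choice.
# In practice these would be forward passes through real LLMs.
VOCAB = list(range(8))

def toy_logits(context, seed):
    rng = random.Random((hash(context) ^ seed) & 0xFFFFFFFF)
    return [rng.random() for _ in VOCAB]

def greedy(logits):
    return max(range(len(logits)), key=lambda t: logits[t])

def draft_next(context):   # cheap draft model (assumption: greedy decoding)
    return greedy(toy_logits(context, seed=1))

def target_next(context):  # expensive target model (assumption: greedy decoding)
    return greedy(toy_logits(context, seed=2))

def speculative_step(context, k=4):
    """Draft k tokens, then verify them against the target model.

    In a real system the target scores all k draft positions in a
    single parallel forward pass; here we just accept the longest
    prefix where its greedy choice agrees with the draft, plus one
    corrected (or bonus) token from the target.
    """
    # 1) Draft phase: autoregressively propose k tokens with the cheap model.
    drafts, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(tuple(ctx))
        drafts.append(t)
        ctx.append(t)

    # 2) Verify phase: compare the target's prediction at each draft position.
    accepted, ctx = [], list(context)
    for t in drafts:
        expected = target_next(tuple(ctx))
        if expected != t:
            accepted.append(expected)  # target's correction ends the step
            break
        accepted.append(t)
        ctx.append(t)
    else:
        # All k drafts accepted; take one bonus token from the target.
        accepted.append(target_next(tuple(ctx)))
    return accepted

if __name__ == "__main__":
    out = [0]
    for _ in range(5):
        out += speculative_step(tuple(out), k=4)
    print(out)
```

Each call to `speculative_step` emits between 1 and k+1 tokens while, in a real deployment, costing only one parallel target-model pass; the speedup comes from how many drafts survive verification.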