
Bradley McDanel, Steven Li, Sruthikesh Surineni, Harshit Khaitan

Articles by Bradley McDanel, Steven Li, Sruthikesh Surineni, Harshit Khaitan

Academic · 1 min

MoE-Spec: Expert Budgeting for Efficient Speculative Decoding

arXiv:2602.16052v1. Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by verifying multiple drafted tokens in parallel. However, for Mixture-of-Experts (MoE) models, …
