Make Every Draft Count: Hidden State based Speculative Decoding
arXiv:2602.21224v1 Announce Type: cross Abstract: Speculative decoding has emerged as a pivotal technique to accelerate LLM inference by employing a lightweight draft model to generate …
Yuetao Chen, Xuliang Wang, Xinzhou Zheng, Ming Li, Peng Wang, Hong Xu
1 views