LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
arXiv:2602.23881v1 Announce Type: cross Abstract: Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that …
Alexander Samarin, Sergei Krutikov, Anton Shevtsov, Sergei Skvortsov, Filipp Fisin, Alexander Golubev
12 views