KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem
arXiv:2602.20217v1 Announce Type: new Abstract: Self-speculative decoding (SSD) accelerates LLM inference by skipping layers to create an efficient draft model, yet existing methods often rely …
Seongjin Cha, Gyuwan Kim, Dongsu Han, Tao Yang, Insu Han
4 views