Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
arXiv:2604.04987v1 Announce Type: new Abstract: Speculative sampling (SpS) has been successful in accelerating the decoding throughput of auto-regressive large language models by leveraging smaller draft …