Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple
arXiv:2603.11053v1 Announce Type: new Abstract: Speculative decoding is a technique that uses multiple language models to accelerate inference. Previous works have used an experi- …
Amirhossein Bozorgkhoo, Igor Molybog