EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models
arXiv:2603.18489v1 Announce Type: new Abstract: Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass …
Minsoo Cheong, Donghyun Son, Woosang Lim, Sungjoo Yoo
10 views