EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models
arXiv:2603.18489v1

Abstract: Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by selectively updating cached states, but their decision overhead scales with context length or model depth. We propose EntropyCache, a training-free KV caching method that uses the maximum entropy of newly decoded token distributions as a constant-cost signal for deciding when to recompute. Our design is grounded in two empirical observations: (1) decoded token entropy correlates with KV cache drift, providing a cheap proxy for cache staleness, and (2) feature volatility of decoded tokens persists for multiple steps after unmasking, motivating recomputation of the $k$ most recently decoded tokens. The skip-or-recompute decision requires only $O(V)$ computation per step, independent of context length and model scale. Experiments on LLaDA-8B-Instruct and Dream-7B-Instruct show that EntropyCache achieves $15.2\times$-$26.4\times$ speedup on standard benchmarks and $22.4\times$-$24.1\times$ on chain-of-thought benchmarks, with competitive accuracy and decision overhead accounting for only $0.5\%$ of inference time. Code is available at https://github.com/mscheong01/EntropyCache.
Executive Summary
This study presents EntropyCache, a training-free key-value (KV) caching method for diffusion language models. By using the maximum entropy of newly decoded token distributions as a constant-cost signal for skip-or-recompute decisions, EntropyCache achieves up to 26.4x speedup on standard benchmarks while maintaining competitive accuracy. The decision itself requires only O(V) computation per step, independent of context length and model scale. This work highlights the promise of entropy-based caching for accelerating dLLM inference, and the authors' empirical observations and design choices reflect a solid understanding of the mechanisms behind KV cache staleness in diffusion language models.
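The constant-cost signal can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: `max_decoded_entropy` is a hypothetical helper that computes the Shannon entropy of each newly decoded token's output distribution and returns the maximum, costing O(V) per decoded token regardless of context length or model depth.

```python
import numpy as np

def max_decoded_entropy(logits: np.ndarray) -> float:
    """Maximum Shannon entropy over the tokens decoded this step.

    logits: shape (num_decoded, V), one row per token unmasked
    at the current denoising step. Cost is O(num_decoded * V),
    independent of context length and model depth.
    """
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    entropy = -(probs * log_probs).sum(axis=-1)  # (num_decoded,)
    return float(entropy.max())
```

A uniform distribution over V tokens yields entropy ln(V), the maximum possible; a sharply peaked distribution yields entropy near zero, suggesting the cache is still fresh.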
Key Points
- ▸ EntropyCache is a training-free KV caching method for diffusion language models
- ▸ The method uses decoded token entropy as a constant-cost signal for deciding when to recompute
- ▸ Experiments on LLaDA-8B-Instruct and Dream-7B-Instruct demonstrate 15.2x-26.4x speedup on standard benchmarks with competitive accuracy
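The two empirical observations translate into a simple control policy: always refresh the k most recently decoded positions (observation 2, post-unmasking volatility), and trigger a full KV recompute only when the maximum decoded-token entropy exceeds a threshold (observation 1, entropy as a staleness proxy). A minimal sketch of that bookkeeping, with hypothetical names (`RecomputeScheduler`, `tau`, `k`) not taken from the paper:

```python
class RecomputeScheduler:
    """Toy skip-or-recompute bookkeeping; illustrative only."""

    def __init__(self, tau: float, k: int):
        self.tau = tau             # entropy threshold for a full KV refresh
        self.k = k                 # window of recent tokens to always recompute
        self.recent: list[int] = []

    def decide(self, decoded_positions: list[int], max_entropy: float):
        """Return (full_recompute, positions_to_refresh) for the next step."""
        self.recent.extend(decoded_positions)
        full_recompute = max_entropy > self.tau  # cheap staleness check
        refresh = self.recent[-self.k:]          # volatility persists ~k steps
        return full_recompute, refresh
```

Because the decision consumes only a scalar entropy value and a list of positions, its cost does not grow with context length or model depth, which is the key to the reported 0.5% decision overhead.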
Merits
Strength in Empirical Evaluation
The authors provide comprehensive experiments on LLaDA-8B-Instruct and Dream-7B-Instruct, showcasing the efficacy of EntropyCache in real-world scenarios.
Innovative Use of Entropy Signal
The proposal to utilize decoded token entropy as a constant-cost signal for caching decisions is a novel and effective approach, demonstrating a deep understanding of the underlying mechanisms.
Demerits
Potential Overfitting
The method relies on empirical observations, which may not generalize to other datasets or models. Further investigation into the robustness and transferability of EntropyCache is warranted.
Scalability and Complexity
While the decision cost is independent of context length and model depth, it still scales linearly with the vocabulary size V; for models with very large vocabularies, this per-step O(V) term could become non-negligible and deserves further analysis.
Expert Commentary
The study presents a well-designed and empirically validated approach to KV caching for diffusion language models. While potential limitations exist, the method delivers significant speedup with competitive accuracy, making it a valuable contribution to the field. The innovative use of the entropy signal and the comprehensive experiments showcase the authors' expertise. As the field evolves, efficient inference methods like EntropyCache will remain crucial for unlocking the full potential of diffusion-based large language models.
Recommendations
- ✓ Further investigation into the robustness and transferability of EntropyCache across different datasets and models is warranted.
- ✓ The authors should explore potential applications of EntropyCache in other areas of natural language processing and AI-related fields.