Fast KV Compaction via Attention Matching
arXiv:2602.16284v1 Announce Type: new

Abstract: Scaling language models to long contexts is often bottlenecked by the size of the key-value (KV) cache. In deployed settings, …
Adam Zweiger, Xinghong Fu, Han Guo, Yoon Kim