Adam Zweiger, Xinghong Fu, Han Guo, Yoon Kim

Academic · 1 min

Fast KV Compaction via Attention Matching

arXiv:2602.16284v1

Abstract: Scaling language models to long contexts is often bottlenecked by the size of the key-value (KV) cache. In deployed settings, …