Linear Predictability of Attention Heads in Large Language Models
arXiv:2603.13314v1 Announce Type: new Abstract: Large language model (LLM) inference is increasingly bottlenecked by the Key-Value (KV) cache, yet the fine-grained structure of attention-head activations …
Khalid Shaikh, Asmit Kumar Singh, Rebecca Christopher Dsouza, Shikhar Shiromani
12 views