CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill
arXiv:2602.16054v1 Announce Type: new Abstract: The prefill stage in long-context LLM inference remains a computational bottleneck. Recent token-ranking heuristics accelerate inference by selectively processing a …
Bradley McDanel, Steven Li, Harshit Khaitan