CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference
arXiv:2602.20732v1
Abstract: Long-context LLMs demand accurate inference at low latency, yet decoding becomes primarily constrained by KV cache as context grows. Prior …
Chao Fei, Guozhong Li, Chenxi Liu, Panos Kalnis