SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
arXiv:2603.14303v1 (announce type: new)
Abstract: Existing KV cache compression methods generally operate on discrete tokens or non-semantic chunks. However, such approaches often lead to semantic …
Shunlong Wu, Hai Lin, Shaoshen Chen, Tingwei Lu, Yongqin Zeng, Shaoxiong Zhan, Hai-Tao Zheng, Hong-Gee Kim