AXELRAM: Quantize Once, Never Dequantize
arXiv:2604.02638v1. Abstract: We propose AXELRAM, a smart SRAM macro architecture that computes attention scores directly from quantized KV cache indices without dequantization. …
Yasushi Nishida
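The abstract is truncated here and does not say how scores are obtained from the indices, so the following is only a minimal sketch of one generic way to do it in software, not AXELRAM's circuit or algorithm. It assumes a small scalar codebook shared across all dimensions (the 2-bit codebook values are made up), and every function name is hypothetical. The idea: build a table of `query[d] * codebook[c]` once per query, then score each cached key by summing the table entries its indices select, so the key is never reconstructed.

```python
import random


def build_lookup_table(query, codebook):
    """Precompute query[d] * codebook[c] for every dimension d and code c."""
    return [[q * level for level in codebook] for q in query]


def score_from_indices(table, key_indices):
    """Attention logit for one key: sum the table entries picked by its indices."""
    return sum(row[idx] for row, idx in zip(table, key_indices))


def score_with_dequantization(query, codebook, key_indices):
    """Reference path: reconstruct the key, then take an ordinary dot product."""
    key = [codebook[idx] for idx in key_indices]
    return sum(q * k for q, k in zip(query, key))


def demo(seed=0, dim=8, num_keys=4):
    """Compare both paths on random data; all sizes and values are illustrative."""
    rng = random.Random(seed)
    codebook = [-1.5, -0.5, 0.5, 1.5]  # assumed 2-bit scalar codebook
    query = [rng.uniform(-1, 1) for _ in range(dim)]
    # Each cached key is stored only as one codebook index per dimension.
    keys = [[rng.randrange(len(codebook)) for _ in range(dim)]
            for _ in range(num_keys)]
    table = build_lookup_table(query, codebook)  # built once per query
    direct = [score_from_indices(table, k) for k in keys]
    reference = [score_with_dequantization(query, codebook, k) for k in keys]
    return direct, reference
```

In this sketch the two paths agree because each table entry is the same product the dequantized dot product would compute. The table costs `dim * len(codebook)` multiplications per query, after which each key needs only lookups and additions.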