Tag: cs.AR

Academic · 1 min

AXELRAM: Quantize Once, Never Dequantize

arXiv:2604.02638v1 · Announce Type: new
Abstract: We propose AXELRAM, a smart SRAM macro architecture that computes attention scores directly from quantized KV cache indices without dequantization. …

Yasushi Nishida
26 views
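
The truncated abstract does not say how AXELRAM scores keys without dequantizing them, so the sketch below shows only the general lookup-table idea under a stated assumption, not the paper's design. It assumes each key dimension is scalar-quantized to an integer index into a small shared codebook. For a given query, every possible partial product query[i] * codebook[l] is precomputed once; each key's attention score is then a sum of table lookups addressed by its stored indices, and no key vector is rebuilt. The names build_lut, scores_from_indices, scores_dequantized and the codebook values are hypothetical.

```python
from typing import List, Sequence


def build_lut(query: Sequence[float], codebook: Sequence[float]) -> List[List[float]]:
    # lut[i][l] = query[i] * codebook[l]: every partial product this query can need.
    # Assumption: one scalar codebook shared by all key dimensions.
    return [[q * c for c in codebook] for q in query]


def scores_from_indices(lut: List[List[float]], key_indices: Sequence[Sequence[int]]) -> List[float]:
    # Score each key with table lookups and additions on its integer codes only;
    # no key vector is ever reconstructed.
    return [sum(lut[i][code] for i, code in enumerate(codes)) for codes in key_indices]


def scores_dequantized(query: Sequence[float], codebook: Sequence[float],
                       key_indices: Sequence[Sequence[int]]) -> List[float]:
    # Reference path: rebuild each key from the codebook, then take the dot product.
    return [sum(q * codebook[code] for q, code in zip(query, codes)) for codes in key_indices]


codebook = [-1.0, -0.5, 0.5, 1.0]      # 4 levels, i.e. 2-bit indices (illustrative values)
query = [0.2, -1.0, 0.5]
key_indices = [[0, 3, 2], [1, 1, 3]]   # two quantized keys, dimension d = 3

lut = build_lut(query, codebook)
print(scores_from_indices(lut, key_indices))
print(scores_dequantized(query, codebook, key_indices))
```

Both paths give the same scores; the table costs d × L multiplications once per query and is reused for every cached key, which is what makes an index-only scoring path attractive for in-memory hardware.
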
Academic · 1 min

DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs

arXiv:2603.12269v1 · Announce Type: cross
Abstract: Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI …

Parth Patne, Mahdi Taheri, Christian Herglotz, Maksim Jenihhin, Milos Krstic, Michael Hübner
31 views
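
The basic early-exit mechanism the abstract describes can be sketched as follows: run the network stage by stage, attach a small classifier after each stage, and stop once its top softmax probability reaches a confidence threshold. This sketch uses a single fixed threshold; per its title, DART instead adapts the threshold to input difficulty, which is not modeled here. The toy scalar "backbone", the functions early_exit_predict and softmax, and the threshold values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x: float,
                       stages: Sequence[Callable[[float], float]],
                       heads: Sequence[Callable[[float], Sequence[float]]],
                       threshold: float) -> Tuple[int, int]:
    """Return (predicted class, number of stages executed)."""
    if not stages or len(stages) != len(heads):
        raise ValueError("need one exit head per stage")
    h = x
    for depth, (stage, head) in enumerate(zip(stages, heads), start=1):
        h = stage(h)                  # run the next block of the backbone
        probs = softmax(head(h))      # exit classifier attached after this block
        confidence = max(probs)
        # Exit once the classifier is confident enough, or at the final stage.
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth
    raise AssertionError("unreachable: the final stage always returns")


# Toy 3-stage model: each stage doubles the feature, each head emits logits [h, -h].
stages = [lambda h: 2.0 * h] * 3
heads = [lambda h: [h, -h]] * 3

print(early_exit_predict(0.5, stages, heads, threshold=0.8))   # (0, 1)
print(early_exit_predict(0.5, stages, heads, threshold=0.95))  # (0, 2)
```

A lower threshold exits earlier and saves compute at some risk to accuracy; a threshold that is never reached falls through to the final stage, so every input still gets a prediction.
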