FAAR: Format-Aware Adaptive Rounding for NVFP4
arXiv:2603.22370v1 (Announce Type: new)
Abstract: Deploying large language models (LLMs) on edge devices requires extremely low-bit quantization. Ultra-low precision formats such as NVFP4 offer a …
Hanglin Li, Shuchang Tian, Chen Lin, Zhiyong Zhao, Kun Zhan