ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs
arXiv:2602.17698v1 Announce Type: cross

Abstract: Post-training weight quantization is crucial for reducing the memory and inference cost of large language models (LLMs), yet pushing the …
Xinlin Li, Timothy Chou, Josh Fromm, Zichang Liu, Yunjie Pan, Christina Fragouli