Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression
arXiv:2604.04988v1
Abstract: Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as …
Longsheng Zhou, Yu Shen