Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression
arXiv:2603.02217v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale capacity efficiently, but their massive parameter footprint creates a deployment-time memory bottleneck. We organize retraining-free MoE …
Sieun Hyeon, Jaeyoung Do