Does a Global Perspective Help Prune Sparse MoEs Elegantly?
arXiv:2604.06542v1 Announce Type: new Abstract: Empirical scaling laws for language models have encouraged the development of ever-larger LLMs, despite their growing computational and memory costs. …
Zeliang Zhang, Nikhil Ghosh, Jiani Liu, Bin Yu, Xiaodong Liu
6 views