Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds
arXiv:2602.17798v1
Announce Type: new
Abstract: Mixture-of-Experts models rely on learned routers to assign tokens to experts, yet standard softmax gating provides no principled mechanism to …