
Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds


Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

arXiv:2602.17798v1 Announce Type: new Abstract: Mixture-of-Experts models rely on learned routers to assign tokens to experts, yet standard softmax gating provides no principled mechanism to control the tradeoff between sparsity and utilization. We propose Grassmannian MoE (GrMoE), a routing framework that operates on the Grassmannian manifold of subspaces, where gating weights arise from the concentration parameters of Matrix Bingham distributions. This construction yields a single, interpretable knob -- the concentration matrix $\Lambda$ -- that continuously controls routing entropy, replacing discrete top-$k$ selection with a smooth, geometrically principled sparsity mechanism. We further develop an amortized variational inference procedure for posterior routing distributions, enabling uncertainty-aware expert assignment that naturally resists expert collapse. We formally prove tight bounds relating the Bingham concentration spectrum to routing entropy, expected top-$k$ mass, and an exponential bound on expert collapse, establishing the first formal theory of concentration-controlled sparsity. On synthetic routing tasks, a 350M-parameter MoE language model with 8 experts, a 1.3B-parameter model with 16 experts, and a 2.7B-parameter model with 32 experts, GrMoE achieves 0\% routing collapse across all seeds, comparable or better perplexity with 15--30\% improved load balance, and a smooth monotonic relationship between concentration and effective sparsity that enables post-hoc sparsity tuning without retraining. Token-level analysis reveals that experts learn heterogeneous concentration values that correlate with linguistic specialization, providing interpretable routing behavior.

Executive Summary

The article presents Grassmannian MoE (GrMoE), a novel routing framework for Mixture-of-Experts (MoE) models that operates on the Grassmannian manifold of subspaces. Gating weights arise from the concentration parameters of Matrix Bingham distributions, so a single concentration matrix Λ continuously controls routing entropy, replacing discrete top-k selection with a smooth, geometrically principled sparsity mechanism. An amortized variational inference procedure yields posterior routing distributions, enabling uncertainty-aware expert assignment that naturally resists expert collapse. Experiments on synthetic routing tasks and MoE language models from 350M to 2.7B parameters show zero routing collapse across all seeds, comparable or better perplexity with 15-30% improved load balance, and a smooth monotonic relationship between concentration and effective sparsity that permits post-hoc sparsity tuning without retraining. Token-level analysis reveals that experts learn heterogeneous concentration values correlated with linguistic specialization, giving interpretable routing behavior. The article also proves tight bounds relating the Bingham concentration spectrum to routing entropy, expected top-k mass, and expert collapse, establishing a formal theory of concentration-controlled sparsity.
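The core idea of a single knob that trades routing entropy against sparsity can be illustrated with a toy sketch. Here a scalar `lam` stands in for the concentration matrix Λ, and plain temperature-scaled softmax stands in for the Matrix Bingham gate on the Grassmannian, which this sketch does not implement; it only demonstrates the entropy-vs-concentration relationship the summary describes. All names below are hypothetical.

```python
import numpy as np

def gate(scores, lam):
    # Softmax gating sharpened by a scalar concentration lam -- a simplified
    # stand-in for the Matrix Bingham concentration matrix in the paper.
    z = lam * scores
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Fixed router scores for 8 hypothetical experts.
scores = np.array([2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5])

# Routing entropy falls monotonically as the concentration knob is raised:
# small lam -> near-uniform (dense) routing, large lam -> near-one-hot (sparse).
ents = [entropy(gate(scores, lam)) for lam in (0.1, 1.0, 10.0)]
assert ents[0] > ents[1] > ents[2]
```

The monotone entropy decrease is the property GrMoE's theory makes precise: the paper's bounds tie the concentration spectrum directly to routing entropy and top-k mass, whereas this sketch only exhibits the qualitative behavior.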

Key Points

  • GrMoE operates on the Grassmannian manifold of subspaces, introducing a concentration-controlled sparsity mechanism.
  • The concentration matrix Λ controls routing entropy, replacing discrete top-k selection with a smooth, geometrically principled sparsity mechanism.
  • GrMoE enables uncertainty-aware expert assignment and resists expert collapse.
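One practical consequence of the smooth concentration-sparsity relationship is post-hoc sparsity tuning: because effective sparsity varies monotonically with the knob, a target sparsity level can be dialed in at inference time without retraining. The sketch below illustrates this with the same simplified scalar-temperature stand-in (not the paper's Matrix Bingham gate); `effective_experts`, the exponential of the gate entropy, is an assumed smooth sparsity measure, and all names are hypothetical.

```python
import numpy as np

def gate(scores, lam):
    # Scalar-temperature softmax as a stand-in for the Lambda-controlled gate.
    z = lam * scores - (lam * scores).max()
    p = np.exp(z)
    return p / p.sum()

def effective_experts(p):
    # exp(entropy): a smooth "effective number of experts" carrying mass.
    q = p[p > 1e-12]
    return float(np.exp(-(q * np.log(q)).sum()))

scores = np.array([2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5])

# Sweep the knob at inference time -- no retraining involved.
lams = np.linspace(0.1, 20.0, 50)
eff = [effective_experts(gate(scores, l)) for l in lams]
assert all(a >= b for a, b in zip(eff, eff[1:]))  # sparsity tightens monotonically

# Pick the smallest concentration whose gate is about as sparse as top-2 routing.
lam_star = next(l for l, e in zip(lams, eff) if e <= 2.0)
```

Because the mapping from concentration to effective sparsity is monotone, this one-dimensional search is well defined; in GrMoE the analogous tuning operates on the concentration spectrum of Λ rather than a single scalar.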

Merits

Strength in theoretical foundations

The article provides a formal theory of concentration-controlled sparsity, establishing tight bounds relating the Bingham concentration spectrum to routing entropy and expert collapse, which is a significant contribution to the field.

Demerits

Limitation in applicability

The evaluation is confined to synthetic routing tasks and language models of up to 2.7B parameters; whether the method generalizes to larger scales, other modalities, or downstream tasks remains to be established.

Expert Commentary

The article makes a significant contribution to MoE research: replacing discrete top-k selection with a concentration-controlled gate that resists expert collapse and supports uncertainty-aware assignment addresses two persistent failure modes of standard softmax routing. The main caveat is scope. The evaluation centers on synthetic routing tasks and moderate-scale language models, so behavior at larger scales and in other domains remains open. Overall, the article is a valuable addition to the literature on MoE routing, combining formal theory with interpretable empirical behavior.

Recommendations

  • Evaluate the method on larger models and real-world benchmarks beyond synthetic routing tasks to establish its broader applicability.
  • Explore applications in other domains, such as downstream natural language processing tasks and computer vision, where interpretable routing may be especially valuable.
