Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
arXiv:2603.11114v1
Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for …
Mynampati Sri Ranganadha Avinash