A Theoretical Analysis of Mamba's Training Dynamics: Filtering Relevant Features for Generalization in State Space …
arXiv:2602.12499v1. Abstract: The recent empirical success of Mamba and other selective state space models (SSMs) has renewed interest in non-attention architectures for …
Mugunthan Shandirasegaran, Hongkang Li, Songyang Zhang, Meng Wang, Shuai Zhang