Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View
arXiv:2603.05573v1 Announce Type: new Abstract: Scalable sequence models, such as Transformer variants and structured state-space models, often trade expressive power for sequence-level parallelism, which enables …
Gyuryang Heo, Timothy Ngotiaoco, Kazuki Irie, Samuel J. Gershman, Bernardo Sabatini