Keston Aquino-Michaels

Articles by Keston Aquino-Michaels

Academic · 1 min

Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat

arXiv:2603.02227v1 (cross-listed). Abstract: Can a transformer learn which attention entries matter during training? In principle, yes: attention distributions are highly concentrated, and a …

20 views Mar 5
