A Residual-Aware Theory of Position Bias in Transformers
arXiv:2602.16837v1

Abstract: Transformer models systematically favor certain token positions, yet the architectural origins of this position bias remain poorly understood. Under causal …
Hanna Herasimchyk, Robin Labryga, Tomislav Prusina, Sören Laue