

Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention

arXiv:2602.23057v1 Abstract: Transformer attention is typically implemented with softmax normalization, which constrains the attention weights to sum to one. While effective in many …
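For context, here is a minimal NumPy sketch of the standard softmax attention the abstract refers to; the unit-sum constraint is visible in the row sums of the attention matrix. This illustrates the conventional baseline only, not the paper's affine-scaled variant, and the function names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: the weights are softmax-normalized,
    # so every row of the attention matrix sums to exactly one.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out, w = softmax_attention(Q, K, V)
print(w.sum(axis=-1))  # each row sums to 1.0: the unit-sum constraint the abstract describes
```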

Jeongin Bae, Baeseong Park, Gunho Park, Minsub Kim, Joonhyung Lee, Junhee Yoo, Sunghyeon Woo, Jiwon Ryu, Se Jung Kwon, Dongsoo Lee