Surgical Repair of Collapsed Attention Heads in ALiBi Transformers
arXiv:2603.09616v1 Announce Type: new Abstract: We identify a systematic attention collapse pathology in the BLOOM family of transformer language models, where ALiBi positional encoding causes …
Palmer Schallon
10 views