Singular Vectors of Attention Heads Align with Features
arXiv:2602.13524v1
Abstract: Identifying feature representations in language models is a central task in mechanistic interpretability. Several recent studies have made an implicit …
Gabriel Franco, Carson Loughridge, Mark Crovella