Surgical Activation Steering via Generative Causal Mediation
arXiv:2602.16080v1
Abstract: Where should we intervene in a language model (LM) to control behaviors that are diffused across many tokens of a …
Aruna Sankaranarayanan, Amir Zur, Atticus Geiger, Dylan Hadfield-Menell