Curveball Steering: The Right Direction To Steer Isn't Always Linear
arXiv:2603.09313v1 Announce Type: new Abstract: Activation steering is a widely used approach for controlling large language model (LLM) behavior by intervening on internal representations. Existing …
Shivam Raval, Hae Jin Song, Linlin Wu, Abir Harrasse, Jeff Phillips, Amirali Abdullah
4 views