Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition
arXiv:2604.05279v1 Announce Type: new Abstract: Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless …