Self-Attribution Bias: When AI Monitors Go Easy on Themselves
arXiv:2603.04582v1. Abstract: Agentic systems increasingly rely on language models to monitor their own behavior. For example, coding agents may self-critique generated …
Dipika Khullar, Jack Hopkins, Rowan Wang, Fabien Roger