Bruce W. Lee, Chen Yueh-Han, Tomek Korbak

Articles by Bruce W. Lee, Chen Yueh-Han, Tomek Korbak

Academic · 1 min

Training Agents to Self-Report Misbehavior

arXiv:2602.22303v1. Abstract: Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior …
