Mohammad Taufeeque, Stefan Heimersheim, Adam Gleave, Chris Cundy

The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes

arXiv:2602.15515v1. Abstract: Training against white-box deception detectors has been proposed as a way to make AI systems honest. However, such training risks …

Feb 19
