Constitutional Black-Box Monitoring for Scheming in LLM Agents
arXiv:2603.00829v1 (Announce Type: new)

Abstract: Safe deployment of Large Language Model (LLM) agents in autonomous settings requires reliable oversight mechanisms. A central challenge is detecting …
Simon Storf, Rich Barton-Cooper, James Peters-Gill, Marius Hobbhahn