Simon Storf, Rich Barton-Cooper, James Peters-Gill, Marius Hobbhahn

Articles by Simon Storf, Rich Barton-Cooper, James Peters-Gill, Marius Hobbhahn

Academic · 1 min

Constitutional Black-Box Monitoring for Scheming in LLM Agents

arXiv:2603.00829v1. Abstract: Safe deployment of Large Language Model (LLM) agents in autonomous settings requires reliable oversight mechanisms. A central challenge is detecting …
