Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning
arXiv:2602.16984v1 Announce Type: new Abstract: Black-box safety evaluation of AI systems assumes model behavior on test distributions reliably predicts deployment performance. We formalize and challenge …
Vishal Srivastava
5 views