Towards a Science of AI Agent Reliability
arXiv:2602.16666v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many …
Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan
6 views