LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges
arXiv:2603.02586v1, Announce Type: new
Abstract: As large language models grow more capable, general AI agents have become increasingly prevalent in practical applications. However, existing benchmarks …
Hao Li, Huan Wang, Jinjie Gu, Wenjie Wang, Chenyi Zhuang, Sikang Bian