ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs
arXiv:2603.18614v1 Announce Type: new Abstract: Tool-augmented large language models (LLMs) must tightly couple multi-step reasoning with external actions, yet existing benchmarks often confound this interplay …
Wanjia Zhao, Ludwig Schmidt, James Zou, Vidhisha Balachandran, Lingjiao Chen
58 views