ToolMATH: A Math Tool Benchmark for Realistic Long-Horizon Multi-Tool Reasoning
arXiv:2602.21265v1. Abstract: We introduce ToolMATH, a math-grounded benchmark that evaluates tool-augmented language models in realistic multi-tool environments where the output depends on …
Hyeonje Choi, Jeongsoo Lee, Hyojun Lee, Jay-Yoon Lee