ResearchGym: Evaluating Language Model Agents on Real-World AI Research
arXiv:2602.15112v1 Announce Type: new Abstract: We introduce ResearchGym, a benchmark and execution environment for evaluating AI agents on end-to-end research. To instantiate this, we repurpose …
Aniketh Garikaparthi, Manasi Patwardhan, Arman Cohan
3 views