A Rubric-Supervised Critic from Sparse Real-World Outcomes
arXiv:2603.03800v1
Abstract: Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In …
Xingyao Wang, Valerie Chen, Heng Ji, Graham Neubig