ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
arXiv:2604.03922v1 Announce Type: new
Abstract: Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat …
Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li