IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
arXiv:2603.04738v1 Announce Type: new Abstract: Instruction-following is a foundational capability of large language models (LLMs), with its improvement hinging on scalable and accurate feedback from …
Bosi Wen, Yilin Niu, Cunxiang Wang, Xiaoying Ling, Ying Zhang, Pei Ke, Hongning Wang, Minlie Huang
3 views