Open Rubric System: Scaling Reinforcement Learning with Pairwise Adaptive Rubric
arXiv:2602.14069v1 Announce Type: new Abstract: Scalar reward models compress multi-dimensional human preferences into a single opaque score, creating an information bottleneck that often leads to …
Ruipeng Jia, Yunyi Yang, Yuxin Wu, Yongbo Gai, Siyuan Tao, Mengyu Zhou, Jianhe Lin, Xiaoxi Jiang, Guanjun Jiang
10 views