Atticus Wang, Iván Arcuschin, Arthur Conmy

Automatically Finding Reward Model Biases

arXiv:2602.15222v1 (Announce Type: new)

Abstract: Reward models are central to large language model (LLM) post-training. However, past work has shown that they can reward spurious …

Feb 19
