Suraj Yadav, Siddharth Yadav, Parth Goyal

Articles by Suraj Yadav, Siddharth Yadav, Parth Goyal

Academic · 1 min

Limits of Difficulty Scaling: Hard Samples Yield Diminishing Returns in GRPO-Tuned SLMs

arXiv:2604.06298v1 (announce type: new)

Abstract: Recent alignment work on Large Language Models (LLMs) suggests preference optimization can improve reasoning by shifting probability mass toward better …

60 views Apr 9
