RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and …
arXiv:2603.20799v1. Abstract: Reinforcement learning from verifiable rewards (RLVR) stimulates the thinking processes of large language models (LLMs), substantially enhancing their reasoning abilities …
Kaiyuan Li, Jing-Cheng Pang, Yang Yu