Towards Better RL Training Data Utilization via Second-Order Rollout
arXiv:2602.22765v1
Abstract: Reinforcement Learning (RL) has empowered Large Language Models (LLMs) with strong reasoning capabilities, but vanilla RL mainly focuses on generation …
Zhe Yang, Yudong Wang, Rang Li, Zhifang Sui