ExpLang: Improved Exploration and Exploitation in LLM Reasoning with On-Policy Thinking Language Selection
arXiv:2602.21887v1 Announce Type: new Abstract: Current large reasoning models (LRMs) have shown strong ability on challenging tasks after reinforcement learning (RL) based post-training. However, previous …
Changjiang Gao, Zixian Huang, Kaichen Yang, Jiajun Chen, Jixing Li, Shujian Huang
4 views