Constraint-Rectified Training for Efficient Chain-of-Thought
arXiv:2602.12526v1 (announce type: cross)
Abstract: Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs), especially when combined with reinforcement learning (RL) …
Qinhang Wu, Sen Lin, Ming Zhang, Yingbin Liang, Ness B. Shroff