Training Large Reasoning Models Efficiently via Progressive Thought Encoding
arXiv:2602.16839v1 Announce Type: new Abstract: Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and memory usage. While sliding-window cache strategies can bound memory, they disrupt long-context reasoning and degrade performance. We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches. By progressively encoding intermediate reasoning into fixed-size vector representations, our approach eliminates the need to backpropagate through full-cache rollouts, thereby reducing memory usage, while maintaining constant memory during inference. Experiments on three models, including Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, and DeepSeek-R1-Distill-Llama-8B, on six widely used challenging mathematical benchmarks show consistent gains: our method achieves +19.3% improvement over LoRA-based fine-tuning and +29.9% over LRMs without fine-tuning on average, with up to +23.4 accuracy improvement on AIME2024/2025 under the same tight cache budgets. These results demonstrate that Progressive Thought Encoding not only improves reasoning accuracy but also makes RL training of LRMs substantially more efficient and scalable under real-world memory constraints.
Executive Summary
This article presents Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables large reasoning models (LRMs) to reason effectively under fixed-size caches. By progressively encoding intermediate reasoning into fixed-size vector representations, the approach reduces memory usage during RL training while keeping inference memory constant. Experiments on three models and six mathematical benchmarks show significant gains in accuracy and efficiency over existing methods. The results suggest that Progressive Thought Encoding improves reasoning accuracy while making RL training of LRMs more efficient and scalable under real-world memory constraints, which could lower the hardware barrier to developing and deploying such models.
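To make the core idea concrete, here is a minimal toy sketch of the general principle, not the paper's actual method: rather than letting the cache of intermediate reasoning grow with the rollout, each chunk of new hidden states is folded into a fixed number of summary vectors, so memory stays constant no matter how long the trace runs. All names, sizes, and the pooling rule below are illustrative assumptions.

```python
import numpy as np

HIDDEN = 64   # hidden size of the toy model (assumption)
SLOTS = 8     # fixed number of summary vectors ("thought slots", assumption)

rng = np.random.default_rng(0)
# Hypothetical learned projection used when folding new content into memory.
W_compress = rng.standard_normal((HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)

def encode_chunk(chunk_states, memory):
    """Fold one chunk of hidden states into the fixed-size memory.

    chunk_states: (chunk_len, HIDDEN) hidden states of newly generated tokens
    memory:       (SLOTS, HIDDEN) fixed-size summary of all reasoning so far
    """
    # Attention-style pooling: each memory slot attends over the new chunk.
    scores = memory @ chunk_states.T                        # (SLOTS, chunk_len)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    pooled = weights @ chunk_states                         # (SLOTS, HIDDEN)
    # Blend old memory with the newly pooled content (fixed gate for the toy).
    return 0.5 * memory + 0.5 * (pooled @ W_compress)

# Simulate a long rollout processed in fixed-size chunks.
memory = np.zeros((SLOTS, HIDDEN))
for _ in range(100):                                        # 100 chunks
    chunk = rng.standard_normal((32, HIDDEN))               # 32 new "tokens"
    memory = encode_chunk(chunk, memory)

print(memory.shape)  # memory never grows: (8, 64)
```

The point of the sketch is the invariant: after any number of chunks, the state that must be kept (and backpropagated through) is the `(SLOTS, HIDDEN)` memory, not the full token cache, which is what bounds both training and inference memory.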
Key Points
- ▸ Progressive Thought Encoding is a parameter-efficient fine-tuning method for large reasoning models (LRMs).
- ▸ The approach eliminates the need to backpropagate through full-cache rollouts, reducing memory usage during RL training.
- ▸ Experiments demonstrate significant gains in accuracy and efficiency over existing methods.
Merits
Improved Reasoning Accuracy
Progressive Thought Encoding achieves consistent accuracy gains under the same tight cache budgets, including up to +23.4 accuracy points on AIME2024/2025, demonstrating its effectiveness at improving reasoning performance.
Efficient RL Training
By eliminating backpropagation through full-cache rollouts, the approach reduces memory usage during RL training while keeping inference memory constant, a substantial practical improvement over existing full-cache training methods.
Demerits
Limited Generalizability
The experiments were conducted on a specific set of models and benchmarks, which may limit the generalizability of the results to other domains and applications.
Potential Overreliance on Cache Strategies
The approach relies on fixed-size caches, which may not be sufficient for more complex or dynamic tasks, potentially limiting its scalability and applicability.
Expert Commentary
The article presents a novel approach to improving the efficiency and scalability of large reasoning models, a pressing concern in their development and deployment. While the results are promising, the approach's limitations deserve scrutiny, notably the narrow evaluation scope (three model families, mathematical benchmarks only) and the reliance on fixed-size cache strategies. Nevertheless, the findings have significant implications for ongoing research in this area and highlight the need for further work on efficient, scalable methods for training and deploying LRMs.
Recommendations
- ✓ Future research should focus on exploring the generalizability of Progressive Thought Encoding to other domains and applications.
- ✓ The development of more sophisticated cache strategies and memory-efficient architectures is essential to further improve the efficiency and scalability of LRMs.