Controllable Reasoning Models Are Private Thinkers
arXiv:2602.24210v1
Abstract
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose training models to follow instructions not only in the final answer, but also in reasoning traces, potentially under different constraints. We hypothesize that improving their instruction-following abilities in the reasoning traces can improve their privacy-preservation skills. To demonstrate this, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following benchmarks and two privacy benchmarks. Our method yields substantial improvements, achieving gains of up to 20.9 points in instruction-following performance and up to 51.9 percentage points on privacy benchmarks. These improvements, however, can come at the cost of task utility, due to the trade-off between reasoning performance and instruction-following abilities. Overall, our results show that improving instruction-following behavior in reasoning models can significantly enhance privacy, suggesting a promising direction for the development of future privacy-aware agents. Our code and data are available at https://github.com/UKPLab/arxiv2026-controllable-reasoning-models
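The abstract describes a generation strategy that decouples reasoning from answer generation using separate LoRA adapters. The snippet below is a minimal sketch of how such a two-phase generation loop could look with Hugging Face transformers and peft; the base model, adapter paths, and prompt are illustrative assumptions, not the authors' released implementation (which is available at the GitHub link above).

```python
# Hypothetical sketch of a decoupled-adapter generation loop: one LoRA adapter
# drives the reasoning trace, a second one drives the final answer.
# Model ID and adapter paths are placeholders, not the authors' artifacts.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3-1.7B"  # assumption: a model in the 1.7B-14B range mentioned in the abstract

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# Load two adapters into the same base model and switch between them per phase.
model = PeftModel.from_pretrained(base_model, "path/to/reasoning-adapter", adapter_name="reasoning")
model.load_adapter("path/to/answer-adapter", adapter_name="answer")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize this medical record without revealing the patient's name."}],
    tokenize=False, add_generation_prompt=True,
)

# Phase 1: generate the (constrained) reasoning trace with the reasoning adapter.
model.set_adapter("reasoning")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
trace_ids = model.generate(**inputs, max_new_tokens=512)
trace = tokenizer.decode(trace_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Phase 2: condition on the trace and produce the final answer with the answer adapter.
# (Simplified: in practice the trace would be wrapped in the model's think tags.)
model.set_adapter("answer")
answer_inputs = tokenizer(prompt + trace, return_tensors="pt").to(model.device)
answer_ids = model.generate(**answer_inputs, max_new_tokens=256)
print(tokenizer.decode(answer_ids[0][answer_inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Switching adapters on a shared base model keeps memory overhead low compared to running two full models, which is presumably why a LoRA-based decoupling is attractive here.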
Executive Summary
The article 'Controllable Reasoning Models Are Private Thinkers' proposes training reasoning models to follow instructions not only in their final answers but also in their reasoning traces, and pairs this with a generation strategy that decouples reasoning and answer generation via separate LoRA adapters. Fine-tuned on a new instruction-following dataset with explicit restrictions on reasoning traces, the models show substantial gains on both instruction-following and privacy benchmarks. These gains can come at the cost of task utility, reflecting a trade-off between reasoning performance and instruction-following abilities. The findings point to a promising direction for privacy-aware agents, and the authors release their code and data to facilitate further research.
Key Points
- ▸ Training models to follow instructions in reasoning traces improves their privacy-preservation skills.
- ▸ Fine-tuning on a new instruction-following dataset with explicit restrictions on reasoning traces, combined with a generation strategy that uses separate LoRA adapters for reasoning and answering, yields gains of up to 20.9 points on instruction-following benchmarks and up to 51.9 percentage points on privacy benchmarks (see the illustrative sketch after this list).
- ▸ The proposed approach can come at the cost of task utility due to the trade-off between reasoning performance and instruction-following abilities.
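To make the "explicit restrictions on reasoning traces" concrete, here is a hypothetical illustration of what a single training example could look like. The field names, constraint wording, and content are assumptions for illustration only; the actual dataset schema is in the authors' repository (https://github.com/UKPLab/arxiv2026-controllable-reasoning-models).

```python
# Hypothetical training example with an explicit restriction on the reasoning trace.
# All field names and text are invented for illustration; they do not reflect the
# released dataset.
example = {
    "instruction": "Book a follow-up appointment for the patient described below.",
    "context": "Patient: Jane Doe, DOB 1984-03-02, diagnosis: hypertension ...",
    "trace_constraint": "Do not mention the patient's name or date of birth in your reasoning.",
    "answer_constraint": "Refer to the patient only as 'the patient' in the final answer.",
    "reasoning_trace": (
        "The request is a routine follow-up. The patient has a chronic condition, "
        "so a three-month interval is appropriate. I will avoid repeating any "
        "identifying details while planning the booking."
    ),
    "final_answer": "I have scheduled a follow-up appointment for the patient in three months.",
}

# During supervised fine-tuning, both the trace and the answer serve as targets,
# so the model learns to satisfy constraints inside its reasoning as well as in
# its final response.
```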
Merits
Strength
The study provides a novel and effective approach to enhance the privacy preservation abilities of AI agents, shedding light on a critical challenge in the development of future privacy-aware agents.
Demerits
Limitation
Following instructions inside reasoning traces can reduce task utility, so the approach may not suit applications where reasoning performance is paramount; the additional fine-tuning and adapter switching at inference time also add computational cost.
Expert Commentary
The article makes a significant contribution by addressing a critical challenge in the development of privacy-aware agents: reasoning traces that leak sensitive information to external parties. The reported gains on instruction-following and privacy benchmarks are substantial, but the accompanying loss in task utility means the trade-off between reasoning performance and instruction-following abilities requires further study. The findings also carry practical and policy implications, underscoring the need for trustworthy and accountable AI systems; future work should clarify which applications can absorb the utility cost and how that cost can be reduced.
Recommendations
- ✓ Recommendation 1: Further research is needed to explore the potential applications and limitations of the proposed approach and to develop more efficient and effective methods for enhancing the privacy preservation abilities of AI agents.
- ✓ Recommendation 2: Building robust and transparent AI systems calls for a multidisciplinary effort, involving researchers from computer science, philosophy, and law, so that these systems are designed to respect human values and to promote accountability and transparency.