
Selective Forgetting for Large Reasoning Models


Tuan Le, Wei Qian, Mengdi Huai

arXiv:2604.03571v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) generate structured chains of thought (CoTs) before producing final answers, making them especially vulnerable to knowledge leakage through intermediate reasoning steps. Yet the memorization of sensitive information in the training data, such as copyrighted and private content, has led to ethical and legal concerns. To address these issues, selective forgetting (also known as machine unlearning) has emerged as a potential remedy for LRMs. However, existing unlearning methods primarily target final answers and may degrade the overall reasoning ability of LRMs after forgetting. Additionally, directly applying unlearning to entire CoTs could degrade general reasoning capabilities. The key challenge for LRM unlearning lies in achieving precise unlearning of targeted knowledge while preserving the integrity of general reasoning capabilities. To bridge this gap, in this paper we propose a novel LRM unlearning framework that selectively removes sensitive reasoning components while preserving general reasoning capabilities. Our approach leverages multiple LLMs with retrieval-augmented generation (RAG) to analyze CoT traces, identify forget-relevant segments, and replace them with benign placeholders that maintain logical structure. We also introduce a new feature replacement unlearning loss for LRMs, which can simultaneously suppress the probability of generating forgotten content while reinforcing structurally valid replacements. Extensive experiments on both synthetic and medical datasets verify the desired properties of our proposed method.

Executive Summary

This paper proposes a novel framework that lets large reasoning models (LRMs) selectively forget sensitive information while preserving general reasoning capabilities. The framework leverages multiple LLMs with retrieval-augmented generation (RAG) to analyze chain-of-thought (CoT) traces, identify forget-relevant segments, and replace them with benign placeholders. A new feature replacement unlearning loss suppresses the probability of generating forgotten content while reinforcing structurally valid replacements. Experiments on synthetic and medical datasets verify the effectiveness of the method. This work addresses the core challenge of precisely unlearning targeted knowledge in LRMs without degrading their general reasoning ability, which is essential for mitigating knowledge leakage and the associated ethical and legal concerns.
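The abstract describes the feature replacement unlearning loss only at a high level, as one term that suppresses forgotten tokens and one that reinforces their benign replacements. The sketch below is a minimal, hypothetical reconstruction of such a two-term objective, not the paper's exact formulation; the function name, the `alpha`/`beta` weights, and the per-position token targets are all assumptions.

```python
import numpy as np

def feature_replacement_loss(logits, forget_ids, replace_ids, alpha=1.0, beta=1.0):
    """Hypothetical two-term unlearning objective over one sequence.

    logits: array of shape (seq_len, vocab_size) from the model.
    forget_ids: token id at each position in the original (sensitive) CoT.
    replace_ids: token id at each position in the benign replacement.
    Minimizing this loss drives the forgotten tokens' likelihood down
    while driving the replacement tokens' likelihood up.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    positions = np.arange(len(forget_ids))
    forget_ll = log_probs[positions, forget_ids].mean()    # log-likelihood of sensitive tokens
    replace_ll = log_probs[positions, replace_ids].mean()  # log-likelihood of replacements

    # Suppression term (+forget_ll) plus reinforcement term (-replace_ll).
    return alpha * forget_ll - beta * replace_ll
```

Under uniform logits the two terms cancel and the loss is zero; boosting the replacement tokens' logits makes the loss negative, which is the direction gradient descent would push the model.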

Key Points

  • Selective forgetting is proposed as a remedy for LRMs to address knowledge leakage and ethical concerns.
  • A novel LRM unlearning framework is introduced to selectively remove sensitive reasoning components while preserving general reasoning capabilities.
  • The framework leverages multiple LLMs with retrieval-augmented generation to analyze CoT traces and replace sensitive information with benign placeholders.
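As a rough illustration of the replacement step in the last bullet, the sketch below substitutes a simple keyword check for the paper's multi-LLM RAG judge; the function `redact_cot`, the placeholder string, and the `forget_set` interface are all hypothetical stand-ins for the described pipeline.

```python
def redact_cot(cot_segments, forget_set, placeholder="[REDACTED STEP]"):
    """Replace forget-relevant reasoning segments with a benign placeholder.

    cot_segments: the chain of thought split into reasoning steps.
    forget_set: terms marking knowledge to be forgotten (lowercase).
    The placeholder keeps each step's slot in the chain, preserving
    the logical structure of the trace as the paper describes.
    """
    def is_forget_relevant(segment):
        # Stand-in for the multi-LLM + RAG relevance judgment.
        seg = segment.lower()
        return any(term in seg for term in forget_set)

    return [placeholder if is_forget_relevant(s) else s for s in cot_segments]
```

In the real framework this per-segment decision would come from retrieval-augmented LLM analysis rather than substring matching, but the overall shape, a segment-wise map that swaps sensitive steps for structure-preserving placeholders, is the same.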

Merits

Strength in Addressing Knowledge Leakage

The proposed framework directly targets the problem of precisely unlearning specific knowledge in LRMs, including leakage through intermediate reasoning steps rather than final answers alone, a prerequisite for mitigating knowledge leakage and the associated ethical and legal concerns.

Demerits

Limited Generalizability

The framework's effectiveness and generalizability to diverse LRMs and datasets are not extensively evaluated, which may limit its applicability in real-world scenarios.

Expert Commentary

The proposed framework is a significant contribution to the field of machine unlearning, addressing the challenge of precisely unlearning targeted knowledge in LRMs. While the framework is effective at selectively forgetting sensitive information, the limited evaluation of its generalizability across diverse LRMs and datasets is a notable concern. Furthermore, the framework's implications for data protection regulations and policies are far-reaching and warrant careful consideration. Overall, the proposed framework is a crucial step toward the responsible development and deployment of LRMs, and its potential impact on AI and data protection is substantial.

Recommendations

  • Future research should focus on evaluating the framework's generalizability and effectiveness on diverse LRMs and datasets to ensure its applicability in real-world scenarios.
  • Policymakers and regulatory bodies should consider revising data protection regulations and policies to accommodate the challenges and opportunities presented by LRMs.

Sources

Original: arXiv - cs.AI