Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs

arXiv:2603.05618v1

Abstract: Chain-of-Thought (CoT) prompting improves LLM reasoning but can increase privacy risk by resurfacing personally identifiable information (PII) from the prompt into reasoning traces and outputs, even under policies that instruct the model not to restate PII. We study such direct, inference-time PII leakage using a model-agnostic framework that (i) defines leakage as risk-weighted, token-level events across 11 PII types, (ii) traces leakage curves as a function of the allowed CoT budget, and (iii) compares open- and closed-source model families on a structured PII dataset with a hierarchical risk taxonomy. We find that CoT consistently elevates leakage, especially for high-risk categories, and that leakage is strongly family- and budget-dependent. Increasing the reasoning budget can either amplify or attenuate leakage depending on the base model. We then benchmark lightweight inference-time gatekeepers: a rule-based detector, a TF-IDF + logistic regression classifier, a GLiNER-based NER model, and an LLM-as-judge, using risk-weighted F1, Macro-F1, and recall. No single method dominates across models or budgets, motivating hybrid, style-adaptive gatekeeping policies that balance utility and risk under a common, reproducible protocol.
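The paper's exact scoring formula is not reproduced in the abstract; the following is a minimal sketch of what a risk-weighted, token-level leakage score could look like. The PII types, risk weights, and normalization below are illustrative assumptions, not the authors' definitions.

```python
# Illustrative risk weights per PII type (hypothetical values; the
# paper uses a hierarchical risk taxonomy over 11 types).
RISK_WEIGHTS = {
    "ssn": 1.0,          # high-risk categories get larger weights
    "credit_card": 1.0,
    "email": 0.5,
    "name": 0.25,
}

def leakage_score(detected_spans, total_tokens):
    """Sum risk weights over detected PII token spans in a reasoning
    trace, normalized by trace length.

    detected_spans: list of (pii_type, token_count) tuples produced
    by some PII detector run over the CoT trace.
    """
    if total_tokens == 0:
        return 0.0
    weighted = sum(RISK_WEIGHTS.get(t, 0.0) * n for t, n in detected_spans)
    return weighted / total_tokens

# Example: a 200-token trace that restates a 9-token SSN span
# and a 4-token email address.
score = leakage_score([("ssn", 9), ("email", 4)], total_tokens=200)
```

Tracing this score as the allowed CoT budget grows would yield the per-model "leakage curves" the abstract describes.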

Executive Summary

This article addresses a critical concern in the deployment of Large Language Models (LLMs): Chain-of-Thought (CoT) prompting can resurface personally identifiable information (PII) from the prompt into reasoning traces and outputs, even when the model is instructed not to restate it. Using a model-agnostic framework spanning 11 PII types, the study shows that CoT consistently elevates leakage, especially for high-risk categories, and that the effect depends strongly on both the model family and the allowed reasoning budget: increasing the budget can amplify leakage in one family and attenuate it in another. Benchmarks of four lightweight inference-time gatekeepers find that no single method dominates across models or budgets, motivating hybrid, style-adaptive policies. These findings have significant practical and policy implications for deploying LLMs in privacy-sensitive settings.

Key Points

  • CoT prompting consistently elevates the risk of PII leakage in LLMs, especially for high-risk categories.
  • Leakage depends strongly on the model family and the allowed CoT budget; a larger budget can either amplify or attenuate leakage depending on the base model.
  • No single lightweight inference-time gatekeeper dominates across models and budgets, motivating hybrid, style-adaptive gatekeeping policies.
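To make the lightest of the benchmarked gatekeepers concrete: a rule-based detector can be as simple as regular expressions run over the reasoning trace before it is returned. The two patterns below are illustrative only and cover a fraction of the 11 PII types; real deployments need far broader patterns and locale handling.

```python
import re

# Illustrative rule-based PII gatekeeper: one regex per PII type.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(trace: str) -> str:
    """Replace matched PII spans with type-tagged placeholders."""
    for pii_type, pattern in PATTERNS.items():
        trace = pattern.sub(f"[{pii_type.upper()}]", trace)
    return trace

print(redact("The user's SSN 123-45-6789 maps to jane@example.com"))
```

A detector like this is cheap and transparent but brittle, which is consistent with the paper's finding that no single method wins everywhere.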

Merits

Strength

The study employs a model-agnostic framework that defines leakage as risk-weighted, token-level events across 11 PII types, enabling consistent comparison of leakage behavior across open- and closed-source model families.

Strength

The research traces leakage as a function of the allowed CoT budget and shows that the same budget increase can amplify leakage in one model family and attenuate it in another, highlighting the need for family- and budget-aware mitigation rather than a one-size-fits-all policy.

Demerits

Limitation

The study's reliance on a small, structured PII dataset may limit the generalizability of its findings to more diverse and complex datasets.

Limitation

The lack of a single, universally effective solution for preventing PII leakage may necessitate the development of more complex and context-dependent gatekeeping policies.
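The kind of hybrid, context-dependent policy this limitation points toward might route between detectors based on the reasoning budget. The routing threshold and detector interfaces below are hypothetical, not taken from the paper.

```python
def hybrid_gatekeeper(trace, cot_budget, rule_detect, ml_detect):
    """Hypothetical hybrid policy: cheap rules always run; a costlier
    ML detector is added only when a longer CoT budget raises exposure.

    rule_detect / ml_detect: callables returning an iterable of
    (pii_type, span) detections for the given trace.
    """
    detections = set(rule_detect(trace))
    if cot_budget > 512:  # hypothetical threshold: longer traces, more exposure
        detections |= set(ml_detect(trace))
    return detections
```

Budget-conditioned routing like this is one way to trade detection cost against the budget-dependent leakage the study measures.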

Expert Commentary

The article's most consequential observation is that reasoning traces resurface PII even under policies instructing the model not to restate it: mitigation cannot rely on prompting alone and must operate at inference time. Because none of the benchmarked gatekeepers, whether rule-based, TF-IDF + logistic regression, GLiNER-based NER, or LLM-as-judge, dominates across models and budgets, practitioners will need hybrid, context-dependent policies, and the budget-dependence of leakage means a mitigation tuned for one model family may not transfer to another. The study's reliance on a small, structured PII dataset remains the main threat to the generalizability of these findings.

Recommendations

  • Researchers and developers should deploy inference-time gatekeeping for reasoning traces, favoring hybrid combinations of detectors, since no single method dominates across models and reasoning budgets.
  • Regulators and policymakers should consider the study's findings when developing guidelines for deploying LLMs in industries subject to strict data protection regulations, since reasoning traces themselves can carry PII.
