Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information

arXiv:2602.21496v1 Announce Type: new Abstract: While defenses for structured PII are mature, Large Language Models (LLMs) pose a new threat: Semantic Sensitive Information (SemSI), where models infer sensitive identity attributes, generate reputation-harmful content, or hallucinate potentially wrong information. The capacity of LLMs to self-regulate these complex, context-dependent sensitive information leaks without destroying utility remains an open scientific question. To address this, we introduce SemSIEdit, an inference-time framework where an agentic "Editor" iteratively critiques and rewrites sensitive spans to preserve narrative flow rather than simply refusing to answer. Our analysis reveals a Privacy-Utility Pareto Frontier, where this agentic rewriting reduces leakage by 34.6% across all three SemSI categories while incurring a marginal utility loss of 9.8%. We also uncover a Scale-Dependent Safety Divergence: large reasoning models (e.g., GPT-5) achieve safety through constructive expansion (adding nuance), whereas capacity-constrained models revert to destructive truncation (deleting text). Finally, we identify a Reasoning Paradox: while inference-time reasoning increases baseline risk by enabling the model to make deeper sensitive inferences, it simultaneously empowers the defense to execute safe rewrites.
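
The abstract describes an agentic "Editor" that iteratively critiques and rewrites sensitive spans instead of refusing to answer. The sketch below shows one way such an inference-time critique-and-rewrite loop could be structured; the llm() callable, the prompts, and the stopping rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an inference-time critique-and-rewrite loop in the
# spirit of SemSIEdit. The llm() callable, the prompts, and max_rounds are
# illustrative assumptions, not details taken from the paper.
from typing import Callable


def agentic_edit(draft: str, llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Iteratively critique a draft and rewrite flagged spans in place."""
    text = draft
    for _ in range(max_rounds):
        # Critic pass: ask the model to flag spans that leak semantic
        # sensitive information (identity inferences, reputation-harmful
        # claims, unverifiable assertions).
        critique = llm(
            "List any spans in the text below that leak semantic sensitive "
            "information, one per line. Reply NONE if the text is clean.\n\n"
            + text
        )
        if critique.strip().upper() == "NONE":
            break  # nothing left to edit
        # Editor pass: rewrite the flagged spans while keeping the
        # surrounding narrative intact, rather than refusing outright.
        text = llm(
            "Rewrite the text so that the spans below no longer leak "
            "sensitive information, preserving narrative flow and utility.\n\n"
            "Spans:\n" + critique + "\n\nText:\n" + text
        )
    return text
```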

Executive Summary

This study proposes SemSIEdit, an inference-time agentic framework in which Large Language Models (LLMs) critique and rewrite their own sensitive spans while preserving narrative flow. Across all three Semantic Sensitive Information (SemSI) categories, agentic rewriting reduces leakage by 34.6% at a marginal utility cost of 9.8%. The study also uncovers a Scale-Dependent Safety Divergence and a Reasoning Paradox, underscoring how hard context-dependent sensitive information is to control. The findings are most relevant to LLM deployments that handle sensitive personal or reputational information.
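
The headline numbers are relative changes against an unedited baseline. The snippet below is a minimal sketch of that arithmetic; the baseline and post-edit scores are made-up values chosen only to reproduce the reported percentages, not figures from the paper.

```python
# Minimal sketch of the relative-change arithmetic behind the headline
# numbers. The baseline and post-edit scores are made-up values chosen only
# to reproduce the reported percentages, not figures from the paper.

def relative_change(before: float, after: float) -> float:
    """Percentage change relative to the baseline value."""
    return 100.0 * (before - after) / before

leakage_before, leakage_after = 0.52, 0.34  # fraction of responses leaking SemSI
utility_before, utility_after = 0.82, 0.74  # task-utility score

print(f"Leakage reduction: {relative_change(leakage_before, leakage_after):.1f}%")  # 34.6%
print(f"Utility loss:      {relative_change(utility_before, utility_after):.1f}%")  # 9.8%
```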

Key Points

  • SemSIEdit framework for agentic self-regulation of LLMs
  • Significant reduction in sensitive information leakage (34.6%)
  • Scale-Dependent Safety Divergence: large models achieve safety through constructive expansion, while smaller models revert to destructive truncation
  • Reasoning Paradox: inference-time reasoning increases risk, but also enables safe rewrites

Merits

Strength in Addressing Complex Sensitive Information

SemSIEdit addresses the limitations of existing defenses, which target structured PII or fall back on outright refusal, by introducing an agentic framework that lets the model regulate and rewrite its own sensitive spans while preserving narrative flow.

Significant Reduction in Sensitive Information Leakage

The study reports a 34.6% reduction in SemSI leakage across all three categories at only a 9.8% utility cost, supporting the effectiveness of the SemSIEdit framework.

Demerits

Limitation in Model-Scale Dependence

The study highlights a Scale-Dependent Safety Divergence: large reasoning models achieve safety through constructive expansion (adding nuance), whereas capacity-constrained models fall back on destructive truncation (deleting text). The defense therefore does not transfer cleanly across model scales, complicating LLM development.
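
One simple way to separate these two behaviors is to compare the length of the edited response to the original. The heuristic below is an illustrative assumption for exposition, not the classification used in the paper.

```python
# Hypothetical heuristic that separates constructive expansion from
# destructive truncation by comparing word counts; the 0.8 threshold is an
# illustrative assumption, not a metric defined in the paper.
def classify_edit(original: str, edited: str, trunc_ratio: float = 0.8) -> str:
    """Label an edit by how much of the original length survives."""
    ratio = len(edited.split()) / max(len(original.split()), 1)
    if ratio < trunc_ratio:
        return "destructive truncation"  # large spans deleted outright
    if ratio > 1.0:
        return "constructive expansion"  # nuance and qualifiers added
    return "in-place rewrite"
```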

Reasoning Paradox and Increased Risk

The study uncovers a Reasoning Paradox: inference-time reasoning raises baseline risk by letting the model draw deeper sensitive inferences, yet the same capability empowers the Editor to execute safe rewrites. How to balance these two effects remains an open problem for LLM development.

Expert Commentary

The study contributes to LLM safety and security by introducing the SemSIEdit framework and demonstrating a meaningful reduction in sensitive information leakage at modest utility cost. At the same time, the Scale-Dependent Safety Divergence and the Reasoning Paradox show that semantic sensitive information remains difficult to control: the defense behaves differently across model scales, and the reasoning that powers the defense also deepens the underlying risk. Further work is needed to balance these effects and to understand the implications for human-model interaction and misinformation.

Recommendations

  • For researchers: investigate how model scale drives the divergence between constructive expansion and destructive truncation, and explore strategies to mitigate the Reasoning Paradox.
  • For policymakers: develop regulatory frameworks that address the safety and security of LLMs, particularly in applications that handle sensitive information.
