Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
arXiv:2602.23079v1

Abstract: The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended deanonymization risks in textual data such as news articles. In this work, we introduce an LLM agent designed to evaluate and mitigate such risks through a structured, interpretable pipeline. Central to our framework is the proposed $\textit{SALA}$ (Stylometry-Assisted LLM Analysis) method, which integrates quantitative stylometric features with LLM reasoning for robust and transparent authorship attribution. Experiments on large-scale news datasets demonstrate that $\textit{SALA}$, particularly when augmented with a database module, achieves high inference accuracy in various scenarios. Finally, we propose a guided recomposition strategy that leverages the agent's reasoning trace to generate rewriting prompts, effectively reducing authorship identifiability while preserving textual meaning. Our findings highlight both the deanonymization potential of LLM agents and the importance of interpretable, proactive defenses for safeguarding author privacy.
Executive Summary
This study proposes SALA (Stylometry-Assisted LLM Analysis), a framework in which a large language model (LLM) agent evaluates and mitigates unintended deanonymization risks in textual data. SALA integrates quantitative stylometric features with LLM reasoning for robust, transparent authorship attribution. Experiments on large-scale news datasets demonstrate high inference accuracy, particularly when the agent is augmented with a database module. A guided recomposition strategy, which derives rewriting prompts from the agent's reasoning trace, is also proposed to reduce authorship identifiability while preserving textual meaning. The study highlights both the deanonymization potential of LLM agents and the importance of interpretable, proactive defenses for safeguarding author privacy.
Key Points
- ▸ SALA framework integrates stylometry and LLM reasoning for robust authorship attribution.
- ▸ Guided recomposition strategy reduces authorship identifiability while preserving textual meaning.
- ▸ Experiments on large-scale news datasets demonstrate high inference accuracy.
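To make the stylometry side of this pipeline concrete, the sketch below computes a few classic stylometric features (average sentence length, type-token ratio, comma rate). This is an illustrative minimum, not the paper's actual feature set, which is not specified in this summary.

```python
import re

def stylometric_features(text: str) -> dict:
    """Compute a few classic stylometric features (illustrative only)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return {"avg_sentence_len": 0.0, "type_token_ratio": 0.0, "comma_rate": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Vocabulary richness: distinct words over total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Commas per word: a simple punctuation-habit signal.
        "comma_rate": text.count(",") / len(words),
    }

feats = stylometric_features("I came, I saw, I conquered. Veni, vidi, vici.")
```

In a SALA-style pipeline, such a feature vector would be handed to the LLM alongside the text itself, so the model's attribution reasoning can cite quantitative evidence rather than impressions alone.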
Merits
Strength in Methodology
The study proposes a structured and interpretable pipeline for evaluating deanonymization risks, which is a significant strength of the methodology.
Robust Authorship Attribution
The SALA framework demonstrates high inference accuracy in various scenarios, making it a robust tool for authorship attribution.
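One plausible way a database module could support attribution is nearest-profile matching: compare the query text's stylometric vector against stored per-author profiles. The sketch below uses cosine similarity over hypothetical feature vectors; the paper's actual database design and similarity measure are not described in this summary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(query, profiles):
    """Rank candidate authors by similarity of their stored stylometric profile."""
    return sorted(profiles, key=lambda name: cosine(query, profiles[name]), reverse=True)

# Hypothetical profiles: (avg sentence length, type-token ratio, comma rate).
profiles = {
    "author_a": [22.0, 0.55, 0.04],
    "author_b": [11.0, 0.72, 0.09],
}
query = [12.0, 0.70, 0.08]
ranking = rank_candidates(query, profiles)
```

The top-ranked candidates (and their similarity scores) could then be passed to the LLM as retrieval context, which is consistent with the reported accuracy gains from the database module.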
Demerits
Limited Generalizability
The study is limited to experiments on large-scale news datasets, which may not be representative of other types of textual data.
Dependence on Databases
The SALA framework relies on a database module, which may not be feasible or scalable in all applications.
Expert Commentary
The study proposes a novel framework for evaluating and mitigating deanonymization risks in textual data. By integrating quantitative stylometry with LLM reasoning, SALA improves on existing attribution methods in both accuracy and interpretability: the agent can report which measurable features drove its inference. The guided recomposition strategy is a notable innovation, turning the same reasoning trace that enables attribution into rewriting prompts that suppress identifying style while preserving meaning. The main caveats are scope and infrastructure: the evaluation covers only large-scale news datasets, which may not transfer to other genres, and the strongest results depend on a database module that may not be feasible or scalable in every deployment. Overall, the study is a significant contribution to authorship attribution and textual data protection, and it underscores the need for robust, transparent methods of safeguarding author privacy.
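The guided recomposition idea can be sketched as a simple prompt builder: each entry in the agent's reasoning trace names a stylistic marker it relied on, and the rewriting prompt asks a rewriter LLM to neutralize exactly those markers while preserving meaning. The function and trace format below are hypothetical illustrations, not the paper's implementation.

```python
def recomposition_prompt(text: str, trace: list) -> str:
    """Build a rewriting prompt from an attribution agent's reasoning trace.

    Hypothetical sketch of guided recomposition: `trace` lists the stylistic
    markers the agent cited when attributing authorship, and the prompt asks
    a rewriter to remove those markers while preserving the text's meaning.
    """
    markers = "\n".join(f"- {m}" for m in trace)
    return (
        "Rewrite the passage below so that it preserves its meaning but no "
        "longer exhibits the following stylistic markers:\n"
        f"{markers}\n\nPassage:\n{text}"
    )

trace = ["frequent use of semicolons", "unusually long sentences"]
prompt = recomposition_prompt("Example passage.", trace)
```

Driving the rewrite from the attribution trace, rather than rewriting blindly, targets only the features that made the author identifiable, which is how the strategy can reduce identifiability without degrading content.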
Recommendations
- ✓ Future studies should explore the application of the SALA framework to other types of textual data, such as social media posts and online comments.
- ✓ The development of more robust and transparent authorship attribution tools, such as the SALA framework, should be prioritized in various applications, including journalism and academic publishing.