Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

arXiv:2602.20743v1 (announce type: new)

Abstract: Anonymizing textual documents is a highly context-sensitive problem: the appropriate balance between privacy protection and utility preservation varies with the data domain, privacy objectives, and downstream application. However, existing anonymization methods rely on static, manually designed strategies that lack the flexibility to adjust to diverse requirements and often fail to generalize across domains. We introduce adaptive text anonymization, a new task formulation in which anonymization strategies are automatically adapted to specific privacy-utility requirements. We propose a framework for task-specific prompt optimization that automatically constructs anonymization instructions for language models, enabling adaptation to different privacy goals, domains, and downstream usage patterns. To evaluate our approach, we present a benchmark spanning five datasets with diverse domains, privacy constraints, and utility objectives. Across all evaluated settings, our framework consistently achieves a better privacy-utility trade-off than existing baselines, while remaining computationally efficient and effective on open-source language models, with performance comparable to larger closed-source models. Additionally, we show that our method can discover novel anonymization strategies that explore different points along the privacy-utility trade-off frontier.
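
The abstract refers to a privacy-utility trade-off frontier without defining it. One common reading is a Pareto frontier: the set of strategies that no other strategy beats on both privacy and utility at once. The sketch below illustrates that reading only. The `Point` type, the strategy names, and every score are invented placeholders for illustration, not results or definitions from the paper.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    """One anonymization strategy with hypothetical scores in [0, 1]."""
    name: str
    privacy: float  # higher means better protected
    utility: float  # higher means more useful downstream


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    q dominates p when q is at least as good on both axes
    and strictly better on at least one.
    """
    frontier = []
    for p in points:
        dominated = any(
            q.privacy >= p.privacy
            and q.utility >= p.utility
            and (q.privacy > p.privacy or q.utility > p.utility)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    # Sort by privacy so the frontier reads from "most useful" to "most private".
    return sorted(frontier, key=lambda p: p.privacy)


# Placeholder scores, made up for this example.
candidates = [
    Point("A", 0.2, 0.95),
    Point("B", 0.5, 0.90),
    Point("C", 0.5, 0.60),  # dominated by B
    Point("D", 0.9, 0.70),
    Point("E", 0.8, 0.50),  # dominated by D
    Point("F", 1.0, 0.10),
]

frontier_names = [p.name for p in pareto_frontier(candidates)]
print(frontier_names)
```

Under this reading, a method that "explores different points along the frontier" would produce strategies like A, B, D, and F here: each gives up some utility for more privacy, and none is strictly worse than another.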

Executive Summary

The paper introduces adaptive text anonymization, a task formulation in which the anonymization strategy is adapted automatically to a stated privacy-utility requirement instead of being fixed by hand. Its framework uses task-specific prompt optimization to construct anonymization instructions for language models, so it can adapt to different privacy goals, domains, and downstream usage patterns. On a benchmark of five datasets, the authors report a better privacy-utility trade-off than existing baselines in every evaluated setting, while remaining computationally efficient and effective on open-source language models, whose performance is comparable to that of larger closed-source models.

Key Points

  • Adaptive text anonymization via prompt optimization
  • Automatic construction of anonymization instructions for language models
  • Improved privacy-utility trade-off compared to existing baselines
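
The key points above can be made concrete with a toy sketch of choosing an anonymization instruction against a weighted privacy-utility objective. This is not the paper's optimizer: it is a plain exhaustive search over a fixed, hand-written candidate set, and a rule-based `anonymize` function stands in for the language model call. The strategy names, the two scoring functions, the `w_privacy` weight, and the sample documents are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical patterns for the rule-based stand-in below.
NAME = re.compile(r"\b(Alice|Bob)\b")
AGE = re.compile(r"\b\d{2}\b")


def anonymize(strategy: str, text: str) -> str:
    """Stand-in for a language model following an anonymization instruction."""
    if strategy == "mask_names":
        return NAME.sub("[NAME]", text)
    if strategy == "mask_names_and_ages":
        return AGE.sub("[AGE]", NAME.sub("[NAME]", text))
    if strategy == "redact_all":
        return "[REDACTED]"
    return text  # "no_op": leave the text unchanged


def privacy(text: str, secrets: List[str]) -> float:
    """Fraction of sensitive strings that no longer appear in the text."""
    return sum(s not in text for s in secrets) / len(secrets)


def utility(text: str, keywords: List[str]) -> float:
    """Fraction of task-relevant strings that survive in the text."""
    return sum(k in text for k in keywords) / len(keywords)


def score(strategy: str, docs: List[Dict], w_privacy: float) -> float:
    """Average weighted objective over the documents (assumed linear form)."""
    total = 0.0
    for d in docs:
        out = anonymize(strategy, d["text"])
        total += w_privacy * privacy(out, d["secrets"])
        total += (1.0 - w_privacy) * utility(out, d["keywords"])
    return total / len(docs)


def select_strategy(candidates: List[str], docs: List[Dict], w_privacy: float) -> str:
    """Pick the candidate with the highest score for the given weight."""
    return max(candidates, key=lambda s: score(s, docs, w_privacy))


# Made-up documents: the downstream task is assumed to need the age.
docs = [
    {"text": "Alice, 34, reports chest pain.",
     "secrets": ["Alice", "34"], "keywords": ["34", "chest pain"]},
    {"text": "Bob, 58, has a persistent cough.",
     "secrets": ["Bob", "58"], "keywords": ["58", "cough"]},
]
candidates = ["no_op", "mask_names", "mask_names_and_ages", "redact_all"]

privacy_first = select_strategy(candidates, docs, w_privacy=0.8)
utility_first = select_strategy(candidates, docs, w_privacy=0.2)
print(privacy_first, utility_first)
```

With these placeholder inputs, a privacy-heavy weight selects the strategy that also masks ages, while a utility-heavy weight keeps the ages and masks only names. That is the kind of requirement-dependent choice the paper aims to automate, although its framework constructs the instructions themselves instead of picking from a fixed list.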

Merits

Flexibility and Adaptability

The proposed framework can adapt to different privacy goals, domains, and downstream usage patterns, making it a versatile solution for text anonymization.

Demerits

Dependence on Language Models

The framework's effectiveness depends on the quality and capabilities of the underlying language model, so a weaker model may limit what the optimized instructions can achieve.

Expert Commentary

The article makes a significant contribution to text anonymization: a flexible approach that can be tailored to specific privacy and utility requirements. Using prompt optimization to construct anonymization instructions for language models is a novel technique, and the reported results suggest it is effective. However, further research is needed to establish the limits of the approach, particularly with regard to its scalability and robustness in real-world applications.

Recommendations

  • Further evaluation of the proposed framework on a wider range of datasets and domains to assess its generalizability and effectiveness
  • Investigation into the potential applications and implications of adaptive text anonymization in various industries and contexts
