
Synthetic Interaction Data for Scalable Personalization in Large Language Models

Abstract (arXiv:2602.12394v1): Personalized prompting offers large opportunities for deploying large language models (LLMs) to diverse users, yet existing prompt optimization methods primarily focus on task-level optimization while largely overlooking user-specific preferences and latent constraints of individual users. This gap is primarily due to (i) the absence of high-quality, privacy-sensitive data that capture personalized user-LLM interactions at scale, and (ii) the lack of robust reward signals for individual preferences. To overcome existing data limitations, we introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process via an agentic LLM system to simulate realistic preference behaviors and semantic-aware noise in order to generate personalized multi-turn interaction trajectories. Using PersonaGym, we release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories that closely mirror real-world preference expression and noise patterns. We further propose Personalized Prompt Optimization (PPOpt), a scalable and model-agnostic framework that optimizes user prompts based on interaction histories without modifying the deployed LLM. PPOpt adopts a reason-then-optimize paradigm that infers an explicit user profile and conditions prompt rewriting on the user profile to avoid reward hacking. Our training procedure for PPOpt integrates a cold-start supervised prior with outcome-driven multi-objective reinforcement learning. We present extensive experiments to demonstrate consistent improvements over state-of-the-art baselines in terms of task performance, personalization quality, and robustness to noisy as well as to sparse preference signals.
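To make the abstract's description of trajectory generation concrete, the following is a minimal sketch of what PersonaGym-style simulation could look like: a persona with latent preferences answers a sequence of queries, and "semantic-aware noise" occasionally replaces the true preference with an ambiguous expression. The function name, persona fields, and noise model here are illustrative assumptions, not the paper's actual implementation (which uses an agentic LLM system rather than a random-number model).

```python
import random

def simulate_trajectory(persona, queries, noise_rate=0.2, seed=0):
    """Generate a multi-turn interaction trajectory for one simulated user.

    Each turn records the query, the preference signal the persona would
    express on that topic, and whether noise obscured that signal.
    (Hypothetical stand-in for PersonaGym's agentic simulation.)
    """
    rng = random.Random(seed)
    trajectory = []
    for turn, query in enumerate(queries):
        # Latent preference the simulated user holds for this topic.
        signal = persona["preferences"].get(query["topic"], "neutral")
        # Crude stand-in for semantic-aware noise: with some probability,
        # the user expresses an ambiguous preference instead of the true one.
        noisy = rng.random() < noise_rate
        expressed = "ambiguous" if noisy else signal
        trajectory.append({
            "turn": turn,
            "query": query["text"],
            "true_preference": signal,
            "expressed_preference": expressed,
            "noisy": noisy,
        })
    return trajectory

persona = {"preferences": {"style": "concise", "tone": "formal"}}
queries = [
    {"topic": "style", "text": "Summarize this report."},
    {"topic": "tone", "text": "Draft an email to a client."},
]
traj = simulate_trajectory(persona, queries)
```

A dataset like PersonaAtlas would then consist of many such trajectories, one per simulated persona, with the noise making the downstream optimization problem realistically hard.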

Executive Summary

The article 'Synthetic Interaction Data for Scalable Personalization in Large Language Models' addresses the challenge of adapting large language model (LLM) behavior to individual user preferences. The authors introduce PersonaGym, a framework for generating high-fidelity synthetic data that simulates dynamic user preferences, semantic-aware noise, and multi-turn interaction trajectories. They also propose Personalized Prompt Optimization (PPOpt), a scalable, model-agnostic framework that optimizes user prompts from interaction histories without modifying the deployed LLM; it is trained with a cold-start supervised prior followed by outcome-driven multi-objective reinforcement learning. The study reports consistent improvements over state-of-the-art baselines in task performance, personalization quality, and robustness to noisy and sparse preference signals.
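The reason-then-optimize idea behind PPOpt can be sketched as a two-stage pipeline: first distill the interaction history into an explicit user profile, then rewrite the prompt conditioned on that profile rather than on raw history. The helper functions and profile format below are assumptions for illustration; in the paper this pipeline is an LLM trained with supervised cold-start plus multi-objective RL, not the simple aggregation shown here.

```python
from collections import Counter

def infer_profile(history):
    """Aggregate expressed preferences from past turns into an explicit profile
    (the 'reason' stage; hypothetical majority-vote stand-in)."""
    votes = {}
    for turn in history:
        for dim, value in turn["preferences"].items():
            votes.setdefault(dim, Counter())[value] += 1
    return {dim: c.most_common(1)[0][0] for dim, c in votes.items()}

def rewrite_prompt(prompt, profile):
    """Condition the prompt on the inferred profile (the 'optimize' stage).

    Grounding the rewrite in an explicit profile, rather than raw history,
    is what the abstract credits with avoiding reward hacking.
    """
    if not profile:
        return prompt
    constraints = "; ".join(f"{dim}: {value}" for dim, value in sorted(profile.items()))
    return f"{prompt}\n[User profile] {constraints}"

history = [
    {"preferences": {"style": "concise"}},
    {"preferences": {"style": "concise", "tone": "formal"}},
]
profile = infer_profile(history)
optimized = rewrite_prompt("Summarize this report.", profile)
```

Because the rewriter only sees the prompt and the explicit profile, the deployed LLM itself never needs to be modified, which is what makes the approach model-agnostic.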

Key Points

  • Introduction of PersonaGym for generating synthetic interaction data.
  • Proposal of PPOpt for optimizing user prompts based on interaction histories.
  • Demonstration of improvements in task performance and personalization quality.

Merits

Innovative Framework

The introduction of PersonaGym and PPOpt represents a significant advancement in the field of personalized LLMs, addressing critical gaps in data quality and reward signals.

Scalability

The proposed methods are scalable and model-agnostic, making them applicable to a wide range of LLMs and user scenarios.

Robustness

The study demonstrates robustness to noisy and sparse preference signals, which is crucial for real-world applications.

Demerits

Data Quality

While synthetic data generation is a strength, the quality and fidelity of the synthetic data compared to real-world data need further validation.

Privacy Concerns

The use of synthetic data to mimic user preferences raises privacy concerns that need to be addressed to ensure ethical deployment.

Generalizability

The effectiveness of PPOpt across different types of LLMs and diverse user preferences needs to be thoroughly tested.

Expert Commentary

The article presents a compelling advance in personalized large language models. PersonaGym and PPOpt target two genuine bottlenecks, the scarcity of personalized interaction data and the absence of per-user reward signals, with a scalable, model-agnostic design. The reported gains in task performance and personalization quality are noteworthy, as is the robustness to noisy and sparse preference signals. That said, how faithfully the synthetic trajectories reflect real-world user behavior remains to be validated, and simulating user preferences at scale raises privacy and consent questions that must be addressed for ethical deployment. If these concerns are resolved, the findings have clear practical implications for personalization in deployed LLM systems and could inform policy guidelines for the ethical deployment of personalized AI.

Recommendations

  • Further validation of the quality and fidelity of synthetic data generated by PersonaGym.
  • Addressing privacy concerns through robust privacy-preserving techniques in the generation and use of synthetic data.
  • Conducting extensive testing of PPOpt across different types of LLMs and diverse user preferences to ensure generalizability.
