Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

arXiv:2602.17003v1. Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent architectures, backbone models, history access schemes

Serin Kim, Sangam Lee, Dongha Lee · February 22, 2026

#cs.CL #cs.AI

Executive Summary

The article 'Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History' introduces Persona2Web, a benchmark for evaluating personalized web agents on the real open web. It assesses whether an agent can interpret ambiguous queries by inferring user preferences and context from user history, following the 'clarify-to-personalize' principle: ambiguity is resolved from history rather than from explicit instructions. The benchmark has three components: user histories that reveal preferences implicitly over long time spans, ambiguous queries that require agents to infer those preferences, and a reasoning-aware evaluation framework for fine-grained assessment of personalization. Experiments across agent architectures, backbone models, and history access schemes reveal key challenges in personalized web agent behavior.

Key Points

▸ Introduction of Persona2Web benchmark for personalized web agents.
▸ Benchmark built on the 'clarify-to-personalize' principle.
▸ Three components: user histories, ambiguous queries, and a reasoning-aware evaluation framework.
▸ Experiments conducted across various agent architectures, backbone models, and history access schemes.
▸ Key challenges identified in personalized web agent behavior.

Merits

Innovative Benchmark

Persona2Web is the first benchmark to evaluate personalized web agents on the real open web, addressing a critical gap in the field.

Comprehensive Framework

The benchmark combines user histories, ambiguous queries, and a reasoning-aware evaluation framework, which together support fine-grained assessment of personalization.
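
To make the idea of resolving an ambiguous query from user history concrete, here is a minimal Python sketch. It is not the authors' implementation or data format: the `HistoryEvent` and `AmbiguousQuery` classes, their field names, the sample data, and the most-frequent-choice heuristic are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HistoryEvent:
    """One past interaction. All field names are hypothetical."""
    timestamp: str  # ISO date of the interaction
    domain: str     # e.g. "flights", "shopping"
    attribute: str  # e.g. "airline", "seat"
    value: str      # what the user chose


@dataclass
class AmbiguousQuery:
    """A query that leaves some attributes unspecified (hypothetical schema)."""
    text: str
    domain: str
    missing: list[str]


def infer_preferences(history, query):
    """Fill each missing attribute with the user's most frequent past
    choice in the same domain, or None when the history has no evidence.
    This frequency heuristic is an illustrative assumption only."""
    resolved = {}
    for attr in query.missing:
        counts = Counter(
            e.value
            for e in history
            if e.domain == query.domain and e.attribute == attr
        )
        resolved[attr] = counts.most_common(1)[0][0] if counts else None
    return resolved


# Made-up sample data, not taken from the benchmark.
history = [
    HistoryEvent("2025-01-03", "flights", "airline", "Korean Air"),
    HistoryEvent("2025-03-11", "flights", "airline", "Korean Air"),
    HistoryEvent("2025-06-20", "flights", "airline", "Delta"),
    HistoryEvent("2025-06-20", "flights", "seat", "aisle"),
    HistoryEvent("2025-07-02", "shopping", "brand", "Muji"),
]
query = AmbiguousQuery(
    text="Book my usual flight to Tokyo",
    domain="flights",
    missing=["airline", "seat", "meal"],
)

resolved = infer_preferences(history, query)
print(resolved)
# {'airline': 'Korean Air', 'seat': 'aisle', 'meal': None}
```

The `None` for `meal` marks an attribute the history cannot resolve. A simple count like this ignores recency and context, which is part of why inferring preferences from long histories is hard to do well.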

Extensive Experimentation

The study conducts extensive experiments across various agent architectures, backbone models, and history access schemes, offering valuable insights into the challenges of personalized web agents.

Demerits

Limited Scope

The benchmark focuses primarily on the open web, which may not fully capture the complexity of all web environments and user interactions.

Reproducibility Concerns

While the authors provide code and datasets for reproducibility, the complexity of the benchmark may still pose challenges for other researchers attempting to replicate the results.

Ambiguity in User Preferences

Inferring user preferences from history to resolve ambiguous queries can be subjective, and the inferred preference may not always match the user's real-world intent.

Expert Commentary

The Persona2Web benchmark is a significant step forward in the evaluation of personalized web agents. By building on the 'clarify-to-personalize' principle, it provides a framework for assessing whether web agents can interpret ambiguous queries based on user history. The experiments across agent architectures, backbone models, and history access schemes offer insight into the challenges of personalized web agent behavior and where it can improve. However, the benchmark's limited scope and potential reproducibility concerns are areas for further refinement. The ethical and privacy implications of using user history to infer preferences also deserve attention. As personalized web agents evolve, technological advances need to be balanced with ethical considerations and regulatory frameworks to ensure responsible and effective deployment.

Recommendations

✓ Further research should explore the broader applications of the Persona2Web benchmark, including its potential use in different web environments and user interactions.
✓ Developers and researchers should prioritize addressing privacy and security concerns, ensuring that user data is handled responsibly and ethically.

Sources

arXiv - cs.CL
