SPILLage: Agentic Oversharing on the Web
arXiv:2602.13516v1 Abstract: LLM-powered agents are beginning to automate users' tasks across the open web, often with access to user resources such as emails and calendars. Unlike standard LLMs answering questions in a controlled chatbot setting, web agents act "in the wild", interacting with third parties and leaving behind an action trace. Therefore, we ask the question: how do web agents handle user resources when accomplishing tasks on their behalf across live websites? In this paper, we formalize Natural Agentic Oversharing -- the unintentional disclosure of task-irrelevant user information through an agent's trace of actions on the web. We introduce SPILLage, a framework that characterizes oversharing along two dimensions: channel (content vs. behavior) and directness (explicit vs. implicit). This taxonomy reveals a critical blind spot: while prior work focuses on text leakage, web agents also overshare behaviorally through clicks, scrolls, and navigation patterns that can be monitored. We benchmark 180 tasks on live e-commerce sites with ground-truth annotations separating task-relevant from task-irrelevant attributes. Across 1,080 runs spanning two agentic frameworks and three backbone LLMs, we demonstrate that oversharing is pervasive, with behavioral oversharing dominating content oversharing by 5x. This effect persists -- and can even worsen -- under prompt-level mitigation. However, removing task-irrelevant information before execution improves task success by up to 17.9%, demonstrating that reducing oversharing improves task success. Our findings underscore that protecting privacy in web agents is a fundamental challenge, requiring a broader view of "output" that accounts for what agents do on the web, not just what they type. Our datasets and code are available at https://github.com/jrohsc/SPILLage.
Executive Summary
The article 'SPILLage: Agentic Oversharing on the Web' examines the unintentional disclosure of user information by LLM-powered agents operating on the web. The study introduces the concept of Natural Agentic Oversharing, which occurs when an agent's trace of actions discloses user data that is irrelevant to the task. The authors present SPILLage, a framework that characterizes oversharing along two dimensions: channel (content vs. behavior) and directness (explicit vs. implicit). Benchmarking 180 tasks on live e-commerce sites across 1,080 runs, the study finds that behavioral oversharing dominates content oversharing by 5x and that the effect persists, and can even worsen, under prompt-level mitigation. Removing task-irrelevant information before execution, by contrast, improves task success by up to 17.9%. The findings highlight the need for privacy protections that consider both what agents type and what they do on the web.
Key Points
- ▸ Introduction of the concept of Natural Agentic Oversharing.
- ▸ Development of the SPILLage framework to categorize oversharing.
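- ▸ Behavioral oversharing dominates content oversharing by 5x.
- ▸ Prompt-level mitigation does not eliminate oversharing and can worsen it; removing task-irrelevant information before execution improves task success by up to 17.9%.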
- ▸ Privacy protections must consider both content and behavioral outputs.
Merits
Comprehensive Framework
The SPILLage framework classifies oversharing along two dimensions, channel (content vs. behavior) and directness (explicit vs. implicit), which gives mitigation work a concrete taxonomy to target.
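To make the two dimensions concrete, here is a minimal Python sketch of how a single agent action might be labeled. It is not the authors' implementation: the abstract does not say how explicit and implicit disclosure are operationalized, so this sketch assumes that "explicit" means a task-irrelevant value appears literally in the action and "implicit" means only a hint that allows inference appears. The `Action` type, the action kinds, and the `label` function are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    CONTENT = "content"    # what the agent types (search boxes, form fields)
    BEHAVIOR = "behavior"  # what the agent does (clicks, scrolls, navigation)


class Directness(Enum):
    EXPLICIT = "explicit"  # assumption: the attribute value appears literally
    IMPLICIT = "implicit"  # assumption: the attribute can only be inferred


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action kinds: "type", "click", "scroll", "navigate"
    target: str  # the text typed, or the label/URL of the element acted on


def channel_of(action: Action) -> Channel:
    """Typed text counts as content; every other action counts as behavior."""
    return Channel.CONTENT if action.kind == "type" else Channel.BEHAVIOR


def label(action: Action, irrelevant_values: set[str], hints: set[str]):
    """Return (channel, directness) if the action overshares, else None."""
    text = action.target.lower()
    if any(value.lower() in text for value in irrelevant_values):
        return channel_of(action), Directness.EXPLICIT
    if any(hint.lower() in text for hint in hints):
        return channel_of(action), Directness.IMPLICIT
    return None


# Illustrative trace; the attribute values are made up for this example.
trace = [
    Action("type", "running shoes for someone with diabetes"),
    Action("click", "Filter: Diabetic-friendly"),
    Action("click", "Add to cart"),
]
for step in trace:
    print(step.kind, "->", label(step, {"diabetes"}, {"diabetic-friendly"}))
```

Substring matching is deliberately naive here; it only serves to show that the same check applies to clicked labels as well as typed text, which is the blind spot the taxonomy points at.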
Empirical Rigor
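The study benchmarks 180 tasks on live e-commerce sites across 1,080 runs, spanning two agentic frameworks and three backbone LLMs, with ground-truth annotations separating task-relevant from task-irrelevant attributes. This breadth lends significant empirical weight to its findings.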
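The reported figures are consistent with running every task once under every framework and backbone combination, since 180 x 2 x 3 = 1,080. The abstract does not state this design outright, so the short Python check below treats it as an assumption; the framework and backbone names are placeholders.

```python
from itertools import product

NUM_TASKS = 180
frameworks = ["framework_a", "framework_b"]  # placeholder names, two frameworks
backbones = ["llm_1", "llm_2", "llm_3"]      # placeholder names, three backbone LLMs

# Assumption: one run per (task, framework, backbone) combination.
runs = list(product(range(NUM_TASKS), frameworks, backbones))
print(len(runs))  # 1080
```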
Practical Implications
The demonstration that removing task-irrelevant information before execution improves task success by up to 17.9% gives developers a clear incentive to prioritize privacy protections.
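One way to picture removing task-irrelevant information before execution is a filter applied to the user's data before the agent ever sees it. The sketch below is illustrative only and is not the authors' pipeline: it assumes the set of task-relevant attributes is already known (in the benchmark it comes from ground-truth annotations), and `minimize_profile`, `build_prompt`, the prompt format, and the sample profile are all hypothetical.

```python
def minimize_profile(profile: dict[str, str], relevant: set[str]) -> dict[str, str]:
    """Keep only the attributes marked as relevant to the task."""
    return {key: value for key, value in profile.items() if key in relevant}


def build_prompt(task: str, profile: dict[str, str]) -> str:
    """Assemble the text handed to the agent; the format is illustrative only."""
    facts = "\n".join(f"- {key}: {value}" for key, value in sorted(profile.items()))
    return f"Task: {task}\nUser information:\n{facts}"


# Made-up user data for the example.
profile = {
    "shoe_size": "10",
    "budget": "under $80",
    "medical_condition": "diabetes",
    "employer": "Acme Corp",
}
relevant = {"shoe_size", "budget"}  # assumed to be known in advance

prompt = build_prompt("Buy running shoes", minimize_profile(profile, relevant))
print(prompt)
```

Because the irrelevant attributes never reach the agent, they cannot surface in what it types or in what it clicks, which is why this kind of filtering acts on both channels at once where a prompt instruction does not.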
Demerits
Limited Scope
The study focuses primarily on e-commerce sites, which may not fully capture the breadth of web agent interactions across different domains.
Mitigation Challenges
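The study finds that oversharing persists, and can even worsen, under prompt-level mitigation. Prompting alone therefore does not address behavioral oversharing, and the study leaves open what more sophisticated solutions would look like beyond removing task-irrelevant information before execution.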
Generalizability
The results come from two agentic frameworks and three backbone LLMs, so they may not carry over to other agent architectures or task types.
Expert Commentary
The study 'SPILLage: Agentic Oversharing on the Web' advances our understanding of the privacy challenges posed by LLM-powered agents. The SPILLage framework is particularly noteworthy because its taxonomy covers behavior as well as content, a channel that prior work on text leakage overlooks. The empirical findings show that behavioral oversharing dominates content oversharing by 5x and that removing task-irrelevant information before execution improves task success by up to 17.9%. However, the study's focus on e-commerce sites may limit the generalizability of its findings, and future research should test how far they apply to other domains. The study also shows that prompt-level mitigation is not enough: oversharing persists, and can even worsen, under it, which underscores the need for more sophisticated solutions to behavioral oversharing. Overall, the study underscores the importance of privacy protections in the development and deployment of web agents, and it provides a foundation for further research in this area.
Recommendations
- ✓ Expand the scope of future studies to include a broader range of web domains to enhance the generalizability of findings.
- ✓ Develop more sophisticated mitigation strategies that specifically target behavioral oversharing to improve privacy protections.