Modeling Distinct Human Interaction in Web Agents

arXiv:2602.17588v1

Abstract: Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical decision points or requesting unnecessary confirmation. In this work, we introduce the task of modeling human intervention to support collaborative web task execution. We collect CowCorpus, a dataset of 400 real-user web navigation trajectories containing over 4,200 interleaved human and agent actions. We identify four distinct patterns of user interaction with agents -- hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Leveraging these insights, we train language models (LMs) to anticipate when users are likely to intervene based on their interaction styles, yielding a 61.4-63.4% improvement in intervention prediction accuracy over base LMs. Finally, we deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 26.5% increase in user-rated agent usefulness. Together, our results show structured modeling of human intervention leads to more adaptive, collaborative agents.

Executive Summary

This article addresses a gap in autonomous web agents: they lack a principled model of when and why humans intervene, so they either proceed past critical decision points or request unnecessary confirmation. Using CowCorpus, a dataset of 400 real-user web navigation trajectories, the authors identify four distinct patterns of user interaction with agents and train language models to anticipate when users are likely to intervene. The trained models improve intervention prediction accuracy by 61.4-63.4% over base LMs, and when deployed in live web navigation agents they yield a 26.5% increase in user-rated agent usefulness in a user study. The results suggest that structured modeling of human intervention can make web agents more adaptive and collaborative.

Key Points

  • The authors introduce a novel dataset, CowCorpus, containing 400 real-user web navigation trajectories with over 4,200 interleaved human and agent actions.
  • Four distinct patterns of user interaction with agents are identified: hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover.
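  • Language models are trained to anticipate when users are likely to intervene based on their interaction styles, improving intervention prediction accuracy by 61.4-63.4% over base LMs.
  • Deployed in live web navigation agents, these intervention-aware models produce a 26.5% increase in user-rated agent usefulness in a user study.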
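The intervention-prediction task can be framed as binary classification over a trajectory's history. The sketch below is a hypothetical illustration, not the authors' data format or training pipeline: the `Step` record, the `intervention_examples` function, and the choice to emit an example only immediately after an agent action are all assumptions. Each example pairs the rendered action history with a label saying whether the human acts next.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple

# Hypothetical record for one action in an interleaved human/agent trajectory.
Actor = Literal["agent", "human"]


@dataclass
class Step:
    actor: Actor  # who performed the action
    action: str   # free-form description, e.g. "click('Search')"


def intervention_examples(trajectory: List[Step]) -> List[Tuple[List[str], bool]]:
    """Turn one trajectory into (history, will_intervene) pairs.

    An example is emitted only right after an agent action, since that is the
    point where an agent would decide whether to continue or hand control
    back. The label is True when the next action is taken by the human.
    """
    examples = []
    for i in range(1, len(trajectory)):
        if trajectory[i - 1].actor != "agent":
            continue  # the human is already in control; nothing to predict
        history = [f"{s.actor}: {s.action}" for s in trajectory[:i]]
        will_intervene = trajectory[i].actor == "human"
        examples.append((history, will_intervene))
    return examples


if __name__ == "__main__":
    trajectory = [
        Step("agent", "search('flights to Boston')"),
        Step("agent", "click('cheapest result')"),
        Step("human", "click('nonstop only')"),  # user steps in
        Step("agent", "click('book')"),
    ]
    for history, label in intervention_examples(trajectory):
        print(len(history), label)
```

Running the demo prints `1 False` and then `2 True`: the user intervenes after the second agent action, and no example is produced after the human's own action. The resulting pairs could be serialized as prompts for fine-tuning a language model; how the authors condition on a user's interaction style is not captured here.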

Merits

Strength in Pattern Identification

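The authors characterize how real users interact with web agents and distill that behavior into four distinct patterns: hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Because these patterns are derived from real-user trajectories rather than assumed, they give agent designers a concrete basis for adapting agent behavior to different user interaction styles.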
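To make the four patterns concrete, here is a hypothetical rule-based labeler that assigns a trajectory to one of them from its sequence of actors alone. The decision rules, the `interaction_style` function, and the `OVERSIGHT_MAX_HUMAN_SHARE` threshold are illustrative assumptions, not the criteria the authors used to derive the patterns.

```python
from typing import List, Literal

Actor = Literal["agent", "human"]

# Hypothetical cutoff: below this share of human actions, a trajectory in
# which control returns to the agent counts as oversight, not collaboration.
OVERSIGHT_MAX_HUMAN_SHARE = 0.25


def interaction_style(actors: List[Actor]) -> str:
    """Map a trajectory's sequence of actors to one of four interaction styles.

    Illustrative rules only (not the paper's criteria):
      - no human actions                             -> hands-off supervision
      - agent never acts after the human first does  -> full user takeover
      - few human actions, control handed back       -> hands-on oversight
      - otherwise                                    -> collaborative task-solving
    """
    if not actors:
        raise ValueError("trajectory has no actions")
    if "human" not in actors:
        return "hands-off supervision"
    first_human = actors.index("human")
    if "agent" not in actors[first_human + 1:]:
        return "full user takeover"
    human_share = actors.count("human") / len(actors)
    if human_share < OVERSIGHT_MAX_HUMAN_SHARE:
        return "hands-on oversight"
    return "collaborative task-solving"


if __name__ == "__main__":
    print(interaction_style(["agent", "agent", "agent"]))
    print(interaction_style(["agent", "human", "agent", "human", "agent"]))
```

A labeler like this could supply the interaction-style signal that an intervention predictor conditions on; in practice, any such rules and thresholds would need to be calibrated against annotated trajectories.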

Improvement in Intervention Prediction Accuracy

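The trained language models improve intervention prediction accuracy by 61.4-63.4% over base LMs. Accurate intervention prediction is central to collaboration: an agent that anticipates when a user is likely to step in can pause at critical decision points instead of proceeding autonomously, and can avoid requesting unnecessary confirmation elsewhere.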

Demerits

Limited Generalizability

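The findings may not transfer to other domains or applications, because both CowCorpus and the trained models are specific to web navigation tasks. At 400 trajectories, the dataset is also modest in size, so the four interaction patterns may not capture the full range of user behavior.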

Need for Further Validation

The live deployment and user study are encouraging, but further validation across more users, tasks, and websites is needed to establish that the models remain robust and reliable beyond the evaluated setting.

Expert Commentary

This article is a meaningful step toward web agents that collaborate with their users rather than simply acting for them. By modeling human intervention explicitly, the authors improve intervention prediction accuracy by 61.4-63.4% over base LMs and raise user-rated agent usefulness by 26.5% in a live deployment. These results are relevant to both human-computer interaction and AI agent design, since they indicate that the timing of user interventions can be learned from interaction data. The limitations noted above still apply, but the study supports the authors' central claim that structured modeling of human intervention leads to more adaptive, collaborative agents.

Recommendations

  • Future research should develop intervention models that generalize beyond web navigation to other domains and applications.
  • The authors should validate the models with larger and more diverse user populations and tasks to confirm their robustness and reliability in real-world settings.
