World-Model-Augmented Web Agents with Action Correction

arXiv:2602.15384v1

Abstract: Web agents based on large language models have demonstrated promising capability in automating web tasks. However, current web agents struggle to reason out sensible actions due to the limitations of predicting environment changes, and might not possess comprehensive awareness of execution risks, prematurely performing risky actions that cause losses and lead to task failure. To address these challenges, we propose WAC, a web agent that integrates model collaboration, consequence simulation, and feedback-driven action refinement. To overcome the cognitive isolation of individual models, we introduce a multi-agent collaboration process that enables an action model to consult a world model as a web-environment expert for strategic guidance; the action model then grounds these suggestions into executable actions, leveraging prior knowledge of environmental state transition dynamics to enhance candidate action proposal. To achieve risk-aware resilient task execution, we introduce a two-stage deduction chain. A world model, specialized in environmental state transitions, simulates action outcomes, which a judge model then scrutinizes to trigger action corrective feedback when necessary. Experiments show that WAC achieves absolute gains of 1.8% on VisualWebArena and 1.3% on Online-Mind2Web.

Executive Summary

The paper proposes WAC, a web agent framework that targets two weaknesses of web agents built on large language models: difficulty predicting how the environment will change, and limited awareness of execution risk. WAC integrates model collaboration, consequence simulation, and feedback-driven action refinement. In its multi-agent collaboration process, an action model consults a world model, acting as a web-environment expert, for strategic guidance and then grounds that guidance into executable actions. A two-stage deduction chain adds risk-aware action correction: the world model simulates an action's outcome, and a judge model scrutinizes the simulation and triggers corrective feedback when necessary. Experiments report absolute gains of 1.8% on VisualWebArena and 1.3% on Online-Mind2Web. The framework has the potential to improve the accuracy and efficiency of web agents in real-world applications such as web scraping and browsing. How well it handles dynamic and uncertain environments remains to be explored.

Key Points

  • WAC integrates model collaboration, consequence simulation, and feedback-driven action refinement to improve task execution reliability and resilience.
  • A world model serves as a web-environment expert for strategic guidance in the multi-agent collaboration process.
  • A two-stage deduction chain enables risk-aware action correction by simulating action outcomes and scrutinizing them with a judge model.
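
The loop described in the points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names choose_action, advise, propose, simulate, judge, Verdict, and max_rounds are all hypothetical, the models are passed in as plain callables, and the toy stand-ins in demo replace real LLM calls. The fallback when the correction budget runs out is also an assumption, since the abstract does not say what WAC does in that case.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Judge output (hypothetical shape): accept flag plus corrective feedback."""
    ok: bool
    feedback: str = ""


def choose_action(
    task: str,
    observation: str,
    advise: Callable[[str, str], str],             # world model as advisor
    propose: Callable[[str, str, str, str], str],  # action model
    simulate: Callable[[str, str], str],           # world model as simulator
    judge: Callable[[str, str, str], Verdict],     # judge model
    max_rounds: int = 3,
) -> str:
    # Collaboration step: the action model consults the world model once.
    guidance = advise(task, observation)
    feedback = ""
    action = ""
    for _ in range(max_rounds):
        # The action model grounds guidance (and any feedback) into an action.
        action = propose(task, observation, guidance, feedback)
        # Stage 1: predict the consequence without touching the real page.
        predicted = simulate(observation, action)
        # Stage 2: the judge inspects the predicted outcome.
        verdict = judge(task, action, predicted)
        if verdict.ok:
            return action
        feedback = verdict.feedback
    # Assumption: when the budget is exhausted, fall back to the last candidate.
    return action


def demo(max_rounds: int = 3) -> str:
    """Toy stand-ins for the three models, for illustration only."""
    def advise(task: str, obs: str) -> str:
        return "review the cart before paying"

    def propose(task: str, obs: str, guidance: str, feedback: str) -> str:
        return "click('View cart')" if feedback else "click('Place order')"

    def simulate(obs: str, action: str) -> str:
        return "order submitted" if "Place order" in action else "cart page shown"

    def judge(task: str, action: str, predicted: str) -> Verdict:
        if predicted == "order submitted":
            return Verdict(False, "irreversible purchase; verify the cart first")
        return Verdict(True)

    return choose_action("buy one notebook", "product page", advise, propose,
                         simulate, judge, max_rounds=max_rounds)
```

In the toy run, the first candidate is rejected because its simulated outcome is irreversible, and the feedback steers the action model to a safer candidate on the second round. The design point is that the risky action is only ever simulated, never executed.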

Merits

Strength in Task Execution

WAC's multi-agent collaboration process and two-stage deduction chain are credited with more accurate and reliable task execution. The reported absolute gains are 1.8% on VisualWebArena and 1.3% on Online-Mind2Web.

Demerits

Limitation in Handling Dynamic Environments

WAC may struggle to adapt to dynamic and uncertain environments, where the world model's predictions may not accurately reflect the actual web environment.

Expert Commentary

WAC represents a significant advancement in web agent development, as it addresses critical challenges in task execution reliability and resilience. However, its limitations in handling dynamic environments need to be addressed through further research and development. Additionally, deploying WAC in real-world scenarios raises data privacy and security concerns that require careful consideration.

Recommendations

  • Future research should focus on improving WAC's adaptability to dynamic environments, potentially through the integration of more advanced world models or uncertainty estimation techniques.
  • Developers and policymakers should prioritize the development of standards and regulations for AI-powered web agents, ensuring their safe and responsible deployment.
