IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents
arXiv:2602.17049v1 Announce Type: new Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error accumulation and inefficiency. We present IntentCUA, a multi-agent computer-use framework designed to stabilize long-horizon execution through intent-aligned plan memory. A Planner, Plan-Optimizer, and Critic coordinate over shared memory that abstracts raw interaction traces into multi-view intent representations and reusable skills. At runtime, intent prototypes retrieve subgroup-aligned skills and inject them into partial plans, reducing redundant re-planning and mitigating error propagation across desktop applications. In end-to-end evaluations, IntentCUA achieved a 74.83% task success rate with a Step Efficiency Ratio of 0.91, outperforming RL-based and trajectory-centric baselines. Ablations show that multi-view intent abstraction and shared plan memory jointly improve execution stability, with the cooperative multi-agent loop providing the largest gains on long-horizon tasks. These results highlight that system-level intent abstraction and memory-grounded coordination are key to reliable and efficient desktop automation in large, dynamic environments.
Executive Summary
The article introduces IntentCUA, a multi-agent framework for computer-use agents that stabilizes long-horizon execution through intent-aligned plan memory. By abstracting raw interaction traces into multi-view intent representations and reusable skills, IntentCUA achieves a 74.83% task success rate with a Step Efficiency Ratio of 0.91 in end-to-end evaluations, outperforming RL-based and trajectory-centric baselines. The ablations indicate that multi-view intent abstraction and shared plan memory jointly improve execution stability, and that the cooperative multi-agent loop provides the largest gains on long-horizon tasks.
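The retrieval-and-injection step described in the abstract (intent prototypes retrieve skills, which are then injected into partial plans) can be illustrated with a small sketch. This is not the authors' implementation: the `Skill` class, the three-dimensional vectors, the cosine-similarity matching, the `0.8` threshold, and the placeholder-step convention are all assumptions chosen only to make the idea concrete.

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    prototype: list[float]  # hypothetical intent-prototype vector
    steps: list[str]        # reusable action sequence


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_skill(intent, skills, threshold=0.8):
    """Return the skill whose prototype is closest to the intent vector,
    or None if nothing is similar enough (threshold is an assumption)."""
    best = max(skills, key=lambda s: cosine(intent, s.prototype), default=None)
    if best is None or cosine(intent, best.prototype) < threshold:
        return None
    return best


def inject(plan, index, skill):
    """Replace the placeholder step at `index` with the skill's steps."""
    return plan[:index] + skill.steps + plan[index + 1:]


# Toy skill library; names and vectors are made up for illustration.
SKILLS = [
    Skill("save_as_pdf", [1.0, 0.0, 0.0],
          ["open File menu", "click Export", "choose PDF", "confirm"]),
    Skill("attach_file", [0.0, 1.0, 0.0],
          ["click Attach", "select file", "confirm"]),
]

partial_plan = ["open report", "<export the report>", "close window"]
intent_vec = [0.9, 0.1, 0.0]  # stand-in for an encoded user intent

skill = retrieve_skill(intent_vec, SKILLS)
full_plan = inject(partial_plan, 1, skill) if skill else partial_plan
print(full_plan)
```

When no stored prototype is close enough, `retrieve_skill` returns `None` and the partial plan is left for the planner to complete, which is one plausible way to avoid injecting a mismatched skill.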
Key Points
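- ▸ IntentCUA is a multi-agent framework for computer-use agents in which a Planner, Plan-Optimizer, and Critic coordinate over shared memory
- ▸ Intent-aligned plan memory stabilizes long-horizon execution
- ▸ Raw interaction traces are abstracted into multi-view intent representations and reusable skills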
Merits
Improved Task Success Rate
IntentCUA reports a 74.83% task success rate in end-to-end evaluations, higher than the RL-based and trajectory-centric baselines it is compared against, which supports its claim of stabilizing long-horizon execution.
Efficient Execution
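The framework's shared plan memory and cooperative multi-agent loop reduce redundant re-planning and mitigate error propagation: skills retrieved from memory are injected into partial plans instead of being planned again from scratch. The reported Step Efficiency Ratio is 0.91.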
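A toy sketch can show how a shared plan memory cuts down on re-planning. The abstract names the three roles but does not describe their internals, so `planner`, `plan_optimizer`, `critic`, and `PlanMemory` below are hypothetical stand-ins, not the paper's components: the planner emits a deliberately redundant plan, the optimizer removes consecutive duplicate steps, and the critic accepts only plans that end with a verification step.

```python
from dataclasses import dataclass, field


@dataclass
class PlanMemory:
    """Hypothetical shared memory mapping an intent label to a vetted plan."""
    skills: dict[str, list[str]] = field(default_factory=dict)
    hits: int = 0

    def lookup(self, intent):
        steps = self.skills.get(intent)
        if steps is not None:
            self.hits += 1
        return steps

    def store(self, intent, steps):
        self.skills[intent] = list(steps)


def planner(intent):
    # Stand-in for an expensive planning call; the repeated step is deliberate.
    return [f"locate UI for {intent}", f"perform {intent}",
            f"perform {intent}", f"verify {intent}"]


def plan_optimizer(steps):
    # Drop consecutive duplicate steps.
    out = []
    for step in steps:
        if not out or out[-1] != step:
            out.append(step)
    return out


def critic(steps):
    # Accept only plans that end with a verification step.
    return bool(steps) and steps[-1].startswith("verify")


def run(intents, memory):
    """Execute intents in order, reusing stored plans when available."""
    executed, planner_calls = [], 0
    for intent in intents:
        steps = memory.lookup(intent)
        if steps is None:
            planner_calls += 1
            steps = plan_optimizer(planner(intent))
            if critic(steps):
                memory.store(intent, steps)  # only vetted plans are shared
        executed.extend(steps)
    return executed, planner_calls


memory = PlanMemory()
trace, calls = run(["login", "export", "login"], memory)
print(calls, memory.hits, len(trace))
```

In this sketch the second `login` is served from memory, so the planner runs twice for three intents. Storing only critic-approved plans is one possible design choice for keeping faulty plans from propagating through reuse.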
Demerits
Complexity
The multi-agent architecture and shared plan memory add moving parts: three coordinating roles plus a memory built from interaction traces. This may make IntentCUA more challenging to implement and maintain than a simpler design.
Expert Commentary
IntentCUA shows that intent-aligned plan memory and multi-agent planning can improve the efficiency and reliability of computer-use agents. Its abstraction of raw interaction traces into multi-view intent representations and reusable skills is particularly noteworthy, because it lets the system reuse solutions to routine subproblems rather than solve them repeatedly. The abstract reports results for desktop applications only and does not discuss the cost of building and maintaining the plan memory, so further research is needed to establish how well the approach generalizes and what limitations arise in implementation.
Recommendations
- ✓ Further research should test IntentCUA in domains beyond the desktop applications evaluated so far and measure the benefits and costs of implementing it.
- ✓ Developers and policymakers should consider the potential implications of IntentCUA and other autonomous systems on employment, the economy, and society as a whole, and should work to ensure that these systems are designed and deployed in a responsible and transparent manner.