ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices
arXiv:2602.21858v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive intelligence, where agents autonomously anticipate needs and initiate actions, represents the next frontier for mobile agents. However, its development is critically bottlenecked by the lack of benchmarks that can address real-world complexity and enable objective, executable evaluation. To overcome these challenges, we introduce ProactiveMobile, a comprehensive benchmark designed to systematically advance research in this domain. ProactiveMobile formalizes the proactive task as inferring latent user intent across four dimensions of on-device contextual signals and generating an executable function sequence from a comprehensive function pool of 63 APIs. The benchmark features over 3,660 instances of 14 scenarios that embrace real-world complexity through multi-answer annotations. To ensure quality, a team of 30 experts conducts a final audit of the benchmark, verifying factual accuracy, logical consistency, and action feasibility, and correcting any non-compliant entries. Extensive experiments demonstrate that our fine-tuned Qwen2.5-VL-7B-Instruct achieves a success rate of 19.15%, outperforming o1 (15.71%) and GPT-5 (7.39%). This result indicates that proactivity is a critical competency widely lacking in current MLLMs, yet it is learnable, emphasizing the importance of the proposed benchmark for proactivity evaluation.
Executive Summary
This article introduces ProactiveMobile, a comprehensive benchmark designed to advance research in proactive intelligence on mobile devices. The benchmark formalizes the proactive task as inferring latent user intent from on-device contextual signals and generating executable function sequences from a pool of 63 APIs. In extensive experiments, the authors' fine-tuned 7B model outperforms much larger proprietary models on success rate, indicating that proactivity is widely lacking in current MLLMs yet learnable. While the benchmark and results demonstrate the potential of proactive intelligence, questions remain about how fully any benchmark can capture real-world complexity. The findings highlight the importance of evaluating proactive intelligence, but also underscore the need for further research on the limitations and biases of AI models.
Key Points
- ProactiveMobile is a comprehensive benchmark for evaluating proactive intelligence on mobile devices, comprising over 3,660 instances across 14 scenarios.
- The benchmark formalizes the proactive task as inferring latent user intent from on-device contextual signals and generating an executable function sequence from a pool of 63 APIs.
- The authors' fine-tuned Qwen2.5-VL-7B-Instruct achieves a 19.15% success rate, outperforming o1 (15.71%) and GPT-5 (7.39%).
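The core evaluation loop implied by the points above is straightforward: compare a model's predicted function-call sequence against the instance's gold annotations, where multi-answer annotations mean several sequences can count as correct. The sketch below is a hypothetical illustration of such a harness; the function names, data format, and exact-match criterion are assumptions for clarity, not the benchmark's actual schema.

```python
# Hypothetical multi-answer evaluation sketch: a prediction is a success if
# its function-call sequence exactly matches ANY annotated gold sequence.
# Names and data shapes are illustrative, not ProactiveMobile's real schema.

def is_success(predicted: list[str], gold_answers: list[list[str]]) -> bool:
    """Return True if the predicted call sequence matches any gold sequence."""
    return any(predicted == gold for gold in gold_answers)

def success_rate(predictions: list[list[str]],
                 instances: list[list[list[str]]]) -> float:
    """Fraction of instances whose prediction matches one gold annotation."""
    hits = sum(is_success(p, inst) for p, inst in zip(predictions, instances))
    return hits / len(instances)

# Example instance with two acceptable orderings (multi-answer annotation):
instance = [
    ["check_calendar()", "set_reminder('meeting', '09:00')"],
    ["set_reminder('meeting', '09:00')", "check_calendar()"],
]
print(is_success(["set_reminder('meeting', '09:00')", "check_calendar()"],
                 instance))  # True
```

A real harness would additionally need to execute the generated calls in a sandboxed device environment to verify action feasibility, which is what makes the benchmark's "objective, executable evaluation" harder than string matching.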
Merits
Strength
The ProactiveMobile benchmark provides a systematic and comprehensive evaluation framework for proactive intelligence on mobile devices, filling a critical gap in research. Its scale and design — a pool of 63 APIs, over 3,660 instances across 14 scenarios, multi-answer annotations, and a final audit by 30 experts — lend credibility to the evaluation, and the experiments demonstrate that proactive competency, while scarce in current MLLMs, is learnable.
Demerits
Limitation
The ProactiveMobile benchmark may not fully capture the complexity of real-world scenarios, particularly in terms of user intent and context. Additionally, the authors' reliance on fine-tuning a pre-existing model may limit the generalizability of their results.
Expert Commentary
The ProactiveMobile benchmark represents a significant step forward in the evaluation and development of proactive intelligence on mobile devices. However, further research is needed to fully capture the complexity of real-world scenarios and to address the limitations and biases of AI models. The article's findings highlight the importance of proactive intelligence in improving user experience and context-awareness, but also underscore the need for caution and careful evaluation in the development and deployment of AI systems.
Recommendations
- Future research should focus on developing more comprehensive and nuanced benchmarks for proactive intelligence, particularly in terms of user intent and context.
- Developers and policymakers should prioritize AI systems that are transparent, explainable, and fair, particularly given proactive intelligence's potential impact on user autonomy and decision-making.