
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

arXiv:2602.24142v1

Abstract: Mobile agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning spanning screen summary, subtask planning, action decision and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these capabilities. To address these challenges, we propose Channel-of-Mobile-Experts (CoME), a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage. Through output-oriented activation, CoME activates the corresponding expert to generate the output tokens of each reasoning stage. To empower CoME with hybrid-capabilities reasoning, we introduce a progressive training strategy: Expert-FT decouples and enhances the capability of each expert; Router-FT aligns expert activation with the different reasoning stages; CoT-FT facilitates seamless collaboration and balanced optimization across multiple capabilities. To mitigate error propagation in hybrid-capabilities reasoning, we propose InfoGain-Driven DPO (Info-DPO), which uses information gain to evaluate the contribution of each intermediate step, thereby guiding CoME toward more informative reasoning. Comprehensive experiments show that CoME outperforms dense mobile agents and MoE methods on both the AITZ and AMEX datasets.
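To make output-oriented activation more concrete, here is a minimal Python sketch under stated assumptions: a hypothetical linear router scores the four experts for each token's hidden state, and only the highest-scoring expert generates that token. The function router_alignment_loss illustrates, again as an assumption, the kind of cross-entropy objective a Router-FT stage could minimize so that the router selects the expert that owns the current reasoning stage. None of these names come from the paper, and the abstract does not describe the actual router design.

```python
import math
from typing import List, Sequence

# The four reasoning stages named in the abstract; expert i serves stage i (assumed ordering).
STAGES = ["screen_summary", "subtask_planning", "action_decision", "action_function"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_logits(hidden: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear router: one score per expert (dot product with that expert's row)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights]


def activate_expert(hidden: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Top-1 activation: only the highest-scoring expert produces this output token."""
    logits = router_logits(hidden, weights)
    return max(range(len(logits)), key=lambda i: logits[i])


def router_alignment_loss(hidden: Sequence[float],
                          weights: Sequence[Sequence[float]], stage: str) -> float:
    """Cross-entropy toward the expert that owns `stage` (an assumed Router-FT-style objective)."""
    probs = softmax(router_logits(hidden, weights))
    return -math.log(probs[STAGES.index(stage)])


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    hidden = [0.0, 0.0, 1.0, 0.0]  # toy hidden state for a token in the action-decision stage
    print(STAGES[activate_expert(hidden, identity)])  # action_decision
    print(router_alignment_loss(hidden, identity, "action_decision"))
    print(router_alignment_loss(hidden, identity, "screen_summary"))
```

With an identity weight matrix and a hidden state aligned with the third dimension, the router activates expert 2 (action_decision), and the alignment loss is lower for that stage than for any other. Hard top-1 selection is used because the abstract speaks of activating "the corresponding expert"; whether CoME ever mixes experts softly is not stated.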

Executive Summary

The article introduces Channel-of-Mobile-Experts (CoME), an agent architecture that equips mobile agents with hybrid-capabilities reasoning. CoME consists of four distinct experts, one for each reasoning stage (screen summary, subtask planning, action decision and action function), and is trained with a progressive strategy (Expert-FT, Router-FT, CoT-FT) that first strengthens each capability separately and then balances them. In addition, InfoGain-Driven DPO (Info-DPO) scores each intermediate reasoning step by its information gain, mitigating error propagation and encouraging more informative reasoning. Experiments show CoME outperforming dense mobile agents and MoE methods on the AITZ and AMEX datasets.
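The progressive strategy can be pictured as a schedule of phases that differ in which parameters are updated and which data is used. The Python sketch below encodes one plausible reading of the abstract: Expert-FT updates only the experts, each on its own stage's data; Router-FT updates only the router on stage-labelled tokens; CoT-FT updates experts and router together on complete reasoning chains. The freezing choices, the parameter-group names and the helper functions are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

STAGES = ["screen_summary", "subtask_planning", "action_decision", "action_function"]
EXPERT_GROUPS = tuple(f"expert:{s}" for s in STAGES)  # hypothetical parameter-group names
ALL_GROUPS = list(EXPERT_GROUPS) + ["router"]


@dataclass(frozen=True)
class Phase:
    name: str
    trainable: Tuple[str, ...]  # parameter groups updated in this phase (assumed)
    data: str                   # supervision used in this phase (assumed)


# Assumed schedule: which groups are frozen in each phase is not stated in the abstract.
SCHEDULE = [
    Phase("Expert-FT", EXPERT_GROUPS, "per-stage samples, each sent to its own expert"),
    Phase("Router-FT", ("router",), "tokens labelled with their reasoning stage"),
    Phase("CoT-FT", EXPERT_GROUPS + ("router",), "complete four-stage reasoning chains"),
]


def trainable_map(phase: Phase) -> Dict[str, bool]:
    """Requires-grad flags: True only for the groups this phase updates."""
    return {g: g in phase.trainable for g in ALL_GROUPS}


def expert_ft_batches(samples: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split (stage, text) pairs so each expert trains only on its own stage (decoupling)."""
    batches: Dict[str, List[str]] = {s: [] for s in STAGES}
    for stage, text in samples:
        batches[stage].append(text)
    return batches


if __name__ == "__main__":
    for phase in SCHEDULE:
        frozen = [g for g, on in trainable_map(phase).items() if not on]
        print(f"{phase.name}: train {list(phase.trainable)}, freeze {frozen}; data = {phase.data}")
```

Separating the phases this way mirrors the abstract's ordering: capabilities are enhanced in isolation before the router and the full chain are optimized jointly.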

Key Points

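  • Introduction of CoME, an agent architecture with four stage-aligned experts and output-oriented activation for hybrid-capabilities reasoning
  • Progressive training strategy (Expert-FT, Router-FT, CoT-FT) that decouples and enhances expert capabilities, aligns expert activation with reasoning stages, and balances their integration
  • InfoGain-Driven DPO (Info-DPO), which uses the information gain of each intermediate step to mitigate error propagation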
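Info-DPO combines two ideas: scoring intermediate steps by information gain, and Direct Preference Optimization (DPO). The Python sketch below assumes one natural definition of a step's information gain, namely the increase in the log-probability the model assigns to the ground-truth action once that step is added to the context, and uses it to pick a chosen and a rejected reasoning chain for the standard DPO loss. The exact gain definition, the pairing rule and the value of beta used in CoME are not given in the abstract; treat all of them, and every function name here, as assumptions.

```python
import math
from typing import List, Sequence, Tuple


def step_information_gain(logp_answer: Sequence[float]) -> List[float]:
    """logp_answer[k] = log p(correct action | instruction, steps 1..k); index 0 means no steps yet.

    Assumed definition: the gain of step k is how much it raises that log-probability.
    """
    return [logp_answer[k] - logp_answer[k - 1] for k in range(1, len(logp_answer))]


def uninformative_steps(logp_answer: Sequence[float]) -> List[int]:
    """1-based indices of steps whose gain is not positive."""
    return [k + 1 for k, g in enumerate(step_information_gain(logp_answer)) if g <= 0.0]


def build_preference_pair(chains: Sequence[Tuple[str, Sequence[float]]]) -> Tuple[str, str]:
    """Chosen = chain with the largest total gain, rejected = the smallest (assumed rule)."""
    ranked = sorted(chains, key=lambda c: sum(step_information_gain(c[1])), reverse=True)
    return ranked[0][0], ranked[-1][0]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss on sequence log-probs: -log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)])."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


if __name__ == "__main__":
    print(step_information_gain([-2.0, -1.5, -1.5, -0.5]))  # [0.5, 0.0, 1.0]
    print(uninformative_steps([-2.0, -1.5, -1.5, -0.5]))    # [2]
    chains = [("A", [-2.0, -1.0]), ("B", [-2.0, -1.8]), ("C", [-2.0, -0.2])]
    print(build_preference_pair(chains))                     # ('C', 'B')
    print(dpo_loss(-1.0, -2.0, -2.0, -2.0, beta=1.0))
```

Under this definition the per-step gains telescope: a chain's total gain is just its final log-probability minus its initial one, so ranking by total gain ranks chains by how confidently they end at the correct action. The per-step values matter when locating where a chain stops contributing, which is what uninformative_steps exposes.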

Merits

Improved Performance

CoME outperforms dense mobile agents and MoE methods on the AITZ and AMEX datasets

Enhanced Reasoning

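Assigning each reasoning stage (screen summary, subtask planning, action decision, action function) to a dedicated expert lets each capability be strengthened separately, while Info-DPO steers CoME toward more informative intermediate steps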

Demerits

Complexity

CoME combines four experts, a router, three fine-tuning stages (Expert-FT, Router-FT, CoT-FT) and Info-DPO, which may make it complex and challenging to implement and reproduce

Limited Generalizability

Results are reported only on AITZ and AMEX, so CoME's performance on other datasets or domains remains untested

Expert Commentary

CoME is a notable advance in mobile-agent design. Assigning each reasoning stage to a dedicated expert, and using Info-DPO to reward informative intermediate steps, directly targets the two problems the abstract identifies: enhancing capabilities separately while integrating them in balance, and preventing errors from propagating through intermediate reasoning steps. Further research is needed to explore CoME's full potential and to address its implementation complexity and the open question of generalization beyond AITZ and AMEX. Because agents like CoME autonomously execute user instructions on mobile devices, their development and deployment also raise ethical and regulatory questions that deserve careful consideration.

Recommendations

  • Further research on CoME's architecture and training strategy to improve its performance and generalizability
  • Investigation into the ethical and regulatory implications of CoME's development and deployment
