InfoPO: Information-Driven Policy Optimization for User-Centric Agents
arXiv:2603.00656v1 Announce Type: new Abstract: Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.
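The abstract describes the information-gain reward as crediting turns whose feedback measurably changes the agent's subsequent action distribution relative to a masked-feedback counterfactual. The exact divergence measure and masking scheme are not specified in the abstract; the sketch below is one plausible instantiation, using KL divergence between the two next-action distributions (function name, `eps` smoothing, and the choice of KL are assumptions, not the paper's definition):

```python
import numpy as np

def info_gain_reward(p_with_feedback, p_masked, eps=1e-12):
    """Illustrative per-turn information-gain reward.

    p_with_feedback: agent's next-action distribution given the real
                     user/environment feedback for this turn.
    p_masked:        counterfactual next-action distribution with that
                     feedback masked out of the context.
    Returns KL(p_with_feedback || p_masked): large when the feedback
    substantially shifts what the agent does next, near zero when the
    turn was uninformative.
    """
    p = np.asarray(p_with_feedback, dtype=float)
    q = np.asarray(p_masked, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

On this reading, a clarifying question that resolves genuine ambiguity earns a high reward (the answer redirects the policy), while a redundant question earns roughly zero.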
Executive Summary
The article introduces InfoPO, a novel approach to policy optimization for user-centric agents, addressing the issue of underspecified user requests. InfoPO frames multi-turn interaction as active uncertainty reduction, computing an information-gain reward that credits valuable interaction turns. It outperforms baselines across diverse tasks, demonstrating robustness and generalizability. InfoPO provides a principled mechanism for optimizing complex agent-user collaboration, with code available for further development.
Key Points
- ▸ InfoPO addresses underspecified user requests through active uncertainty reduction
- ▸ It computes an information-gain reward to credit valuable interaction turns
- ▸ InfoPO outperforms prompting and multi-turn RL baselines across diverse tasks
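The abstract also mentions an adaptive variance-gated fusion of the turn-level information-gain signal with trajectory-level task outcomes, motivated by the observation that trajectory-level rewards can yield insufficient advantage signals within rollout groups. The paper's exact gating rule is not given in the abstract; the sketch below illustrates the idea under an assumed form (the function name, the gate `tau / (tau + var)`, and `tau` are hypothetical):

```python
import numpy as np

def fuse_rewards(outcome_rewards, info_gains, tau=0.1):
    """Illustrative variance-gated fusion over one rollout group.

    outcome_rewards: trajectory-level task-outcome rewards, one per rollout.
    info_gains:      aggregated per-turn information-gain rewards, one per rollout.
    When outcome rewards barely vary within the group (weak advantage
    signal for GRPO-style updates), the gate w approaches 1 and the
    fused reward leans on the information-gain signal; when outcomes
    already discriminate rollouts, w shrinks and outcomes dominate.
    """
    r = np.asarray(outcome_rewards, dtype=float)
    g = np.asarray(info_gains, dtype=float)
    w = tau / (tau + r.var())  # gate in (0, 1]: high when variance is low
    return (1 - w) * r + w * g
```

This keeps the task-oriented goal direction intact while still producing within-group contrast when all rollouts succeed or all fail.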
Merits
Effective Handling of Uncertainty
By framing multi-turn interaction as active uncertainty reduction, InfoPO handles underspecified user requests effectively, rewarding the agent for acquiring the information it is missing.
Improved Performance
InfoPO's information-gain reward and adaptive variance-gated fusion lead to improved performance across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making.
Demerits
Computational Complexity
Computing the masked-feedback counterfactual for each turn, along with the adaptive variance-gated fusion, may introduce additional computational overhead during training.
Limited Real-World Evaluation
The evaluation relies primarily on simulated users and environments, which may not fully capture real-world complexities, although the paper does report robustness under user simulator shifts.
Expert Commentary
The introduction of InfoPO represents a significant advancement in the field of user-centric agents, as it provides a principled mechanism for optimizing complex agent-user collaboration. The approach's ability to frame multi-turn interaction as active uncertainty reduction and compute an information-gain reward is particularly noteworthy. However, further research is needed to fully explore the implications of InfoPO and its potential applications in real-world settings. Additionally, the development of InfoPO highlights the importance of considering the human-AI collaboration aspect in AI system design, which is crucial for ensuring that these systems are effective, efficient, and safe.
Recommendations
- ✓ Further evaluation of InfoPO in real-world environments to assess its performance and robustness
- ✓ Investigation into the potential applications of InfoPO in areas beyond user-centric agents, such as decision support systems or autonomous vehicles