MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue
arXiv:2603.06194v1
Abstract: Subjective multi-turn dialogue tasks, such as emotional support, require conversational policies that adapt to evolving user states and optimize long-horizon …
Naifan Zhang, Ruihan Sun, Jinwei Su, Hengjie Yang, Zhengyuan Pan, Zhaohan Chen, Xiaofan Zhang