HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents
arXiv:2602.16165v1. Abstract: Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed rewards, where …
Jiangweizhi Peng, Yuanxin Liu, Ruida Zhou, Charles Fleming, Zhaoran Wang, Alfredo Garcia, Mingyi Hong