Robust Regularized Policy Iteration under Transition Uncertainty
arXiv:2603.09344v1 (announce type: new)
Abstract: Offline reinforcement learning (RL) enables data-efficient and safe policy learning without online exploration, but its performance often degrades under distribution …
Hongqiang Lin, Zhenghui Fu, Weihao Tang, Pengfei Wang, Yiding Sun, Qixian Huang, Dongxu Zhang