Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration
arXiv:2603.23889v1
Abstract: When safety is formulated as a limit of cumulative cost, safe reinforcement learning (RL) aims to learn policies that maximize …
Guopeng Li, Matthijs T. J. Spaan, Julian F. P. Kooij