Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration
arXiv:2603.18326v1
Abstract: While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore …
Amirhossein Roknilamouki, Arnob Ghosh, Eylem Ekici, Ness B. Shroff