OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
arXiv:2604.02349v1 Announce Type: cross Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in …