Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
arXiv:2603.18088v1 Announce Type: new Abstract: Constraints are essential for stabilizing reinforcement learning fine-tuning (RFT) and preventing degenerate outputs, yet they inherently conflict with the optimization …
Hao Ma, Zhiqiang Pu, Yang Liu, Xiaolin Ai
6 views