Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
arXiv:2603.05900v1 Announce Type: new Abstract: Large language models (LLMs) benefit substantially from supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) in reasoning tasks. …
Xuan Li, Zhanke Zhou, Zongze Li, Jiangchao Yao, Yu Rong, Lu Zhang, Bo Han
9 views