MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
arXiv:2603.19310v1
Abstract: Training large language models (LLMs) for complex reasoning via reinforcement learning requires reward labels that specify whether the generated rollouts …
Tianyang Luo, Tao Feng, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You