Reward Prediction with Factorized World States
arXiv:2603.09400v1 Announce Type: new Abstract: Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to …
Yijun Shen, Delong Chen, Xianming Hu, Jiaming Mi, Hongbo Zhao, Kai Zhang, Pascale Fung
32 views