Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

Articles by Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

Academic · 1 min

Partial Policy Gradients for RL in LLMs

arXiv:2603.06138v1 (announce type: new)

Abstract: Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for …
