Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms
arXiv:2603.09090v1 Announce Type: new Abstract: In reinforcement learning environments with state-dependent action validity, action masking consistently outperforms penalty-based handling of invalid actions, yet existing theory …
Renos Zabounidis, Roy Siegelmann, Mohamad Qadri, Woojun Kim, Simon Stepputtis, Katia P. Sycara
11 views