Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces
arXiv:2603.10199v1
Abstract: Policy Dual Averaging (PDA) offers a principled Policy Mirror Descent (PMD) framework that more naturally admits value function approximation than …
Ji Gao, Caleb Ju, Guanghui Lan, Zhaohui Tong