Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
arXiv:2604.01613v1 Announce Type: new

Abstract: In reinforcement learning (RL), temporal difference (TD) errors are widely adopted for optimizing value and policy functions. However, since the …
Taisuke Kobayashi