Regret and Sample Complexity of Online Q-Learning via Concentration of Stochastic Approximation with Time-Inhomogeneous Markov …
arXiv:2602.16274v1 Announce Type: new Abstract: We present the first high-probability regret bound for classical online Q-learning in infinite-horizon discounted Markov decision processes, without relying on …
Rahul Singh, Siddharth Chandak, Eric Moulines, Vivek S. Borkar, Nicholas Bambos
4 views