This platform requires JavaScript for full functionality. Please enable JavaScript in your browser settings.

Rahul Singh, Siddharth Chandak, Eric Moulines, Vivek S. Borkar, Nicholas Bambos

Articles by Rahul Singh, Siddharth Chandak, Eric Moulines, Vivek S. Borkar, Nicholas Bambos

Academic · 1 min

Regret and Sample Complexity of Online Q-Learning via Concentration of Stochastic Approximation with Time-Inhomogeneous Markov …

arXiv:2602.16274v1 Announce Type: new Abstract: We present the first high-probability regret bound for classical online Q-learning in infinite-horizon discounted Markov decision processes, without relying on …

4 views Feb 20

Something extraordinary is coming.

Rahul Singh, Siddharth Chandak, Eric Moulines, Vivek S. Borkar, Nicholas Bambos

Articles by Rahul Singh, Siddharth Chandak, Eric Moulines, Vivek S. Borkar, Nicholas Bambos

Regret and Sample Complexity of Online Q-Learning via Concentration of Stochastic Approximation with Time-Inhomogeneous Markov …

JCG, PC

HSOLLC Co., Ltd.