Optimal Regret for Policy Optimization in Contextual Bandits
arXiv:2602.13700v1
Abstract: We present the first high-probability optimal regret bound for a policy optimization technique applied to the problem of stochastic contextual …
Orin Levy, Yishay Mansour