Near-Optimal Regret for Policy Optimization in Contextual MDPs with General Offline Function Approximation
arXiv:2602.13706v1. Abstract: We introduce OPO-CMDP, the first policy optimization algorithm for stochastic Contextual Markov Decision Processes (CMDPs) under general offline function approximation. …
Orin Levy, Aviv Rosenberg, Alon Cohen, Yishay Mansour