$\kappa$-Explorer: A Unified Framework for Active Model Estimation in MDPs
arXiv:2602.20404v1 Announce Type: new
Abstract: In tabular Markov decision processes (MDPs) with perfect state observability, each trajectory provides active samples from the transition distributions conditioned …
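To make the idea of trajectories as samples of the transition distributions concrete, here is a minimal sketch of count-based model estimation in a tabular MDP. It is not the paper's $\kappa$-Explorer method. The abstract is truncated, so the sketch assumes the standard reading in which each observed step (state, action, next state) is one sample from the next-state distribution of the visited state-action pair. The function name `estimate_transitions`, the trajectory format, and the toy data are all hypothetical.

```python
from collections import defaultdict


def estimate_transitions(trajectories, n_states):
    """Return empirical next-state frequencies per visited (state, action) pair.

    Assumption (not from the source): each trajectory is a list of
    (state, action, next_state) tuples with integer states in [0, n_states).
    """
    # counts[(s, a)][s_next] = number of observed transitions s --a--> s_next
    counts = defaultdict(lambda: [0] * n_states)
    for trajectory in trajectories:
        for state, action, next_state in trajectory:
            counts[(state, action)][next_state] += 1

    # Normalise each row of counts into a probability vector.
    model = {}
    for pair, row in counts.items():
        total = sum(row)
        model[pair] = [c / total for c in row]
    return model


# Toy data: two short trajectories in a 2-state MDP (illustrative only).
trajectories = [
    [(0, 0, 1), (1, 1, 0), (0, 0, 1)],
    [(0, 0, 0), (0, 0, 1)],
]
model = estimate_transitions(trajectories, n_states=2)
print(model[(0, 0)])  # [0.25, 0.75]
```

Pairs that are never visited get no estimate at all in this sketch, which is why the choice of which state-action pairs to visit matters for model estimation.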