Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation
arXiv:2603.03778v1 (announce type: new)
Abstract: We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, …
Yuqi Kong, Xiao Zhang, Weiran Shen