Fast Online Learning with Gaussian Prior-Driven Hierarchical Unimodal Thompson Sampling
arXiv:2602.15972v1 Announce Type: new Abstract: We study a class of Multi-Armed Bandit (MAB) problems in which arms with Gaussian reward feedback are clustered. Such …
Tianchi Zhao, He Liu, Hongyin Shi, Jinliang Li