Oracle-Robust Online Alignment for Large Language Models
arXiv:2602.20457v1
Abstract: We study online alignment of large language models under misspecified preference feedback, where the observed preference oracle deviates from an …
Zimeng Li, Mudit Gaur, Vaneet Aggarwal