TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation
arXiv:2603.00025v1 Announce Type: new Abstract: Direct Preference Optimization is an offline post-SFT method for aligning language models from preference pairs, with strong results in instruction …
Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Ashley Hagaman, Sarah R. Lowe, Aimee Kendall Roundtree
3 views