Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
arXiv:2602.21585v1 Announce Type: new Abstract: Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and refining candidates over a discrete …
Sweta Karlekar, Carolina Zheng, Magnus Saebo, Nicolas Beltran-Velez, Shuyang Yu, John Bowlan, Michal Kucer, David Blei
3 views