Hadi Khalaf, Serena L. Wang, Daniel Halpern, Itai Shapira, Flavio du Pin Calmon, Ariel D. Procaccia

Academic · 1 min

Robust AI Evaluation through Maximal Lotteries

arXiv:2602.21297v1 (Announce Type: new)

Abstract: The standard way to evaluate language models on subjective tasks is through pairwise comparisons: an annotator chooses the "better" of …
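The abstract is truncated here, so the snippet below does not reproduce the authors' method. It is a minimal sketch of the textbook notion of a maximal lottery, included only to make the title concrete. A maximal lottery is a probability distribution p over the compared models such that, for every model j, the expected pairwise margin of a draw from p against j is non-negative (sum_i p[i] * M[i][j] >= 0). Several choices here are assumptions: the hypothetical helpers `margin_matrix` and `maximal_lottery`, the normalization of each margin by the number of comparisons between that pair, the handling of ties (they are ignored), and the approximation of the game's equilibrium with multiplicative-weights self-play instead of an exact linear program.

```python
import math
from collections import Counter


def margin_matrix(models, comparisons):
    """Build a skew-symmetric margin matrix from (winner, loser) pairs.

    Assumption: M[i][j] = (wins of i over j - wins of j over i) / (i-vs-j comparisons),
    and 0 when the pair was never compared. Ties are not represented.
    """
    idx = {m: k for k, m in enumerate(models)}
    wins = Counter((idx[w], idx[l]) for w, l in comparisons)
    n = len(models)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[(i, j)] + wins[(j, i)]
            if i != j and total > 0:
                M[i][j] = (wins[(i, j)] - wins[(j, i)]) / total
    return M


def maximal_lottery(M, steps=20000, eta=0.01):
    """Approximate a maximal lottery with multiplicative-weights self-play.

    Because M is skew-symmetric, both players of the zero-sum game face the same
    gain vector M @ p, so one weight vector can play against itself. The time-averaged
    strategy approximately satisfies sum_i p[i] * M[i][j] >= 0 for every column j.
    """
    n = len(M)
    log_w = [0.0] * n  # log-weights; all zeros means starting from the uniform lottery
    avg = [0.0] * n
    for _ in range(steps):
        top = max(log_w)  # subtract the max before exponentiating to avoid overflow
        w = [math.exp(x - top) for x in log_w]
        s = sum(w)
        p = [x / s for x in w]
        for i in range(n):
            avg[i] += p[i] / steps
        # Expected margin of each model against a draw from the current lottery p
        gains = [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]
        log_w = [x + eta * g for x, g in zip(log_w, gains)]
    return avg
```

For example, a cyclic set of judgments (A beats B, B beats C, C beats A) has no single best model, and the sketch returns a roughly uniform lottery:

```python
models = ["A", "B", "C"]
comparisons = [("A", "B"), ("B", "C"), ("C", "A")]
print(maximal_lottery(margin_matrix(models, comparisons)))
```

When one model beats every other model head to head, the returned lottery concentrates almost all of its mass on that model. An exact maximal lottery can instead be obtained by solving the equivalent linear program.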

Feb 27
