Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
arXiv:2602.22585v1
Abstract: Human evaluations play a central role in training and assessing AI models, yet these data are rarely treated as measurements …
Jodi M. Casabianca, Maggie Beiting-Parrish