CHiL(L)Grader: Calibrated Human-in-the-Loop Short-Answer Grading
arXiv:2603.11957v1
Abstract: Scaling educational assessment with large language models requires not just accuracy, but the ability to recognize when predictions are trustworthy. …
Pranav Raikote, Korbinian Randl, Ioanna Miliou, Athanasios Lakes, Panagiotis Papapetrou