ABCD: All Biases Come Disguised
arXiv:2602.17445v1
Abstract: Multiple-choice question (MCQ) benchmarks have been a standard evaluation practice for measuring LLMs' ability to reason and answer knowledge-based questions. …
Mateusz Nowak, Xavier Cadet, Peter Chin