A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness
arXiv:2603.06594v1 Announce Type: new Abstract: Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For instance, in safety …
Leo Schwinn, Moritz Ladenburger, Tim Beyer, Mehrnaz Mofakhami, Gauthier Gidel, Stephan Günnemann