Quality follows upgrading

Gregory N. Frank

Articles by Gregory N. Frank

Academic · 1 min

Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

arXiv:2603.18280v1. Abstract: Current alignment evaluation mostly measures whether models encode dangerous concepts and whether they refuse harmful requests. Both miss the layer …

Gregory N. Frank

Mar 20

JCG, PC

HSOLLC Co., Ltd.