Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong

Articles by Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong

Academic · 1 min

Fail-Closed Alignment for Large Language Models

arXiv:2602.16977v1. Abstract: We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches …
