

Robin Young


Academic · 1 min

Why Is RLHF Alignment Shallow? A Gradient Analysis

arXiv:2603.04851v1 (Announce Type: new)

Abstract: Why is safety alignment in LLMs shallow? We prove that gradient-based alignment inherently concentrates on positions where harm is decided …


Mar 7

