Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models
arXiv:2602.19101v1 Announce Type: new Abstract: Value alignment of Large Language Models (LLMs) requires us to empirically measure these models' actual, acquired representation of value. Among …
Seong Hah Cho, Junyi Li, Anna Leshinskaya