Discovering Universal Activation Directions for PII Leakage in Language Models
arXiv:2602.16980v1
Abstract: Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information …
Leo Marchyok, Zachary Coalson, Sungho Keum, Sooel Son, Sanghyun Hong