Charles Ye, Jasmine Cui, Dylan Hadfield-Menell

Articles by Charles Ye, Jasmine Cui, Dylan Hadfield-Menell

Prompt Injection as Role Confusion

arXiv:2603.12277v1 (cross-listed)

Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models …
