Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction
arXiv:2602.13321v1 Announce Type: new Abstract: Detecting jailbreak attempts in clinical training large language models (LLMs) requires accurate modeling of linguistic deviations that signal unsafe or …
Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli, Laurah Turner, Kelly Cohen
4 views