Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
arXiv:2603.11149v1 Announce Type: new Abstract: Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales …
Xiangwen Wang, Ananth Balashankar, Varun Chandrasekaran
14 views