Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
arXiv:2602.24009v1 (Announce Type: cross)
Abstract: Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across …
Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu