Closing the Distribution Gap in Adversarial Training for LLMs
arXiv:2602.15238v1. Abstract: Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant …
Chengzhi Hu, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn