Enhancing Safety of Large Language Models via Embedding Space Separation
arXiv:2603.20206v1
Abstract: Large language models (LLMs) have achieved impressive capabilities, yet ensuring their safety against harmful prompts remains a critical challenge. Recent …
Xu Zhao, Xiting Wang, Weiran Shen