Can LLM Safety Be Ensured by Constraining Parameter Regions?
arXiv:2602.17696v1 Announce Type: cross Abstract: Large language models (LLMs) are often assumed to contain "safety regions" -- parameter subsets whose modification directly influences safety behaviors. …
Zongmin Li, Jian Su, Farah Benamara, Aixin Sun