Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
arXiv:2602.22271v1
Abstract: Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We re-interpret …
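The standard view of self-attention that the abstract refers to, before any re-interpretation, can be sketched as follows. This is a minimal single-head causal attention example for illustration only; the function name, weights, and shapes are assumptions, not the paper's construction:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: each token mixes itself with
    content-adaptive weights over its past (illustrative sketch only)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # content-adaptive affinities
    mask = np.triu(np.ones_like(scores), k=1)     # block attention to the future
    scores = np.where(mask == 1, -np.inf, scores)
    # numerically stable softmax over each token's visible past
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # per-token mixture of past values

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]
out = causal_self_attention(x, *W)
```

The causal mask is what makes the mixing depend only on a token's past: perturbing a later token leaves every earlier output unchanged.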