Lost in Backpropagation: The LM Head is a Gradient Bottleneck
arXiv:2603.10145v1. Abstract: The last layer of neural language models (LMs) projects output features of dimension $D$ to logits in dimension $V$, the …
Nathan Godey, Yoav Artzi
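As a rough illustration of the projection the abstract describes, the sketch below runs one forward and backward pass through a linear head mapping $D$ features to $V$ logits. It assumes a bias-free linear layer and a softmax cross-entropy loss, neither of which is confirmed by the truncated abstract; the function name `lm_head_backward` and the sizes `D = 4`, `V = 50` are hypothetical. It shows only that the $V$-dimensional gradient on the logits reaches the features as a $D$-dimensional vector, which is one plausible reading of the "bottleneck" in the title, not the authors' analysis.

```python
import math
import random


def lm_head_backward(W, h, target):
    """Forward/backward pass through a bias-free linear head (assumed setup).

    W: V x D weight matrix (list of rows), h: length-D feature vector,
    target: index of the correct token.
    Returns (logit_grad, feature_grad) for a softmax cross-entropy loss.
    """
    V, D = len(W), len(h)
    # Forward: logits = W h, a vector of length V.
    logits = [sum(W[v][d] * h[d] for d in range(D)) for v in range(V)]
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Gradient of cross-entropy w.r.t. logits: softmax(logits) - one_hot(target).
    logit_grad = [p - (1.0 if v == target else 0.0) for v, p in enumerate(probs)]
    # Gradient w.r.t. features: W^T logit_grad, a vector of length D.
    feature_grad = [sum(W[v][d] * logit_grad[v] for v in range(V)) for d in range(D)]
    return logit_grad, feature_grad


random.seed(0)
D, V = 4, 50  # hypothetical sizes; real models use far larger D and V
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]
h = [random.gauss(0, 1) for _ in range(D)]

logit_grad, feature_grad = lm_head_backward(W, h, target=7)
print(len(logit_grad), len(feature_grad))  # 50 4
```

Whatever the loss signal on the $V$ logits, the rest of the network only receives its image under $W^\top$, which has at most $D$ degrees of freedom when $D < V$.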