Optimal low-rank stochastic gradient estimation for LLM training
arXiv:2603.20632v1. Abstract: Large language model (LLM) training is often bottlenecked by memory constraints and stochastic gradient noise in extremely high-dimensional parameter spaces. …
Zehao Li, Tao Ren, Zishi Zhang, Xi Chen, Yijie Peng