On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
arXiv:2602.15322v1

Abstract: Training large language models (LLMs) relies almost exclusively on dense adaptive optimizers with increasingly sophisticated preconditioners. We challenge this by …
Taejong Joo, Wenhan Xia, Cheolmin Kim, Ming Zhang, Eugene Ie