TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers
arXiv:2602.13498v1
Abstract: Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series methods. However, this orthogonalization …
Peng Cheng, Jiucheng Zang, Qingnan Li, Liheng Ma, Yufei Cui, Yingxue Zhang, Boxing Chen, Ming Jian, Wen Tong
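The abstract refers to Newton-Schulz (NS) iterations that orthogonalize updates. As a rough illustration of that idea only, here is a minimal sketch of the classic cubic Newton-Schulz iteration in plain Python. It is not the TrasMuon method and not the tuned polynomial used by Muon implementations; the function names, the step count, and the small epsilon guard are assumptions made for this example.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def newton_schulz_orthogonalize(g, steps=20, eps=1e-12):
    """Approximate the orthogonal polar factor of g (hypothetical helper).

    Uses the textbook cubic iteration X <- 1.5 X - 0.5 X X^T X.
    Dividing by the Frobenius norm first keeps every singular value
    at or below 1, which is inside the iteration's convergence region.
    Assumes g has full column rank; eps only guards against a zero norm.
    """
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / (norm + eps) for v in row] for row in g]
    for _ in range(steps):
        xtx = matmul(transpose(x), x)   # X^T X
        xxtx = matmul(x, xtx)           # X X^T X
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxtx)]
    return x


# Illustrative 3x2 "update" matrix (made-up values).
g = [[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]]
u = newton_schulz_orthogonalize(g)
gram = matmul(transpose(u), u)  # should be close to the 2x2 identity
```

After enough steps the columns of `u` are approximately orthonormal, so every singular value of the update is pushed toward 1 regardless of its original scale. Practical Muon-style optimizers typically run only a few steps of a higher-order polynomial on GPU tensors rather than a loop like this one.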