Omni-Masked Gradient Descent: Memory-Efficient Optimization via Mask Traversal with Improved Convergence
arXiv:2603.05960v1
Abstract: Memory-efficient optimization methods have recently gained increasing attention for scaling full-parameter training of large language models under the GPU-memory bottleneck. …
Hui Yang, Tao Ren, Jinyang Jiang, Wan Tian, Yijie Peng