Muon with Spectral Guidance: Efficient Optimization for Scientific Machine Learning

Binghang Lu, Jiahao Zhang, Guang Lin

arXiv:2602.16167v1

Abstract: Physics-informed neural networks and neural operators often suffer from severe optimization difficulties caused by ill-conditioned gradients, multi-scale spectral behavior, and stiffness induced by physical constraints. Recently, the Muon optimizer has shown promise by performing orthogonalized updates in the singular-vector basis of the gradient, thereby improving geometric conditioning. However, its unit-singular-value updates may lead to overly aggressive steps and lack explicit stability guarantees when applied to physics-informed learning. In this work, we propose SpecMuon, a spectral-aware optimizer that integrates Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism. By decomposing matrix-valued gradients into singular modes and applying RSAV updates individually along dominant spectral directions, SpecMuon adaptively regulates step sizes according to the global loss energy while preserving Muon's scale-balancing properties. This formulation interprets optimization as a multi-mode gradient flow and enables principled control of stiff spectral components. We establish rigorous theoretical properties of SpecMuon, including a modified energy dissipation law, positivity and boundedness of auxiliary variables, and global convergence with a linear rate under the Polyak-Łojasiewicz condition. Numerical experiments on physics-informed neural networks, DeepONets, and fractional PINN-DeepONets demonstrate that SpecMuon achieves faster convergence and improved stability compared with Adam, AdamW, and the original Muon optimizer on benchmark problems such as the one-dimensional Burgers equation and fractional partial differential equations.
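
To make the phrase "orthogonalized updates in the singular-vector basis" concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes the idealized form of the update: take the thin SVD of a matrix-valued gradient, G = U diag(s) V^T, and step along U V^T, so that every singular value of the step equals one. The function name orthogonalized_update and the synthetic gradient are hypothetical, and practical Muon implementations generally approximate U V^T with a Newton-Schulz iteration on a momentum buffer rather than an explicit SVD.

```python
import numpy as np


def orthogonalized_update(grad: np.ndarray) -> np.ndarray:
    """Return U @ V^T from the thin SVD grad = U @ diag(s) @ V^T.

    Hypothetical helper for illustration: it discards the singular values,
    so the returned step has unit singular values in every direction.
    """
    u, _s, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


# Build a synthetic, badly conditioned 6x4 "gradient" whose singular values
# span six orders of magnitude (an assumption chosen for illustration).
rng = np.random.default_rng(0)
u0, _ = np.linalg.qr(rng.standard_normal((6, 4)))
v0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
grad = u0 @ np.diag([1e3, 1.0, 1e-1, 1e-3]) @ v0.T

step = orthogonalized_update(grad)

# All singular values of the step are (numerically) one.
step_singular_values = np.linalg.svd(step, compute_uv=False)
```

Because the step has unit singular values regardless of the spread in s, the direction with s = 1e-3 is moved as far as the direction with s = 1e3. This is the behavior the abstract describes as potentially too aggressive for physics-informed problems.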

Executive Summary

This article proposes SpecMuon, a spectral-aware optimizer that integrates Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism. By decomposing matrix-valued gradients into singular modes and applying RSAV updates individually along dominant spectral directions, SpecMuon adaptively regulates step sizes according to the global loss energy while preserving Muon's scale-balancing properties. The authors establish a modified energy dissipation law, positivity and boundedness of the auxiliary variables, and global convergence with a linear rate under the Polyak-Łojasiewicz condition. In experiments on physics-informed neural networks, DeepONets, and fractional PINN-DeepONets, SpecMuon converges faster and more stably than Adam, AdamW, and the original Muon optimizer on benchmarks such as the one-dimensional Burgers equation and fractional partial differential equations.
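
For readers unfamiliar with the convergence assumption, the Polyak-Łojasiewicz (PL) condition and the kind of linear rate it yields are commonly written as follows. The symbols f (loss), f^* (its infimum), \mu (PL constant), and \rho (contraction factor) are notation assumed here for illustration; the summary does not state the constants in SpecMuon's rate.

```latex
\begin{align*}
\tfrac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2}
  &\ge \mu\,\bigl(f(\theta) - f^{*}\bigr)
  \quad \text{for all } \theta,\ \text{with } \mu > 0, \\
f(\theta_{k}) - f^{*}
  &\le \rho^{k}\,\bigl(f(\theta_{0}) - f^{*}\bigr),
  \quad 0 < \rho < 1 .
\end{align*}
```

As a point of comparison, plain gradient descent on an L-smooth function satisfying the PL condition, run with step size 1/L, attains \rho = 1 - \mu/L. The PL condition does not require convexity, which is why it is a natural assumption for neural-network losses.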

Key Points

  • SpecMuon combines Muon's orthogonalized geometry with mode-wise RSAV updates, regulating step sizes according to the global loss energy.
  • SpecMuon preserves Muon's scale-balancing properties while enabling principled control of stiff spectral components.
  • In the reported experiments, SpecMuon converges faster and more stably than Adam, AdamW, and the original Muon on benchmarks such as the one-dimensional Burgers equation and fractional PDEs.
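
The idea of letting a scalar auxiliary variable regulate the step size can be illustrated with a generic scalar-auxiliary-variable (SAV) gradient step. The sketch below is an assumption-laden toy, not SpecMuon: it applies one auxiliary variable r, initialized to sqrt(f + C), to a whole parameter vector on a stiff quadratic, and it omits both the relaxation step and the per-singular-mode treatment that the abstract describes. The function name sav_gradient_descent, the shift C = 1, and the test problem are all hypothetical.

```python
import math


def sav_gradient_descent(a, x0, lr=0.1, shift=1.0, steps=50):
    """Generic SAV gradient descent on f(x) = 0.5 * sum(a_i * x_i**2).

    Hypothetical illustration only. The auxiliary variable r tracks
    sqrt(f + shift) and rescales each gradient step.
    """
    x = list(x0)
    loss = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
    r = math.sqrt(loss + shift)  # auxiliary variable, starts at sqrt(f + C)
    r_history, loss_history = [r], [loss]
    for _ in range(steps):
        grad = [ai * xi for ai, xi in zip(a, x)]
        grad_sq = sum(g * g for g in grad)
        energy = loss + shift  # f(x_n) + C, strictly positive
        # Semi-implicit update of r: dividing by a factor >= 1 keeps r
        # positive and non-increasing for any learning rate.
        r = r / (1.0 + lr * grad_sq / (2.0 * energy))
        # The ratio r / sqrt(f + C) acts as an adaptive step-size multiplier.
        scale = r / math.sqrt(energy)
        x = [xi - lr * scale * g for xi, g in zip(x, grad)]
        loss = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
        r_history.append(r)
        loss_history.append(loss)
    return x, r_history, loss_history


# Stiff quadratic: plain gradient descent with lr = 0.1 would diverge along
# the a = 100 direction, since lr * a = 10 exceeds the stability limit of 2.
x_final, r_history, loss_history = sav_gradient_descent(
    a=[1.0, 100.0], x0=[1.0, 1.0]
)
```

In this toy, r stays positive and never increases, so r^2 plays the role of a dissipated modified energy even though the raw loss need not fall at every step. Without a relaxation step, r can drift well below sqrt(f + C) and slow progress, which is the kind of gap a relaxed variant is meant to close.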

Merits

Strength

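Coupling Muon's orthogonalized geometry with mode-wise RSAV updates gives SpecMuon provable guarantees (a modified energy dissipation law, positive and bounded auxiliary variables, and linear convergence under the Polyak-Łojasiewicz condition) alongside faster, more stable training in the reported experiments.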

Demerits

Limitation

The evaluation is confined to physics-informed neural networks, DeepONets, and fractional PINN-DeepONets, so SpecMuon's behavior in other machine learning domains remains untested.

Expert Commentary

SpecMuon targets a well-documented difficulty: training physics-informed neural networks and neural operators, where ill-conditioned gradients, multi-scale spectral behavior, and constraint-induced stiffness hamper standard optimizers. The theoretical results (a modified energy dissipation law, positive and bounded auxiliary variables, and linear convergence under the Polyak-Łojasiewicz condition), together with the reported experiments, support its effectiveness on the problems tested. Because the evaluation is confined to physics-informed settings, further work is needed to establish whether the results carry over to other machine learning domains. Overall, SpecMuon is a promising development in optimization for scientific machine learning.

Recommendations

  • Future research should test whether SpecMuon's gains carry over to machine learning domains beyond physics-informed learning.
  • Researchers should explore spectral, mode-wise step-size control for other hard optimization problems, such as those with high-dimensional data or strongly ill-conditioned losses.

Sources