On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
arXiv:2603.10397v1 Announce Type: new

Abstract: One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we delve into the underlying mechanisms behind stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, we analyze the learning dynamics of label noise SGD, unveiling a two-phase learning behavior. In \emph{Phase I}, the magnitudes of model weights progressively diminish, and the model escapes the lazy regime and enters the rich regime. In \emph{Phase II}, the alignment between model weights and the ground-truth interpolator increases, and the model eventually converges. Our analysis highlights the critical role of label noise in driving the transition from the lazy to the rich regime and minimally explains its empirical success. Furthermore, we extend these insights to Sharpness-Aware Minimization (SAM), showing that the principles governing label noise SGD also apply to broader optimization algorithms. Extensive experiments, conducted under both synthetic and real-world setups, strongly support our theory. Our code is released at https://github.com/a-usually/Label-Noise-SGD.
Executive Summary
This article delves into the learning dynamics of two-layer linear networks trained with stochastic gradient descent (SGD) and label noise. The authors identify a two-phase learning behavior: Phase I, where model weights decrease in magnitude and the model transitions from the lazy to the rich regime, and Phase II, where weights align with the ground-truth interpolator and converge. Label noise is shown to drive this transition, explaining its empirical success. The study extends to Sharpness-Aware Minimization (SAM) and is supported by extensive experiments. The findings highlight the importance of label noise in deep learning and its effects on model generalization.
Key Points
- ▸ The authors analyze the learning dynamics of two-layer linear networks with label noise SGD.
- ▸ A two-phase learning behavior is identified, with Phase I characterized by decreasing model weights and Phase II by increasing alignment with the ground-truth interpolator.
- ▸ Label noise is shown to drive the transition from the lazy to the rich regime, explaining its empirical success.
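The mechanism the key points describe can be made concrete with a toy simulation. The sketch below is our own minimal construction, not the paper's exact setup: an over-parameterized two-layer linear network trained by SGD in which fresh Gaussian noise is added to the sampled label at every step. All dimensions, the learning rate, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumptions, not the paper's exact construction):
# over-parameterized two-layer linear net f(x) = a @ (W @ x), with n < d.
n, d, h = 5, 20, 10                # samples, input dim, hidden width
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[:5] = 1.0                   # sparse ground-truth interpolator
y = X @ w_star                     # clean labels

W = rng.normal(size=(h, d)) * 0.5
a = rng.normal(size=h) * 0.5
lr, sigma, steps = 1e-3, 0.5, 5000

norms = []
for _ in range(steps):
    i = rng.integers(n)
    y_noisy = y[i] + sigma * rng.normal()   # label noise, resampled per step
    hidden = W @ X[i]
    g = a @ hidden - y_noisy                # residual on the noisy label
    # gradients of the per-sample loss 0.5 * g**2,
    # computed before either layer is updated
    grad_a = g * hidden
    grad_W = g * np.outer(a, X[i])
    a -= lr * grad_a
    W -= lr * grad_W
    norms.append(np.linalg.norm(a) ** 2 + np.linalg.norm(W) ** 2)

# Over a long horizon, the total weight norm tends to drift down (Phase I)
# while the fit to the clean labels improves; this short run illustrates
# the update rule rather than the full two-phase trajectory.
clean_mse = float(np.mean(((X @ W.T) @ a - y) ** 2))
print(norms[0], norms[-1], clean_mse)
```

The per-step label resampling is the essential ingredient: the noise never averages out across epochs, so it keeps injecting gradient variance even after the clean labels are fit, which is what drives the implicit-bias effect the paper analyzes.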
Merits
Strength in Analytical Approach
The authors employ a rigorous analytical approach to understanding the learning dynamics of two-layer linear networks, providing insights into the role of label noise in deep learning.
Empirical Support
The study is supported by extensive experiments under both synthetic and real-world setups, lending credibility to the authors' findings.
Demerits
Limited to Linear Networks
The study's focus on two-layer linear networks may limit its generalizability to more complex neural network architectures.
Assumes Stationarity
The analysis assumes stationarity of the noise process, which may not hold in practice, potentially affecting the results' applicability.
Expert Commentary
The article provides a comprehensive analysis of the learning dynamics of two-layer linear networks with label noise SGD. The identification of a two-phase learning behavior and the role of label noise in driving this transition are significant contributions to the field of deep learning. However, the study's limitations, such as its focus on linear networks and assumption of stationarity, should be considered when interpreting the results. The extension to Sharpness-Aware Minimization (SAM) is a notable aspect of the study, highlighting the broader implications of label noise for optimization algorithms. The findings have the potential to inform the development of more effective training protocols for deep learning models, leading to improved performance in real-world applications.
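For readers unfamiliar with SAM, the generic update it performs can be sketched as follows. This is a standard two-step form of the SAM rule, not the paper's specific analysis; the function names, the toy quadratic loss, and all hyperparameters are our own illustrative choices.

```python
import numpy as np

# Hedged sketch of the generic SAM update (names and toy loss are ours):
#   eps = rho * g / ||g||         ascend to a nearby high-loss point
#   w  <- w - lr * grad(w + eps)  descend using the perturbed gradient
def sam_step(w, grad_fn, lr, rho):
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    return w - lr * grad_fn(w + eps)

# Toy quadratic L(w) = 0.5 * w @ A @ w with one sharp and one flat direction
A = np.diag([10.0, 1.0])
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(20):
    w = sam_step(w, grad, lr=0.05, rho=0.1)
# The sharp direction is suppressed fastest; w ends well inside the unit ball.
print(w)
```

The connection the paper draws is that the ascent perturbation plays a role analogous to label noise: both inject a structured disturbance into the gradient that biases training toward flatter, better-generalizing minima.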
Recommendations
- ✓ Future studies should investigate the generalizability of the findings to more complex neural network architectures.
- ✓ The assumption of stationarity should be relaxed to better capture the non-stationary nature of real-world noise processes.