Directional Neural Collapse Explains Few-Shot Transfer in Self-Supervised Learning
arXiv:2603.03530v1 Announce Type: new Abstract: Frozen self-supervised representations often transfer well with only a few labels across many semantic tasks. We argue that a single geometric quantity, \emph{directional} CDNV (decision-axis variance), sits at the core of two favorable behaviors: strong few-shot transfer within a task, and low interference across many tasks. We show that both emerge when variability \emph{along} class-separating directions is small. First, we prove sharp non-asymptotic multiclass generalization bounds for downstream classification whose leading term is the directional CDNV. The bounds include finite-shot corrections that cleanly separate intrinsic decision-axis variability from centroid-estimation error. Second, we link decision-axis collapse to multitask geometry: for independent balanced labelings, small directional CDNV across tasks forces the corresponding decision axes to be nearly orthogonal, helping a single representation support many tasks with minimal interference. Empirically, across SSL objectives, directional CDNV collapses during pretraining even when classical CDNV remains large, and our bounds closely track few-shot error at practical shot sizes. Additionally, on synthetic multitask data, we verify that SSL learns representations whose induced decision axes are nearly orthogonal. The code and project page of the paper are available at [\href{https://dlfundamentals.github.io/directional-neural-collapse/}{project page}].
Executive Summary
The article introduces directional neural collapse, a geometric property quantified by the directional CDNV (decision-axis variance), to explain two favorable behaviors of frozen self-supervised representations: strong few-shot transfer within a task and low interference across many tasks. Both emerge when variability along class-separating directions is small. The authors prove non-asymptotic multiclass generalization bounds whose leading term is the directional CDNV, with finite-shot corrections separating intrinsic decision-axis variability from centroid-estimation error, and they link decision-axis collapse to multitask geometry by showing that small directional CDNV across independent balanced tasks forces the corresponding decision axes to be nearly orthogonal.
Key Points
- Directional neural collapse explains few-shot transfer in self-supervised learning
- Small variability along class-separating directions yields strong few-shot transfer and low interference
- Decision-axis collapse is linked to multitask geometry, allowing a single representation to support many tasks
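The distinction between classical and directional CDNV in the points above can be sketched numerically. The following is a minimal illustration (the function name and exact normalization are assumptions; the paper's formal definition may differ): it projects each class's features onto the unit axis joining the two class centroids and normalizes the along-axis variance by the squared centroid distance.

```python
import numpy as np

def directional_cdnv(feats_a, feats_b):
    """Variance of two classes' features along the axis joining their
    centroids, normalized by squared centroid distance.
    Illustrative sketch; the paper's exact definition may differ."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    axis = mu_b - mu_a
    dist_sq = float(np.dot(axis, axis))    # squared centroid distance
    u = axis / np.sqrt(dist_sq)            # unit decision axis
    var_a = np.var(feats_a @ u)            # variance along the axis
    var_b = np.var(feats_b @ u)
    return (var_a + var_b) / (2.0 * dist_sq)
```

With noise that is tiny along the centroid axis but large orthogonal to it, this directional quantity is small even though total within-class variance (which drives the classical CDNV) remains large, which is the regime the abstract describes.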
Merits
Theoretical Foundations
The article provides a rigorous theoretical framework for directional neural collapse, including sharp non-asymptotic generalization bounds with finite-shot corrections.
Empirical Validation
The authors support their claims empirically: directional CDNV collapses during pretraining across SSL objectives even when classical CDNV remains large, the bounds track few-shot error at practical shot sizes, and synthetic multitask experiments confirm near-orthogonal decision axes.
Demerits
Limited Scope
The article focuses primarily on self-supervised learning, so its results may not transfer directly to other areas of machine learning.
Complexity
The mathematical framework and concepts presented may be challenging for non-experts to follow.
Expert Commentary
The article makes a significant contribution to the understanding of self-supervised learning. Directional neural collapse offers a valuable lens on the geometry of self-supervised representations, and the theoretical and empirical results together show why directional properties matter for few-shot transfer and multitask learning. The work has practical implications for designing more effective and efficient representation-learning algorithms.
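The paper's near-orthogonality claim can be checked directly in practice: given one unit decision axis per task, the largest absolute pairwise cosine measures cross-task interference. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def max_pairwise_cosine(axes):
    """Largest |cosine| between any two decision axes (rows of `axes`).
    Values near zero indicate the near-orthogonality the paper describes.
    Illustrative helper, not from the paper's code."""
    U = np.asarray(axes, dtype=float)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # unit-normalize rows
    G = np.abs(U @ U.T)                               # |cosine| Gram matrix
    np.fill_diagonal(G, 0.0)                          # ignore self-similarity
    return float(G.max())
```

Applied to the decision axes induced by a learned representation across tasks, a value close to zero would reproduce the orthogonality the authors report on synthetic multitask data.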
Recommendations
- Further research on applications of directional neural collapse in other areas of machine learning
- Investigation of the relationship between directional neural collapse and other geometric properties of representations