Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks
arXiv:2602.15997v1 Announce Type: new Abstract: Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K-85M parameters), 120+ emergence events in eight algorithmic tasks, and three Pythia language models (160M-2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210× parameter range (e.g., modular arithmetic collapses to RankMe ≈ 2.0 regardless of model size); (2) collapse propagates top-down through layers (32/32 task × model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (75-100% precursor rate for hard tasks), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance 27%; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not -- the precursor relationship requires task-training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.
Executive Summary
The article 'Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks' investigates the mechanisms behind capability emergence during neural network training. The study tracks five geometric measures across five model scales (405K-85M parameters), eight algorithmic tasks, and three Pythia language models, revealing a universal representation collapse to scale-invariant, task-specific floors, top-down propagation of that collapse through layers, and a geometric hierarchy in which representation geometry leads emergence. The research also delineates the limits of these measures: they encode coarse task difficulty but not fine-grained timing, and per-task precursor signals do not replicate under naturalistic pre-training.
Key Points
- ▸ Universal representation collapse to task-specific floors that are scale-invariant across a wide range of model sizes.
- ▸ Top-down propagation of collapse through layers, contradicting the traditional bottom-up feature-building intuition.
- ▸ Geometric hierarchy in which representation geometry leads emergence, while the local learning coefficient is synchronous and Hessian measures lag.
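To make the first key point concrete: the abstract's "collapse to RankMe ≈ 2.0" refers to the effective rank of a representation matrix dropping to roughly two dimensions regardless of model size. A minimal sketch of the RankMe measure (the entropy-based effective rank of Garrido et al., 2023), computed from the singular values of a batch of representations; the specific matrices below are illustrative, not from the paper:

```python
import numpy as np

def rankme(Z, eps=1e-12):
    """Effective rank (RankMe) of a representation matrix Z (n_samples x dim).

    RankMe = exp(-sum_k p_k log p_k), where p_k are the singular values of Z
    normalized to sum to 1. A collapse to RankMe ~ 2 means the representations
    effectively occupy about two dimensions, however wide the layer is.
    """
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically-zero directions before the entropy
    return float(np.exp(-(p * np.log(p)).sum()))

# Representations concentrated in exactly two orthogonal directions
# with equal energy give RankMe = 2 exactly.
Z = np.zeros((100, 64))
Z[:50, 0] = 1.0
Z[50:, 1] = 1.0
print(rankme(Z))  # 2.0
```

Tracking this scalar per layer over training is what makes the "scale-invariant floor" claim testable: the floor value, not the layer width, is what repeats across the 210× parameter range.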
Merits
Comprehensive Analysis
The study provides a rigorous and comprehensive analysis of capability emergence in neural networks, tracking multiple geometric measures across various model scales and tasks.
Novel Findings
The findings challenge traditional intuitions about feature-building in neural networks, offering new insights into the mechanisms of capability emergence.
Demerits
Limited Predictive Power
The study acknowledges the limitations of geometric measures in predicting fine-grained timing and task-specific precursors, which may limit their practical applicability.
Naturalistic Pre-Training Limitations
The precursor relationship requires a task-training alignment that naturalistic pre-training does not provide: on Pythia, global geometric patterns replicate, but per-task precursor signals do not, indicating gaps in real-world applicability.
Expert Commentary
The article presents a significant advancement in the field of neural network research by elucidating the mechanisms behind capability emergence. The rigorous tracking of geometric measures across various model scales and tasks provides a nuanced understanding of the training dynamics. The finding that representation collapse propagates top-down challenges the conventional bottom-up feature-building hypothesis, suggesting a more complex interplay of factors in neural network training. However, the study's acknowledgment of the limitations in predictive power and the gaps in naturalistic pre-training highlights the need for further research. The implications of this work are profound, offering practical insights for optimizing training processes and informing policy decisions related to AI development. The study's balanced approach, combining empirical evidence with theoretical insights, makes it a valuable contribution to the field.
Recommendations
- ✓ Further research should explore the mechanisms underlying the top-down propagation of representation collapse to validate and expand upon the current findings.
- ✓ Investigations into the predictive power of geometric measures in naturalistic pre-training settings are recommended to bridge the gap between controlled experiments and real-world applications.
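For readers who want to probe the "precursor" claim in their own training runs, the analysis can be operationalized as a lead-time measurement between a geometric measure's collapse and the accuracy jump. The criteria below (accuracy threshold, half-collapse point) are hypothetical stand-ins for the paper's exact event definitions, and the trajectories are synthetic:

```python
import numpy as np

def lead_time(measure, accuracy, acc_threshold=0.9, drop_frac=0.5):
    """Steps by which a geometric measure leads capability emergence.

    Illustrative operationalization (the paper's exact criteria may differ):
    emergence is the first step where accuracy crosses acc_threshold; the
    measure's event is the first step where it has completed drop_frac of
    its total collapse. A positive result means the geometry moved first.
    """
    t_emerge = int(np.argmax(accuracy >= acc_threshold))
    target = measure[0] - drop_frac * (measure[0] - measure.min())
    t_measure = int(np.argmax(measure <= target))
    return t_emerge - t_measure

# Toy run: a RankMe-style collapse around step 30, accuracy emerging near 60.
t = np.arange(100)
measure = 10.0 - 8.0 / (1.0 + np.exp(-(t - 30) / 3.0))  # collapses early
accuracy = 1.0 / (1.0 + np.exp(-(t - 60) / 2.0))        # emerges later
print(lead_time(measure, accuracy))  # positive: geometry leads
```

Applying this per task and per checkpoint is one way to test the recommendation above on naturalistic pre-training runs, where the paper reports the per-task precursor signal fails to replicate.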