Why Do Neural Networks Forget: A Study of Collapse in Continual Learning
arXiv:2603.04580v1 Abstract: Catastrophic forgetting is a major problem in continual learning, and many approaches have arisen to reduce it. However, most are evaluated through task accuracy alone, which ignores the model's internal structure. Recent research suggests that structural collapse leads to loss of plasticity, as evidenced by changes in effective rank (eRank). This points to a link with forgetting: once a network loses the ability to expand its feature space for new tasks, it is forced to overwrite existing representations. In this study, we therefore investigate the correlation between forgetting and collapse by measuring both weight and activation eRank. Specifically, we evaluate four architectures (MLP, ConvGRU, ResNet-18, and Bi-ConvGRU) on the Split MNIST and Split CIFAR-100 benchmarks, training each model separately with SGD, Learning without Forgetting (LwF), and Experience Replay (ER). The results demonstrate that forgetting and collapse are strongly related, and that different continual learning strategies preserve capacity and performance with differing efficiency.
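The abstract does not include a reference implementation of the eRank metric. As a minimal sketch, assuming the authors use the standard effective-rank definition of Roy & Vetterli (2007), the exponential of the Shannon entropy of the normalized singular-value spectrum, it can be computed for any weight or activation matrix as follows (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank (Roy & Vetterli, 2007): exp of the Shannon entropy
    of the singular values normalized into a probability distribution."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop near-zero mass so the entropy stays finite
    return float(np.exp(-(p * np.log(p)).sum()))

# A collapsed (low-rank) representation yields a much lower eRank:
# the rank-2 product scores near 2, the dense matrix far higher.
rng = np.random.default_rng(0)
full_rank = rng.standard_normal((64, 64))
collapsed = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
print(effective_rank(full_rank), effective_rank(collapsed))
```

For activation eRank, the same function would be applied to a batch-by-features matrix of layer outputs collected after each task, which is presumably how the collapse of the feature space is tracked over training.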
Executive Summary
This article investigates the correlation between catastrophic forgetting and structural collapse in neural networks during continual learning. The authors quantify the link between forgetting and collapse by measuring both weight and activation eRank. The study evaluates four architectures on two benchmarks, demonstrating that continual learning strategies differ in how well they preserve model capacity and performance. The findings highlight the importance of understanding internal model structure when addressing forgetting. The results have significant implications for the development of effective continual learning strategies, which are crucial for real-world applications, and the authors' methodology provides a valuable framework for future research in this area.
Key Points
- ▸ Catastrophic forgetting is a major problem in continual learning, and understanding its relationship with structural collapse is essential.
- ▸ The authors propose a novel approach to measure the link between forgetting and collapse through eRank.
- ▸ Different continual learning strategies (LwF, ER, SGD) preserve model capacity and performance with differing efficiency (a minimal replay sketch follows this list).
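The paper does not specify the buffer policy its ER baseline uses; the sketch below is a generic Experience Replay buffer with reservoir sampling, a common default in the continual learning literature (class and method names are assumptions, not the authors' code):

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer for Experience Replay via reservoir sampling,
    which maintains a uniform sample over the whole stream seen so far."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = []   # stored (input, label) pairs from past tasks
        self.seen = 0    # total number of examples observed

    def add(self, example) -> None:
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Replace a random slot with probability capacity / seen.
            slot = random.randrange(self.seen)
            if slot < self.capacity:
                self.data[slot] = example

    def sample(self, batch_size: int):
        return random.sample(self.data, min(batch_size, len(self.data)))
```

During training, each incoming batch would be interleaved with a batch sampled from the buffer, so gradients on the new task are regularized by rehearsal of old ones; intuitively, this rehearsal is what keeps the feature space, and hence eRank, from collapsing onto the current task.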
Merits
Strength
The study provides a comprehensive analysis of the link between catastrophic forgetting and structural collapse, shedding light on the internal model structure's role in addressing forgetting.
Demerits
Limitation
The study is limited to evaluating four architectures on two benchmarks, which may not be representative of the diverse range of neural network architectures and tasks.
Expert Commentary
The study's approach to measuring the link between catastrophic forgetting and structural collapse provides a valuable framework for future work. However, its limitations, notably the narrow range of architectures and benchmarks, mean further research is needed to generalize the findings. The implications for designing effective continual learning strategies are significant, and the results have the potential to improve the performance and adaptability of neural networks in real-world applications.
Recommendations
- ✓ Future research should focus on evaluating a broader range of architectures and benchmarks to generalize the study's findings.
- ✓ The study's methodological approach should be further developed and refined to provide a more comprehensive understanding of the link between catastrophic forgetting and structural collapse.