A unified theory of feature learning in RNNs and DNNs
arXiv:2602.15593v1 Announce Type: new

Abstract: Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit with the distinct functional properties these networks exhibit? To address this question, we here develop a unified mean-field theory for RNNs and DNNs in terms of representational kernels, describing fully trained networks in the feature learning ($\mu$P) regime. This theory casts training as Bayesian inference over sequences and patterns, directly revealing the functional implications induced by the RNNs' weight sharing. In DNN-typical tasks, we identify a phase transition when the learning signal overcomes the noise due to randomness in the weights: below this threshold, RNNs and DNNs behave identically; above it, only RNNs develop correlated representations across timesteps. For sequential tasks, the RNNs' weight sharing furthermore induces an inductive bias that aids generalization by interpolating unsupervised time steps. Overall, our theory offers a way to connect architectural structure to functional biases.
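The abstract's starting observation, that an RNN is exactly a DNN with tied weights once unrolled in time, can be checked numerically. The following is a minimal illustrative sketch (not code from the paper): it runs a vanilla tanh RNN for `T` steps and compares it to a `T`-layer feedforward pass whose per-layer weight matrices are all set to the same recurrent matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                                  # timesteps (= unrolled depth), hidden width

W = rng.normal(0, 1 / np.sqrt(d), (d, d))    # single recurrent weight matrix
x = rng.normal(size=d)                       # initial state / input at t = 0

# RNN: the same W is applied at every timestep (weight sharing).
h_rnn = x
for t in range(T):
    h_rnn = np.tanh(W @ h_rnn)

# "Unrolled" DNN: T layers with tied weights W_1 = ... = W_T = W.
# An ordinary DNN would instead draw T independent matrices here.
layers = [W] * T
h_dnn = x
for W_l in layers:
    h_dnn = np.tanh(W_l @ h_dnn)

# The two computations coincide exactly.
assert np.allclose(h_rnn, h_dnn)
```

Untying the weights (replacing `[W] * T` with `T` independently drawn matrices) recovers a standard DNN, which is precisely the structural difference the paper's theory analyzes.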
Executive Summary
This article develops a unified mean-field theory of feature learning in Recurrent Neural Networks (RNNs) and Deep Neural Networks (DNNs), describing fully trained networks in the feature learning ($\mu$P) regime in terms of representational kernels. The theory reveals the functional implications of the RNNs' weight sharing: in DNN-typical tasks it identifies a phase transition above which RNNs, unlike DNNs, develop correlated representations across timesteps, and in sequential tasks the weight sharing induces an inductive bias that aids generalization by interpolating unsupervised time steps. The work thereby connects architectural structure to functional biases in neural networks.
Key Points
- ▸ Development of a unified mean-field theory for RNNs and DNNs in feature learning regime
- ▸ Identification of a phase transition in DNN-typical tasks where RNNs develop correlated representations
- ▸ Inductive bias in RNNs that aids generalization by interpolating unsupervised time steps
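To make "representational kernels" and "correlated representations across timesteps" concrete, here is a small illustrative sketch (my own construction, not the paper's mean-field computation): for a random untrained RNN, it collects the hidden states at each timestep and forms the kernel $K_{ts} = \langle h_t, h_s \rangle / d$, whose off-diagonal entries measure how correlated the representations at different timesteps are.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, P = 5, 256, 3              # timesteps, hidden width, number of input patterns

W = rng.normal(0, 1 / np.sqrt(d), (d, d))
X = rng.normal(size=(P, d))      # P input patterns as initial states

# Collect the hidden state h_t for every pattern at every timestep.
H = np.empty((T, P, d))
h = X
for t in range(T):
    h = np.tanh(h @ W.T)
    H[t] = h

# Representational kernel across timesteps: K[t, s] = <h_t, h_s> / d,
# averaged over patterns. The diagonal holds per-timestep norms; the
# off-diagonal entries quantify cross-timestep correlations, which the
# paper predicts become nontrivial for RNNs above the phase transition.
K = np.einsum('tpd,spd->ts', H, H) / (d * P)
print(np.round(K, 3))
```

In the paper's setting these kernels are order parameters of the mean-field theory for trained networks; the sketch only shows the object being tracked, at initialization.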
Merits
- Strength: Provides a unified framework for understanding feature learning in RNNs and DNNs
- Strength: Offers a clear explanation of the functional implications of the RNNs' weight sharing
- Strength: Highlights the inductive bias in RNNs that aids generalization
Demerits
- Limitation: The theory is developed in the feature learning ($\mu$P) regime and may not carry over to other training regimes or architectures
- Limitation: The phase transition is identified in DNN-typical tasks and may not generalize to other task families
- Limitation: The interpolation-based inductive bias may not benefit all sequential tasks
Expert Commentary
This article makes a significant contribution to the field of neural networks by providing a unified mean-field theory of feature learning in RNNs and DNNs. Casting training as Bayesian inference over sequences and patterns directly exposes the functional consequences of the RNNs' weight sharing, from the phase transition in DNN-typical tasks to the interpolation-based inductive bias on sequential tasks. Although the analysis is confined to the feature learning regime and the phase transition is established only for DNN-typical tasks, the framework could guide the design of architectures whose structural constraints match the inductive biases a task requires, with far-reaching implications for machine learning applications.
Recommendations
- ✓ Recommendation 1: Extend the theory beyond the feature learning regime to other neural network architectures
- ✓ Recommendation 2: Explore the RNNs' interpolation-based inductive bias in more depth to identify its practical applications