
A unified theory of feature learning in RNNs and DNNs

Jan P. Bauer, Kirsten Fischer, Moritz Helias, Agostina Palmigiano

arXiv:2602.15593v1 — Abstract: Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit with the distinct functional properties these networks exhibit? To address this question, we here develop a unified mean-field theory for RNNs and DNNs in terms of representational kernels, describing fully trained networks in the feature learning ($\mu$P) regime. This theory casts training as Bayesian inference over sequences and patterns, directly revealing the functional implications induced by the RNNs' weight sharing. In DNN-typical tasks, we identify a phase transition when the learning signal overcomes the noise due to randomness in the weights: below this threshold, RNNs and DNNs behave identically; above it, only RNNs develop correlated representations across timesteps. For sequential tasks, the RNNs' weight sharing furthermore induces an inductive bias that aids generalization by interpolating unsupervised time steps. Overall, our theory offers a way to connect architectural structure to functional biases.
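The abstract's starting point, that an RNN unrolled in time is structurally a DNN whose layers all reuse the same weight matrix, can be made concrete with a small toy. The sketch below is illustrative numpy code, not code from the paper; the width, sequence length, and tanh nonlinearity are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 5, 64                      # sequence length (= unrolled depth), hidden width

x = rng.standard_normal((T, N))   # one toy input sequence
phi = np.tanh                     # pointwise nonlinearity

# RNN: a single recurrent weight matrix is reused at every time step.
W_shared = rng.standard_normal((N, N)) / np.sqrt(N)
h = np.zeros(N)
rnn_states = []
for t in range(T):
    h = phi(W_shared @ h + x[t])  # same W_shared at every step
    rnn_states.append(h)

# DNN: the unrolled computation graph is identical, but each "layer"
# (time step) gets its own independent weight matrix.
W_layers = [rng.standard_normal((N, N)) / np.sqrt(N) for _ in range(T)]
h = np.zeros(N)
dnn_states = []
for t in range(T):
    h = phi(W_layers[t] @ h + x[t])
    dnn_states.append(h)

# Structurally the two loops are the same; only the weight sharing differs.
```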

Executive Summary

This article summarizes a unified theory of feature learning in recurrent neural networks (RNNs) and deep neural networks (DNNs), based on a mean-field description of fully trained networks in the feature learning ($\mu$P) regime, phrased in terms of representational kernels. Framing training as Bayesian inference over sequences and patterns, the theory exposes the functional consequences of the RNNs' weight sharing. For DNN-typical tasks it identifies a phase transition: below the threshold at which the learning signal overcomes the noise from random weights, RNNs and DNNs behave identically, while above it only RNNs develop correlated representations across timesteps. For sequential tasks, weight sharing additionally induces an inductive bias that aids generalization by interpolating unsupervised time steps. The work thereby offers a principled link between architectural structure and functional biases.
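For readers unfamiliar with the "feature learning ($\mu$P) regime" mentioned above: it refers to a parameterization in which hidden features move by an order-one amount during training even as the width grows. The sketch below shows the readout-scaling convention commonly associated with this regime, contrasted with the lazy/NTK scaling; it is a generic illustration and assumption of mine, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1024                              # hidden width

# Mean-field / muP-style readout: weights have O(1) entries and the output
# is rescaled by 1/N, so the network output stays O(1) while feature updates
# remain order one as N grows (feature learning). The lazy/NTK convention
# rescales by 1/sqrt(N) instead, which suppresses feature movement.
a = rng.standard_normal(N)            # readout weights, O(1) entries
h = np.tanh(rng.standard_normal(N))   # some hidden representation

f_feature_learning = (a @ h) / N          # muP-style output scaling
f_lazy             = (a @ h) / np.sqrt(N) # lazy/NTK scaling, for contrast

print(f_feature_learning, f_lazy)
```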

Key Points

  • A unified mean-field theory of RNNs and DNNs in the feature learning ($\mu$P) regime, phrased in terms of representational kernels
  • A phase transition in DNN-typical tasks: once the learning signal overcomes the noise from random weights, only RNNs develop correlated representations across timesteps (see the kernel sketch after this list)
  • An inductive bias from the RNNs' weight sharing that aids generalization on sequential tasks by interpolating unsupervised time steps
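To make "correlated representations across timesteps" more tangible, here is a minimal sketch of how one might compute per-timestep representational kernels and their cross-timestep similarity. The kernel definition K_t = H_t H_t^T / N and the use of a Pearson correlation as the similarity measure are my illustrative assumptions, not the paper's exact quantities.

```python
import numpy as np

def kernels_per_step(states):
    """states: array of shape (T, P, N) -- hidden activity at each of T
    time steps (or layers) for P input patterns with N units.
    Returns the per-step representational kernels K_t = H_t @ H_t.T / N."""
    T, P, N = states.shape
    return np.einsum('tpn,tqn->tpq', states, states) / N

def cross_step_correlation(K):
    """Pearson correlation between flattened kernels at different steps,
    a simple proxy for correlated representations across timesteps."""
    T = K.shape[0]
    flat = K.reshape(T, -1)
    return np.corrcoef(flat)

# Toy usage with random states standing in for a trained network:
rng = np.random.default_rng(1)
states = rng.standard_normal((5, 20, 64))   # T=5 steps, P=20 patterns, N=64 units
K = kernels_per_step(states)
C = cross_step_correlation(K)
print(C.round(2))   # off-diagonal entries ~ cross-timestep kernel similarity
```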

Merits

Strength

Provides a unified framework for understanding feature learning in RNNs and DNNs

Strength

Offers a clear explanation of the functional implications of RNNs' weight sharing

Strength

Highlights the inductive bias in RNNs that aids generalization

Demerits

Limitation

The theory is developed in the feature learning ($\mu$P) regime, so its conclusions may not carry over to networks trained in other regimes or to architectures beyond the RNN/DNN pair studied here

Limitation

The phase transition identified in DNN-typical tasks may not be generalizable to other tasks

Limitation

The inductive bias in RNNs may not be applicable to all sequential tasks

Expert Commentary

This article makes a meaningful contribution to the theory of neural networks by treating RNNs and DNNs within a single feature-learning framework. The mean-field description clarifies how weight sharing shapes learned representations and pinpoints the inductive bias that helps RNNs generalize on sequential tasks. The main caveats concern scope: the results are derived in the feature learning ($\mu$P) regime, and the phase transition is established for DNN-typical tasks. Within that scope, the work offers a principled way to relate architectural choices to functional biases, which could inform the design of future architectures and training setups.

Recommendations

  • Recommendation 1: Further research should be conducted to extend the theory to other neural network architectures
  • Recommendation 2: The inductive bias in RNNs should be explored in more depth to understand its potential applications
