NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces
arXiv:2603.00180v1 Announce Type: new Abstract: Generative modeling of neural network parameters is often tied to architectures because standard parameter representations rely on known weight-matrix dimensions. Generation is further complicated by permutation symmetries that allow networks to model similar input-output functions while having widely different, unaligned parameterizations. In this work, we introduce Neural Network Diffusion Transformers (NNiTs), which generate weights in a width-agnostic manner by tokenizing weight matrices into patches and modeling them as locally structured fields. We establish that Graph HyperNetworks (GHNs) with a convolutional neural network (CNN) decoder structurally align the weight space, creating the local correlation necessary for patch-based processing. Focusing on MLPs, where permutation symmetry is especially apparent, NNiT generates fully functional networks across a range of architectures. Our approach jointly models discrete architecture tokens and continuous weight patches within a single sequence model. On ManiSkill3 robotics tasks, NNiT achieves >85% success on architecture topologies unseen during training, while baseline approaches fail to generalize.
Executive Summary
This article introduces Neural Network Diffusion Transformers (NNiT), a novel approach to generative modeling of neural network parameters. NNiT tokenizes weight matrices into patches and models them as locally structured fields, allowing width-agnostic generation of weights. The method jointly models discrete architecture tokens and continuous weight patches within a single sequence model, leveraging Graph HyperNetworks (GHNs) with a convolutional neural network (CNN) decoder to structurally align the weight space. On ManiSkill3 robotics tasks, NNiT achieves >85% success on architecture topologies unseen during training. NNiT has significant implications for the field of neural network generation, enabling the creation of fully functional networks across a range of architectures.
Key Points
- ▸ Introduces Neural Network Diffusion Transformers (NNiT) for width-agnostic neural network generation
- ▸ Tokenizes weight matrices into patches and models them as locally structured fields
- ▸ Effectively generates weights across a range of architectures, including unseen topologies
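To make the tokenization idea concrete, here is a minimal sketch of patch-based weight tokenization. The patch size (8), the use of zero-padding at ragged edges, and the row-major tile ordering are all illustrative assumptions, not the paper's exact scheme; the point is that matrices of any width map to tokens of one fixed dimension.

```python
import numpy as np

def tokenize_weights(W, patch=8):
    """Hypothetical sketch: carve a weight matrix of any shape into
    fixed-size (patch x patch) tiles, zero-padding the ragged edges,
    so matrices of different widths yield same-dimension tokens."""
    rows, cols = W.shape
    # Pad each dimension up to the next multiple of the patch size.
    pr = (-rows) % patch
    pc = (-cols) % patch
    Wp = np.pad(W, ((0, pr), (0, pc)))
    # Split the padded matrix into a grid of tiles, then flatten
    # each tile into one token vector.
    gr, gc = Wp.shape[0] // patch, Wp.shape[1] // patch
    tiles = Wp.reshape(gr, patch, gc, patch).swapaxes(1, 2)
    return tiles.reshape(gr * gc, patch * patch)

# A 20x13 matrix pads to 24x16 and yields a 3x2 grid of patches.
tokens = tokenize_weights(np.random.randn(20, 13), patch=8)
print(tokens.shape)  # (6, 64)
```

Because every token has the same dimension regardless of the source matrix's width, a single transformer can process networks of varying widths as one sequence.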
Merits
Structural alignment of weight space
NNiT uses Graph HyperNetworks (GHNs) with a CNN decoder to structurally align the weight space, inducing the local correlation that patch-based processing requires.
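The local-correlation property can be probed with a simple diagnostic, sketched below. The neighbor-correlation statistic and the synthetic "smooth" field standing in for structurally aligned weights are illustrative assumptions; the paper's actual alignment comes from a GHN with a CNN decoder, not from this construction.

```python
import numpy as np

def neighbor_correlation(W):
    """Hypothetical diagnostic: Pearson correlation between each weight
    and its right-hand neighbor. High values indicate the local
    structure that patch-based modeling exploits; i.i.d. (unaligned)
    weights score near zero."""
    a = W[:, :-1].ravel()
    b = W[:, 1:].ravel()
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
iid = rng.standard_normal((64, 64))  # stand-in for unaligned weights
# Stand-in for a structurally aligned, locally correlated weight field:
# variance-normalized running sums are smooth along each row.
smooth = iid.cumsum(axis=1) / np.sqrt(np.arange(1, 65))

print(neighbor_correlation(iid))     # near 0
print(neighbor_correlation(smooth))  # high, close to 1
```

A patch of the `iid` matrix carries no structure a CNN-style tokenizer can exploit, whereas patches of the `smooth` field are predictable from their neighbors, which is the property the GHN decoder is argued to provide.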
Width-agnostic generation
NNiT's approach allows for the generation of weights in a width-agnostic manner, independent of known weight-matrix dimensions.
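One way width-agnostic generation can work is for a discrete architecture token to declare each layer's shape, with the following continuous patch tokens filling that matrix. The sketch below is a hypothetical decoding scheme under assumed conventions (patch size 8, `"ARCH"`/`"PATCH"` token names, padding cropped on reassembly); the paper's exact sequence layout may differ.

```python
import numpy as np

PATCH = 8  # assumed patch side length

def decode_sequence(seq):
    """Hypothetical decoder for a mixed sequence of
    ('ARCH', rows, cols) items and ('PATCH', flat_vector) items.
    Each ARCH token tells the decoder how many patch tokens to
    consume and how to crop away the padding."""
    layers, i = [], 0
    while i < len(seq):
        assert seq[i][0] == "ARCH"
        _, rows, cols = seq[i]
        gr = -(-rows // PATCH)  # ceil-div: patch-grid height
        gc = -(-cols // PATCH)  # ceil-div: patch-grid width
        n = gr * gc
        patches = np.stack([seq[i + 1 + k][1] for k in range(n)])
        # Reassemble the patch grid, then crop to the declared shape.
        W = (patches.reshape(gr, gc, PATCH, PATCH)
                    .swapaxes(1, 2)
                    .reshape(gr * PATCH, gc * PATCH))[:rows, :cols]
        layers.append(W)
        i += 1 + n
    return layers

# A 10x5 layer needs a 2x1 grid of 8x8 patches.
seq = [("ARCH", 10, 5),
       ("PATCH", np.ones(PATCH * PATCH)),
       ("PATCH", np.ones(PATCH * PATCH))]
print(decode_sequence(seq)[0].shape)  # (10, 5)
```

Because the weight-matrix dimensions are read off generated tokens rather than fixed in the model, the same decoder handles any width the sequence model emits.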
Effective generalization to unseen topologies
NNiT achieves >85% success on ManiSkill3 robotics tasks with unseen architecture topologies, demonstrating its ability to generalize effectively.
Demerits
Complexity of the model
NNiT's pipeline is complex, combining Graph HyperNetworks, a CNN decoder, and a diffusion transformer, which may make it challenging to implement and optimize.
Limited evaluation on diverse tasks
The evaluation focuses primarily on ManiSkill3 robotics tasks, so the method's performance on more diverse tasks and datasets remains untested.
Expert Commentary
While NNiT is a groundbreaking approach to neural network generation, its practical implementation and optimization may be challenging. Moreover, the evaluation centers on ManiSkill3 robotics tasks, so it remains unclear whether NNiT generalizes to more diverse tasks and datasets. Nevertheless, the method's ability to generate fully functional networks across a range of architectures makes it an exciting development in the field. As the field of neural network generation continues to evolve, it will be essential to explore NNiT's limitations and applications in more depth.
Recommendations
- ✓ Further research is needed to implement and optimize NNiT effectively, particularly with respect to its complexity and computational requirements.
- ✓ NNiT should be extensively evaluated on diverse tasks and datasets to assess its generalizability and robustness.