Heterogeneous Decentralized Diffusion Models
arXiv:2603.06741v1

Abstract: Training frontier-scale diffusion models often requires substantial computational resources concentrated in tightly coupled clusters, limiting participation to well-resourced institutions. While Decentralized Diffusion Models (DDM) enable training multiple experts in isolation, existing approaches require 1176 GPU-days and homogeneous training objectives across all experts. We present an efficient framework that reduces resource requirements while supporting heterogeneous training objectives. Our approach combines three contributions: (1) a heterogeneous decentralized training paradigm that allows experts to use different objectives (DDPM and Flow Matching), unified at inference time via a deterministic schedule-aware conversion into a common velocity space without retraining; (2) pretrained checkpoint conversion from ImageNet-DDPM to Flow Matching objectives, accelerating convergence and enabling initialization without objective-specific pretraining; and (3) PixArt-alpha's efficient AdaLN-Single architecture, reducing parameters while maintaining quality. Experiments on LAION-Aesthetics show that, relative to the training scale reported for prior DDM work, our approach reduces compute from 1176 to 72 GPU-days (16x) and data from 158M to 11M (14x). Under aligned inference settings, our heterogeneous 2DDPM:6FM configuration achieves better FID (11.88 vs. 12.45) and higher intra-prompt diversity (LPIPS 0.631 vs. 0.617) than the homogeneous 8FM baseline. By eliminating synchronization requirements and enabling mixed DDPM/FM objectives, our framework lowers infrastructure requirements for decentralized generative model training.
Executive Summary
This article proposes a framework for decentralized generative model training that addresses the limitations of existing approaches. The framework, dubbed Heterogeneous Decentralized Diffusion Models (HDDM), enables experts to train under different objectives, reduces resource requirements, and accelerates convergence. Its core mechanism is a deterministic, schedule-aware conversion of each expert's output into a common velocity space, which lets DDPM- and Flow-Matching-trained experts be unified at inference time without retraining. The framework also incorporates pretrained checkpoint conversion (from ImageNet-DDPM to Flow Matching) and PixArt-alpha's efficient AdaLN-Single architecture. Experiments on the LAION-Aesthetics dataset show that HDDM outperforms a homogeneous baseline on FID and intra-prompt diversity while substantially reducing compute and data requirements relative to prior DDM work.
Key Points
- ▸ Heterogeneous decentralized training paradigm with different objectives (DDPM and Flow Matching)
- ▸ Deterministic schedule-aware conversion for unified inference
- ▸ Pretrained checkpoint conversion for accelerated convergence and reduced pretraining requirements
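The schedule-aware conversion into a common velocity space can be illustrated with a minimal sketch. The paper's exact parameterization and time mapping are not given here, so this assumes the standard DDPM forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε and the common rectified-flow velocity convention v = ε − x_0; the function name and signature are hypothetical:

```python
import numpy as np

def ddpm_eps_to_velocity(x_t, eps_hat, alpha_bar_t):
    """Map a DDPM noise prediction into a flow-matching-style velocity.

    Hypothetical sketch: invert the DDPM forward process
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    to recover an estimate of x0, then form the linear-flow velocity
        v = eps - x0,
    so that experts trained with either objective emit comparable outputs.
    """
    sqrt_ab = np.sqrt(alpha_bar_t)
    sqrt_one_minus_ab = np.sqrt(1.0 - alpha_bar_t)
    # Recover the clean-sample estimate implied by the noise prediction.
    x0_hat = (x_t - sqrt_one_minus_ab * eps_hat) / sqrt_ab
    # Velocity under the (assumed) rectified-flow convention.
    return eps_hat - x0_hat
```

With the true noise plugged in, the recovered velocity equals ε − x_0 exactly, which is the consistency property a deterministic conversion of this kind relies on. The actual framework would additionally need a mapping between the DDPM timestep and the flow time, which is omitted here.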
Merits
Strength in Computational Efficiency
HDDM significantly reduces resource requirements, cutting compute from 1176 to 72 GPU-days (16x) and training data from 158M to 11M samples (14x) relative to prior decentralized diffusion model work.
Improved Quality Metrics
Under aligned inference settings on LAION-Aesthetics, HDDM's heterogeneous 2DDPM:6FM configuration achieves better FID (11.88 vs. 12.45) and higher intra-prompt diversity (LPIPS 0.631 vs. 0.617) than the homogeneous 8FM baseline.
Flexibility and Scalability
HDDM's heterogeneous training paradigm enables mixed DDPM/FM objectives, allowing for more flexible and scalable decentralized generative model training.
Demerits
Limited Generalizability
The authors' experiments are confined to the LAION-Aesthetics dataset, and it remains unclear whether HDDM's benefits generalize to other datasets and tasks.
Dependence on Pretrained Checkpoints
The use of pretrained checkpoints may introduce dependencies on pretraining datasets and objectives, potentially limiting HDDM's adoption in certain contexts.
Complexity and Overheads
HDDM's inference-time conversion adds a per-step transformation into the shared velocity space, and coordinating experts trained under different objectives and noise schedules introduces engineering complexity that requires careful implementation and optimization.
Expert Commentary
While HDDM represents a significant step forward in decentralized generative model training, its limitations and potential challenges must be weighed carefully. The framework's reliance on pretrained checkpoints and the added complexity of inference-time conversion warrant further investigation and optimization. Nevertheless, HDDM's ability to outperform a homogeneous baseline on both quality and diversity while reducing compute and data requirements by over an order of magnitude underscores its potential for widespread adoption in AI research and development.
Recommendations
- ✓ Further experimentation on diverse datasets and tasks to evaluate HDDM's generalizability and robustness.
- ✓ Investigation into the impact of pretrained checkpoints on HDDM's performance and scalability.