Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching
arXiv:2603.12517v1 Announce Type: new Abstract: Timestep sampling $p(t)$ is a central design choice in Flow Matching models, yet common practice increasingly favors static middle-biased distributions (e.g., Logit-Normal). We show that this choice induces a speed--quality trade-off: middle-biased sampling accelerates early convergence but yields worse asymptotic fidelity than Uniform sampling. By analyzing per-timestep training losses, we identify a U-shaped difficulty profile with persistent errors near the boundary regimes, implying that under-sampling the endpoints leaves fine details unresolved. Guided by this insight, we propose \textbf{Curriculum Sampling}, a two-phase schedule that begins with middle-biased sampling for rapid structure learning and then switches to Uniform sampling for boundary refinement. On CIFAR-10, Curriculum Sampling improves the best FID from $3.85$ (Uniform) to $3.22$ while reaching peak performance at $100$k rather than $150$k training steps. Our results highlight that timestep sampling should be treated as an evolving curriculum rather than a fixed hyperparameter.
Executive Summary
This article proposes Curriculum Sampling, a novel approach to timestep sampling in Flow Matching models. By analyzing per-timestep training losses, the authors identify a U-shaped difficulty profile and attribute persistent errors near the boundary regimes to under-sampling of the endpoints. They therefore introduce a two-phase schedule that begins with middle-biased sampling for rapid structure learning and then switches to Uniform sampling for boundary refinement. On CIFAR-10, the method improves the best FID from 3.85 (Uniform) to 3.22 and reaches peak performance at 100k training steps rather than the 150k required by Uniform sampling. The study argues that timestep sampling should be treated as an evolving curriculum rather than a fixed hyperparameter, with implications for the design of more efficient training protocols.
Key Points
- ▸ Curriculum Sampling is a two-phase schedule for timestep sampling in Flow Matching models.
- ▸ Middle-biased sampling accelerates early convergence but yields worse asymptotic fidelity than Uniform sampling.
- ▸ Per-timestep training losses reveal a U-shaped difficulty profile with persistent errors near the boundary regimes.
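The two-phase schedule summarized above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's reference implementation: the switch step `switch_step` and the Logit-Normal parameters `mu` and `sigma` are placeholder values chosen for the sketch.

```python
import math
import random

def sample_timestep(step, switch_step=50_000, mu=0.0, sigma=1.0):
    """Two-phase curriculum: middle-biased (Logit-Normal) early, Uniform late.

    `switch_step`, `mu`, and `sigma` are illustrative defaults,
    not hyperparameters reported in the paper.
    """
    if step < switch_step:
        # Phase 1: Logit-Normal concentrates mass near t = 0.5,
        # emphasizing mid-trajectory structure learning.
        z = random.gauss(mu, sigma)
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps z into (0, 1)
    # Phase 2: Uniform sampling restores coverage of the boundary
    # regimes (t near 0 and 1) that the U-shaped loss profile
    # shows remain under-trained.
    return random.random()
```

In a training loop, `sample_timestep(step)` would replace a fixed draw from `p(t)`, so the change is a drop-in swap of the timestep distribution rather than a modification of the loss or model.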
Merits
Strength in Analytical Approach
The authors employ a rigorous analytical approach to understand the behavior of timestep sampling, providing valuable insights into the difficulty profile and its implications for model training.
Strength in Practical Application
The proposed Curriculum Sampling method demonstrates significant improvement in performance on CIFAR-10, offering a practical solution for efficient training of Flow Matching models.
Demerits
Limitation in Generalizability
The study focuses on Flow Matching models and CIFAR-10 dataset, limiting the generalizability of the findings to other machine learning architectures and datasets.
Limitation in Computational Complexity
The two-phase schedule introduces an additional hyperparameter, the switch point between phases, and the cost of tuning it may offset part of the training-efficiency gains.
Expert Commentary
The authors' analytical approach and proposed method demonstrate a nuanced understanding of timestep sampling in Flow Matching models. The findings have significant implications for the field of machine learning, particularly in the development of more efficient training protocols. However, the study's limitations in generalizability and computational complexity should be addressed in future work. The proposed Curriculum Sampling method offers a promising solution for efficient training of Flow Matching models, and its potential applications in other machine learning models warrant further investigation.
Recommendations
- ✓ Future studies should investigate the applicability of Curriculum Sampling to other machine learning models and datasets.
- ✓ Research should be conducted to optimize the computational complexity of the two-phase schedule and explore its potential applications in real-world scenarios.