Distribution-Conditioned Transport
arXiv:2603.04736v1
Abstract: Learning a transport model that maps a source distribution to a target distribution is a canonical problem in machine learning, but scientific applications increasingly require models that can generalize to source and target distributions unseen during training. We introduce distribution-conditioned transport (DCT), a framework that conditions transport maps on learned embeddings of source and target distributions, enabling generalization to unseen distribution pairs. DCT also allows semi-supervised learning for distributional forecasting problems: because it learns from arbitrary distribution pairs, it can leverage distributions observed at only one condition to improve transport prediction. DCT is agnostic to the underlying transport mechanism, supporting models ranging from flow matching to distributional divergence-based models (e.g. Wasserstein, MMD). We demonstrate the practical performance benefits of DCT on synthetic benchmarks and four applications in biology: batch effect transfer in single-cell genomics, perturbation prediction from mass cytometry data, learning clonal transcriptional dynamics in hematopoiesis, and modeling T-cell receptor sequence evolution.
Executive Summary
The article introduces distribution-conditioned transport (DCT), a framework for generalizing transport models to source and target distributions unseen during training. DCT conditions transport maps on learned embeddings of the source and target distributions. Because it learns from arbitrary distribution pairs, it also supports semi-supervised learning for distributional forecasting: distributions observed at only one condition can still be used to improve transport prediction. The framework is agnostic to the underlying transport mechanism, supporting models that range from flow matching to distributional divergence-based models (e.g. Wasserstein, MMD). The authors report performance benefits on synthetic benchmarks and four applications in biology: batch effect transfer in single-cell genomics, perturbation prediction from mass cytometry data, clonal transcriptional dynamics in hematopoiesis, and T-cell receptor sequence evolution.
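To make the conditioning concrete, the flow matching variant could be written along the following lines. This is a sketch under assumptions, not an objective taken from the abstract: the encoder $e_\phi$, the velocity field $v_\theta$, the linear interpolation path, and the independent sampling of $x_0$ and $x_1$ are all illustrative choices.

```latex
\begin{aligned}
z_P &= e_\phi(P), \qquad z_Q = e_\phi(Q), \qquad x_t = (1 - t)\,x_0 + t\,x_1, \\
\mathcal{L}(\theta, \phi) &=
  \mathbb{E}_{(P, Q)}\,
  \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim P,\; x_1 \sim Q}
  \left\| v_\theta(x_t, t, z_P, z_Q) - (x_1 - x_0) \right\|^2
\end{aligned}
```

Under this formulation, an unseen pair would be handled by embedding samples from the new source and target distributions and integrating $\dot{x} = v_\theta(x, t, z_P, z_Q)$ from $t = 0$ to $t = 1$.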
Key Points
- ▸ DCT conditions transport maps on learned embeddings of source and target distributions
- ▸ DCT enables generalization to unseen source and target distributions
- ▸ DCT supports various transport models, including flow matching and distributional divergence-based models
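The key points above can be illustrated with a deliberately simple toy, under strong assumptions that are not from the paper: the data are one-dimensional, the "embedding" is a hand-crafted pair (mean, standard deviation) rather than a learned one, and the transport map is the closed-form affine map between Gaussians. The function names `embed` and `transport` are hypothetical. The point is only that a single map, conditioned on source and target embeddings, can serve a distribution pair it was never fitted to.

```python
import random
import statistics


def embed(samples):
    """Toy distribution embedding: (mean, population std) of a 1-D sample set.

    Assumption: DCT learns its embeddings; this hand-crafted one is a stand-in.
    """
    return (statistics.fmean(samples), statistics.pstdev(samples))


def transport(x, src_emb, tgt_emb):
    """Affine map conditioned on source and target embeddings.

    For 1-D Gaussians this is the optimal transport map; here it simply
    stands in for a learned, embedding-conditioned transport model.
    """
    mu_s, sd_s = src_emb
    mu_t, sd_t = tgt_emb
    return mu_t + (sd_t / sd_s) * (x - mu_s)


rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
target = [rng.gauss(5.0, 2.0) for _ in range(2000)]

# Embed both distributions, then push source samples through the same map.
src_emb, tgt_emb = embed(source), embed(target)
mapped = [transport(x, src_emb, tgt_emb) for x in source]
mapped_emb = embed(mapped)

print("target embedding:", tgt_emb)
print("mapped embedding:", mapped_emb)
```

Swapping in a different `source` or `target` requires no refitting, only new embeddings, which is the property the framework aims for with learned embeddings and richer transport models.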
Merits
Strength in Generalization
Conditioning transport maps on learned distribution embeddings lets DCT generalize to source and target distributions unseen during training, a capability the abstract describes as increasingly required in scientific applications.
Flexibility in Model Selection
Because DCT is agnostic to the underlying transport mechanism, it can be paired with models ranging from flow matching to distributional divergence-based models (e.g. Wasserstein, MMD), so the transport model can be chosen to suit the application.
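As an example of a divergence-based objective, the sketch below computes a squared maximum mean discrepancy (MMD) between transported samples and target samples. The assumptions are illustrative and not from the paper: one-dimensional data, an RBF kernel with a fixed bandwidth, the biased (V-statistic) estimator, and a hand-picked shift standing in for a trained transport map.

```python
import math
import random


def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between 1-D samples."""

    def mean_kernel(a, b):
        total = sum(rbf(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
target = [rng.gauss(3.0, 1.0) for _ in range(200)]
unshifted = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Hypothetical "transport": shift the source by the known mean gap.
shifted = [x + 3.0 for x in unshifted]

loss_before = mmd2(unshifted, target)
loss_after = mmd2(shifted, target)
print("MMD^2 before transport:", loss_before)
print("MMD^2 after transport: ", loss_after)
```

In a divergence-based model, a quantity of this kind would serve as the training loss for the transport map; a flow matching model would replace it with a velocity regression loss while keeping the same conditioning on distribution embeddings.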
Demerits
Limited Interpretability
DCT's reliance on learned embeddings may limit its interpretability, making it challenging to understand the underlying relationships between source and target distributions.
Computational Complexity
Learning distribution embeddings in addition to the transport model may add computational cost, particularly for large datasets, which could limit scalability.
Expert Commentary
DCT extends transport modeling to source and target distributions unseen during training. Its main strengths are conditioning on learned distribution embeddings and flexibility in the choice of transport model. Its possible limitations, reduced interpretability and added computational cost, should be weighed against those strengths. The reported results on four biological applications suggest practical value, but further research is needed to characterize DCT's capabilities and limitations, particularly its scalability and efficiency.
Recommendations
- ✓ Further research should focus on improving DCT's interpretability and reducing its computational cost
- ✓ DCT should be explored in other scientific applications beyond biology to fully understand its potential and limitations