One-step Language Modeling via Continuous Denoising
arXiv:2602.16813v1

Abstract: Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp degradation of sample quality in the few-step regime, failing to realize this promise. Here we show that language models leveraging flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, we build a flow-based language model (FLM) that performs Euclidean denoising over one-hot token encodings. We show that the model can be trained by predicting the clean data via a cross entropy objective, where we introduce a simple time reparameterization that greatly improves training stability and generation quality. By distilling FLM into its associated flow map, we obtain a distilled flow map language model (FMLM) capable of few-step generation. On the LM1B and OWT language datasets, FLM attains generation quality matching state-of-the-art discrete diffusion models. With FMLM, our approach outperforms recent few-step language models across the board, with one-step generation exceeding their 8-step quality. Our work calls into question the widely held hypothesis that discrete diffusion processes are necessary for generative modeling over discrete modalities, and paves the way toward accelerated flow-based language modeling at scale. Code is available at https://github.com/david3684/flm.
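As a rough sketch of the setup the abstract describes (the notation and the linear interpolation path are our assumptions, not taken from the paper), Euclidean denoising over one-hot token encodings with a clean-data cross-entropy objective can be written as:

```latex
% Minimal sketch of continuous denoising over one-hot encodings (our notation, not the paper's).
% x_1 is the one-hot encoding of token y, x_0 is Gaussian noise, and x_t is a linear interpolant.
\[
x_t = (1 - t)\, x_0 + t\, x_1, \qquad x_1 = \mathrm{onehot}(y), \quad x_0 \sim \mathcal{N}(0, I).
\]
% The denoiser f_\theta reads the noisy point and predicts the clean token via cross entropy:
\[
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, (x_1, y)}
\left[ -\log \mathrm{softmax}\!\big(f_\theta(x_t, t)\big)_{y} \right].
\]
```

The paper's exact probability path, noise schedule, and time reparameterization may differ; this only illustrates the general x-prediction form.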
Executive Summary
This article presents a novel approach to language modeling based on flow-based continuous denoising: a flow-based language model (FLM) performs Euclidean denoising over one-hot token encodings and is trained to predict the clean data with a cross-entropy objective. A simple time reparameterization improves training stability and generation quality, and distilling FLM into its associated flow map yields FMLM, a model capable of few-step generation. On the LM1B and OWT language datasets, FLM matches state-of-the-art discrete diffusion models, while FMLM outperforms recent few-step language models, with one-step generation exceeding their 8-step quality. The approach challenges the conventional wisdom that discrete diffusion processes are necessary for generative modeling over discrete modalities, and the authors' code is publicly available for further exploration.
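To make the claimed speedup concrete, here is a minimal, hypothetical sketch of one-step sampling with a distilled flow map. The `flow_map` and `tokenizer` objects and the `flow_map(x_t, t, s)` signature are our assumptions for illustration, not the interface of the released code.

```python
import torch

@torch.no_grad()
def one_step_sample(flow_map, tokenizer, batch_size, seq_len, vocab_size, device="cuda"):
    """Hypothetical one-step sampling in the style of a distilled flow map (FMLM).

    `flow_map(x_t, t, s)` is assumed to map a noisy state at time t directly to
    per-token logits at time s; this signature is an illustration only.
    """
    # Start from pure Gaussian noise in the one-hot embedding space (t = 0).
    x0 = torch.randn(batch_size, seq_len, vocab_size, device=device)

    # A single call jumps from t=0 to s=1 instead of integrating many denoising steps.
    logits = flow_map(x0, t=0.0, s=1.0)

    # Decode tokens by taking the most likely symbol at each position.
    token_ids = logits.argmax(dim=-1)
    return [tokenizer.decode(ids) for ids in token_ids.tolist()]
```

The contrast with discrete diffusion is that the expensive part, iterative refinement over many steps, is replaced by a single network evaluation.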
Key Points
- ▸ Flow-based continuous denoising outperforms discrete diffusion in terms of quality and speed
- ▸ The approach leverages flow-based models and introduces a time reparameterization for improved training stability and generation quality (a minimal training sketch follows this list)
- ▸ FLM matches state-of-the-art discrete diffusion models on the LM1B and OWT datasets, while the distilled flow map language model (FMLM) outperforms recent few-step language models, with one-step generation exceeding their 8-step quality
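The sketch below illustrates the training objective referenced above: interpolate Gaussian noise with one-hot targets, predict the clean tokens, and apply cross entropy. The paper does not spell out its time reparameterization here, so the power warp `t = u ** warp_power`, the noise schedule, and the model signature are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def flm_training_step(model, token_ids, vocab_size, warp_power=2.0):
    """Illustrative training step for continuous denoising over one-hot encodings.

    The time warp `t = u ** warp_power` stands in for the paper's (unspecified here)
    reparameterization; the exact form and schedule are assumptions.
    """
    batch, seq_len = token_ids.shape
    x1 = F.one_hot(token_ids, vocab_size).float()   # clean one-hot targets
    x0 = torch.randn_like(x1)                       # Gaussian noise endpoint

    # Reparameterized time: sample u uniformly, then warp it so training
    # concentrates on the noise levels assumed to matter most for generation.
    u = torch.rand(batch, 1, 1, device=token_ids.device)
    t = u ** warp_power

    xt = (1.0 - t) * x0 + t * x1                    # linear interpolant between noise and data
    logits = model(xt, t.squeeze(-1).squeeze(-1))   # hypothetical model(x_t, t) -> token logits

    # Cross entropy to the clean data, i.e. x-prediction training.
    return F.cross_entropy(logits.reshape(-1, vocab_size), token_ids.reshape(-1))
```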
Merits
Strength in Methodological Innovation
The article introduces a novel methodological approach to language modeling, leveraging flow-based continuous denoising to outperform discrete diffusion models.
Impact on Accelerated Language Modeling
The article's contributions have significant implications for accelerated flow-based language modeling at scale, a key area of research in the field.
Demerits
Limitation in Dataset Scope
The article's findings are limited to the LM1B and OWT language datasets, and it is unclear how the approach will perform on other datasets or real-world applications.
Potential Overreliance on Time Reparameterization
The time reparameterization introduced in the article may not be universally applicable or optimal, and further exploration is needed to confirm its effectiveness.
Expert Commentary
The article presents an innovative approach to language modeling, using flow-based continuous denoising to outperform discrete diffusion models in both quality and speed. Although the experiments are limited to the LM1B and OWT datasets, the implications for accelerated flow-based language modeling at scale are substantial. The time reparameterization is a key methodological innovation, and further exploration is needed to confirm its effectiveness beyond the reported settings. By showing that a continuous flow can match or exceed discrete diffusion on discrete data, the work challenges the assumption that discrete diffusion processes are necessary for generative modeling over discrete modalities. Overall, the article makes a compelling case for flow-based continuous denoising in language modeling.
Recommendations
- ✓ Further exploration of the article's findings on other datasets and real-world applications is necessary to confirm the approach's effectiveness and scalability.
- ✓ The time reparameterization is a key methodological innovation; further investigation is needed to understand its limitations and potential pitfalls.