Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
arXiv:2603.23507v1 Announce Type: new Abstract: While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in language modeling, their computational efficiency and generation flexibility remain constrained by the masking paradigm. In this paper, we propose Deletion-Insertion Diffusion language models (DID), which rigorously formulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes in current MDLMs. DID improves training and inference efficiency by eliminating two major sources of computational overhead in MDLMs: computation on non-informative tokens that are 1) inherent to the masking paradigm, and 2) introduced in variable-length settings. Furthermore, DID offers greater flexibility by 1) natively supporting variable-length sequences without requiring fixed-length padding, and 2) providing an intrinsic self-correction mechanism during generation, since insertion dynamically adjusts token positions. To train DID, we design a score-based approach that assigns scores to token insertion operations and derive the corresponding training objectives. These objectives involve subsequence counting problems, which we solve efficiently via a parallelized dynamic programming algorithm. Our experiments across fixed- and variable-length settings demonstrate the advantage of DID over MDLM baselines and existing insertion-based LMs in terms of modeling performance, sampling quality, and training/inference speed, without any hyperparameter tuning.
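The abstract notes that DID's training objectives involve subsequence counting problems solved with a parallelized dynamic programming algorithm. The paper's parallel variant is not reproduced here, but the underlying counting problem — how many times a target sequence occurs as a subsequence of a longer sequence — admits a classic sequential DP, sketched below (function name and interface are illustrative, not from the paper):

```python
def count_subsequences(s, t):
    """Count the number of ways t occurs as a (not necessarily contiguous)
    subsequence of s, via dynamic programming in O(len(s) * len(t))."""
    # dp[j] = number of ways t[:j] appears as a subsequence of the prefix
    # of s processed so far; dp[0] = 1 (the empty subsequence).
    dp = [0] * (len(t) + 1)
    dp[0] = 1
    for ch in s:
        # Iterate j backwards so each character of s extends each partial
        # match at most once.
        for j in range(len(t), 0, -1):
            if t[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[-1]
```

For example, `count_subsequences("banana", "ana")` returns 4, one for each way to pick `a`, `n`, `a` in order from `banana`.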
Executive Summary
This article presents Deletion-Insertion Diffusion language models (DID), a novel approach to language modeling that replaces the masking paradigm of Masked Diffusion Language Models (MDLMs) with discrete token deletion and insertion processes. DID improves computational efficiency, generation flexibility, and modeling performance. The proposed method eliminates computational overhead, natively supports variable-length sequences, and introduces a self-correction mechanism during generation. Experiments demonstrate DID's advantages over MDLMs and existing insertion-based language models. While DID shows promise, its scalability and applicability to real-world tasks require further investigation. The method's efficiency and flexibility make it an attractive alternative to traditional language models.
Key Points
- ▸ DID replaces the masking paradigm with discrete token deletion and insertion processes.
- ▸ DID improves computational efficiency, generation flexibility, and modeling performance.
- ▸ DID eliminates two major sources of computational overhead in MDLMs.
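To make the efficiency contrast behind these points concrete, here is a toy sketch (not the paper's formulation) of one corruption step under masking versus deletion. Masking preserves sequence length, so non-informative `[MASK]` slots still consume computation; deletion shortens the sequence outright:

```python
import random

MASK = "[MASK]"

def mask_step(tokens, keep_prob, rng):
    # MDLM-style corruption: length is preserved; masked slots still
    # occupy positions that the model must compute over.
    return [t if rng.random() < keep_prob else MASK for t in tokens]

def delete_step(tokens, keep_prob, rng):
    # DID-style corruption (toy sketch): corrupted tokens are removed
    # entirely, so no computation is spent on non-informative placeholders.
    return [t for t in tokens if rng.random() < keep_prob]
```

With the same keep probability, the masked sequence retains its full length while the deleted one shrinks, in expectation, to `keep_prob` times the original length — which is where the efficiency gain in the variable-length setting comes from.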
Merits
Improved Efficiency
DID eliminates non-informative tokens and reduces computations, leading to faster training and inference.
Increased Flexibility
DID natively supports variable-length sequences, eliminating fixed-length padding and the wasted computation it entails.
Self-Correction Mechanism
DID's insertion process introduces a self-correction mechanism, dynamically adjusting token positions and improving generation quality.
Demerits
Scalability Limitations
The reported experiments do not establish behavior at large scale; whether DID's advantages hold on large datasets and complex tasks requires further investigation.
Applicability to Real-World Tasks
While DID shows promise in controlled experiments, its applicability to real-world tasks and practical scenarios requires further evaluation.
Expert Commentary
DID addresses real limitations of the masking paradigm: its efficiency gains, native variable-length support, and self-correction mechanism make it an attractive alternative to MDLMs. That said, further investigation is needed to establish how well DID scales and how it performs on real-world tasks. If these results hold at scale, efficient and flexible diffusion language models like DID could drive improvement across a wide range of NLP applications, with downstream implications for areas such as language education and language preservation.
Recommendations
- ✓ Further investigation is required to fully understand the scalability and applicability of DID to real-world tasks.
- ✓ The development of more efficient and flexible language models like DID should be prioritized, with a focus on applications in language education and language preservation.
Sources
Original: arXiv - cs.CL