Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference
arXiv:2602.22868v1 Announce Type: new Abstract: Diffusion Large Language Models (DLLMs) promise fast non-autoregressive inference but suffer a severe quality-speed trade-off in parallel decoding. This stems from the "combinatorial contradiction" phenomenon, where parallel tokens form semantically inconsistent combinations. We address this by integrating continuous representations into the discrete decoding process, as they preserve rich inter-position dependencies. We propose ReMix (Rejection Mixing), a framework that introduces a novel Continuous Mixing State as an intermediate between the initial masked state and the final decoded token state. This intermediate state allows a token's representation to be iteratively refined in a continuous space, resolving mutual conflicts with other tokens before collapsing into a final discrete sample. Furthermore, a rejection rule reverts uncertain representations from the continuous state back to the masked state for reprocessing, ensuring stability and preventing error propagation. ReMix thus mitigates combinatorial contradictions by enabling continuous-space refinement during discrete diffusion decoding. Extensive experiments demonstrate that ReMix, as a training-free method, achieves a $2\text{-}8\times$ inference speedup without any quality degradation.
Executive Summary
This article proposes ReMix, a novel framework for efficient diffusion large language model (DLLM) inference. By introducing a continuous mixing state as an intermediate between the initial masked state and the final decoded token state, ReMix enables continuous-space refinement during discrete diffusion decoding, mitigating combinatorial contradictions and achieving a $2\text{-}8\times$ inference speedup without quality degradation. This work addresses a critical challenge in DLLMs, offering a scalable, high-quality path to fast non-autoregressive inference. Because the method is training-free, it can be applied to existing models without fine-tuning, making it practical for real-world deployment.
Key Points
- ▸ ReMix framework integrates continuous representations into discrete decoding process
- ▸ Continuous mixing state enables refinement of token representations in a continuous space
- ▸ Rejection rule reverts uncertain representations to masked state for reprocessing
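The three mechanisms above can be illustrated with a toy decoding loop. This is a hedged, minimal sketch, not the authors' implementation: the `denoise` model, the embedding-weighted mixing, and the `accept_thresh`/`reject_thresh` confidence cutoffs are all illustrative assumptions standing in for the paper's DLLM forward pass and its rejection rule.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, LENGTH = 16, 8, 6
MASK = -1  # sentinel for the masked state

# Hypothetical stand-in for the DLLM denoiser: maps per-position
# continuous states to vocabulary logits via a random projection.
W = rng.normal(size=(DIM, VOCAB))
embed = rng.normal(size=(VOCAB, DIM))

def denoise(states):
    """Per-position logits from continuous states (toy model)."""
    return states @ W

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def remix_decode(steps=8, mix=0.5, accept_thresh=0.6, reject_thresh=0.05):
    tokens = np.full(LENGTH, MASK)    # all positions start masked
    states = np.zeros((LENGTH, DIM))  # continuous mixing states

    for _ in range(steps):
        probs = softmax(denoise(states))
        conf = probs.max(axis=-1)
        pred = probs.argmax(axis=-1)
        mixed = probs @ embed  # probability-weighted mix of embeddings

        for i in range(LENGTH):
            if tokens[i] != MASK:
                continue  # already collapsed to a discrete token
            if conf[i] >= accept_thresh:
                tokens[i] = pred[i]        # collapse: commit the token
                states[i] = embed[pred[i]]
            elif conf[i] < reject_thresh:
                states[i] = np.zeros(DIM)  # reject: back to masked state
            else:
                # Continuous mixing: refine the representation in place,
                # letting it co-adapt with neighboring positions.
                states[i] = (1 - mix) * states[i] + mix * mixed[i]

        if (tokens != MASK).all():
            break

    # Force-commit any positions still undecided after the step budget.
    return np.where(tokens == MASK, pred, tokens)

print(remix_decode())
```

The key design point the sketch mirrors is that low-confidence positions are neither committed nor discarded: they either keep refining in continuous space or revert fully to the masked state, which is what prevents early mistakes from propagating.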
Merits
Strength
Training-free method, accessible for real-world applications
Efficiency
Achieves $2\text{-}8\times$ inference speedup without quality degradation
Scalability
Mitigates combinatorial contradictions for fast non-autoregressive inference
Demerits
Limitation
Dependence on high-quality initial masked state for effective refinement
Complexity
Additional computational overhead introduced by continuous mixing state
Expert Commentary
This article presents a significant contribution to the field of DLLMs, addressing a long-standing challenge in non-autoregressive inference. The proposed ReMix framework offers a promising solution by using continuous representations to iteratively refine token states before they collapse to discrete samples. While the method's efficiency and training-free nature are compelling, its reliance on well-calibrated confidence estimates for the rejection rule and the added computational overhead of maintaining continuous mixing states are notable limitations. Further research should focus on optimizing the rejection rule and exploring its applicability to other natural language processing tasks.
Recommendations
- ✓ Further investigation into the impact of initial masked state quality on refinement effectiveness
- ✓ Exploration of ReMix's applicability to other NLP tasks, such as text classification and sentiment analysis