
Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference

arXiv:2602.22868v1 Announce Type: new Abstract: Diffusion Large Language Models (DLLMs) promise fast non-autoregressive inference but suffer a severe quality-speed trade-off in parallel decoding. This stems from the ''combinatorial contradiction'' phenomenon, where parallel tokens form semantically inconsistent combinations. We address this by integrating continuous representations into the discrete decoding process, as they preserve rich inter-position dependency. We propose ReMix (Rejection Mixing), a framework that introduces a novel Continuous Mixing State as an intermediate between the initial masked state and the final decoded token state. This intermediate state allows a token's representation to be iteratively refined in a continuous space, resolving mutual conflicts with other tokens before collapsing into a final discrete sample. Furthermore, a rejection rule reverts uncertain representations from the continuous state back to the masked state for reprocessing, ensuring stability and preventing error propagation. ReMix thus mitigates combinatorial contradictions by enabling continuous-space refinement during discrete diffusion decoding. Extensive experiments demonstrate that ReMix, as a training-free method, achieves a $2-8 \times$ inference speedup without any quality degradation.
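The "combinatorial contradiction" the abstract describes can be made concrete with a toy example (an assumed illustration, not taken from the paper): when parallel decoding samples each position independently from its marginal distribution, it can produce combinations that have zero probability under the true joint distribution.

```python
import random

# Toy joint distribution over two adjacent positions: the only valid
# phrases are "New York" and "Los Angeles".
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Per-position marginals implied by the joint distribution.
m0, m1 = {}, {}
for (a, b), p in joint.items():
    m0[a] = m0.get(a, 0.0) + p
    m1[b] = m1.get(b, 0.0) + p

random.seed(0)

def sample(marginal):
    """Draw one token from a {token: probability} marginal."""
    toks, probs = zip(*marginal.items())
    return random.choices(toks, weights=probs)[0]

# Independent parallel sampling of the two positions: "New Angeles" and
# "Los York" are impossible under the joint, yet appear half the time.
draws = [(sample(m0), sample(m1)) for _ in range(10_000)]
bad = sum(1 for d in draws if joint.get(d, 0.0) == 0.0)
print(f"inconsistent combinations: {bad / len(draws):.0%}")  # ≈ 50%
```

This is the failure mode ReMix targets: the continuous mixing state lets positions observe and adapt to one another's evolving representations before any of them commits to a discrete token.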

Executive Summary

This article proposes ReMix, a framework for efficient diffusion large language model (DLLM) inference. By introducing a continuous mixing state as an intermediate between the initial masked state and the final decoded token state, ReMix enables continuous-space refinement during discrete diffusion decoding, mitigating combinatorial contradictions and achieving a $2-8\times$ inference speedup without quality degradation. This work addresses a critical challenge in DLLMs, offering a scalable, high-quality path to fast non-autoregressive inference. The method is training-free, making it accessible and efficient for real-world applications.

Key Points

  • ReMix framework integrates continuous representations into discrete decoding process
  • Continuous mixing state enables refinement of token representations in a continuous space
  • Rejection rule reverts uncertain representations to masked state for reprocessing
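The three key points above can be combined into a single decoding loop. The sketch below is a minimal, hypothetical rendering of that loop with a toy stand-in model; the thresholds, the mixing-update rule, and the model itself are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

VOCAB = 8             # toy vocabulary size
MASK = -1             # sentinel for the masked state
ACCEPT_CONF = 0.7     # confidence needed to collapse to a discrete token (assumed)
REJECT_CONF = 0.3     # below this, revert to the masked state (assumed)

def toy_model(mix, truth, rng):
    """Stand-in for the DLLM: a per-position distribution that is pulled
    toward a hidden target and toward the current continuous mixing state."""
    logits = 2.0 * np.eye(VOCAB)[truth] + mix + rng.standard_normal(mix.shape)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def remix_decode(truth, steps=12, seed=0):
    rng = np.random.default_rng(seed)
    seq_len = len(truth)
    tokens = np.full(seq_len, MASK)          # every position starts masked
    mix = np.zeros((seq_len, VOCAB))         # continuous mixing state
    for _ in range(steps):
        undecided = tokens == MASK
        if not undecided.any():
            break
        probs = toy_model(mix, truth, rng)
        # Refine the continuous mixing state toward the model's distribution,
        # letting positions resolve conflicts before any discrete commitment.
        mix[undecided] = 0.5 * mix[undecided] + 0.5 * probs[undecided]
        conf = probs.max(axis=-1)
        # Confident positions collapse from the mixing state to a token.
        accept = undecided & (conf >= ACCEPT_CONF)
        tokens[accept] = probs[accept].argmax(axis=-1)
        # Rejection rule: highly uncertain positions revert to the masked
        # state (here modeled as resetting their mixing state).
        reject = undecided & (conf < REJECT_CONF)
        mix[reject] = 0.0
    # Fallback for anything still masked after the step budget.
    still = tokens == MASK
    if still.any():
        tokens[still] = toy_model(mix, truth, rng)[still].argmax(axis=-1)
    return tokens

out = remix_decode(truth=np.array([1, 4, 2, 7, 0, 5]))
```

The accept/reject split is the core idea: tokens only leave the continuous state once the model is confident, and uncertainty sends a position back for reprocessing rather than letting an error propagate.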

Merits

Strength

Training-free method, accessible for real-world applications

Efficiency

Achieves a $2-8\times$ inference speedup without quality degradation

Scalability

Mitigates combinatorial contradictions for fast non-autoregressive inference

Demerits

Limitation

Dependence on high-quality initial masked state for effective refinement

Complexity

Additional computational overhead introduced by continuous mixing state

Expert Commentary

This article presents a significant contribution to the field of DLLMs, addressing a long-standing challenge in non-autoregressive inference. The proposed ReMix framework offers a promising solution, using continuous-space refinement to resolve inter-token conflicts before discrete sampling. While the method's efficiency and training-free design are compelling, its reliance on the quality of intermediate states and the added computational overhead of the continuous mixing state are notable limitations. Further research should focus on optimizing the rejection rule and exploring its applicability to other natural language processing tasks.

Recommendations

  • Further investigation into the impact of initial masked state quality on refinement effectiveness
  • Exploration of ReMix's applicability to other NLP tasks, such as text classification and sentiment analysis
