Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
arXiv:2604.02340v1
Abstract: Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive …
Ivan Sedykh, Nikita Sorokin, Valentin Malykh