
Diffusion-MPC in Discrete Domains: Feasibility Constraints, Horizon Effects, and Critic Alignment: Case study with Tetris

Haochuan Kevin Wang

arXiv:2603.02348v1. Abstract: We study diffusion-based model predictive control (Diffusion-MPC) in discrete combinatorial domains using Tetris as a case study. Our planner samples candidate placement sequences with a MaskGIT-style discrete denoiser and selects actions via reranking. We analyze three key factors: (1) feasibility-constrained sampling via logit masking over valid placements, (2) reranking strategies using a heuristic score, a pretrained DQN critic, and a hybrid combination, and (3) compute scaling in candidate count and planning horizon. We find that feasibility masking is necessary in discrete domains, removing invalid action mass (46%) and yielding a 6.8% improvement in score and 5.6% improvement in survival over unconstrained sampling. Naive DQN reranking is systematically misaligned with rollout quality, producing high decision regret (mean 17.6, p90 36.6). Shorter planning horizons outperform longer ones under sparse and delayed rewards, suggesting uncertainty compounding in long imagined rollouts. Overall, compute choices (K, H) determine dominant failure modes: small K limits candidate quality, while larger H amplifies misranking and model mismatch. Our findings highlight structural challenges of diffusion planners in discrete environments and provide practical diagnostics for critic integration.

Executive Summary

This study investigates diffusion-based model predictive control (Diffusion-MPC) in discrete combinatorial domains, using Tetris as a case study. The planner samples candidate placement sequences with a MaskGIT-style discrete denoiser and selects actions by reranking. The authors analyze three factors: feasibility-constrained sampling via logit masking, reranking strategies (a heuristic score, a pretrained DQN critic, and a hybrid of the two), and compute scaling in candidate count K and planning horizon H. They report that feasibility masking removes invalid action mass (46%) and improves score by 6.8% and survival by 5.6% over unconstrained sampling, that naive DQN reranking is systematically misaligned with rollout quality (decision regret: mean 17.6, p90 36.6), and that shorter horizons outperform longer ones under sparse and delayed rewards.
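As a rough illustration of feasibility masking, the sketch below sets the logits of invalid placements to negative infinity before the softmax, and also measures how much probability an unconstrained softmax would have put on those placements. The function names, logits, and validity flags are hypothetical and are not taken from the paper; this shows the general technique only, not the authors' implementation.

```python
import math

def masked_softmax(logits, valid):
    """Softmax restricted to valid actions; invalid actions get probability 0."""
    if not any(valid):
        raise ValueError("no valid placement available")
    # Invalid actions are assigned -inf so that exp() maps them to exactly 0.
    masked = [l if v else -math.inf for l, v in zip(logits, valid)]
    peak = max(masked)  # finite, because at least one action is valid
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]

def invalid_mass(logits, valid):
    """Probability that the unconstrained softmax assigns to invalid actions."""
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return sum(e for e, v in zip(exps, valid) if not v) / total

# Toy example (assumed values): four placements, two of them infeasible.
logits = [2.0, 1.0, 0.5, 1.5]
valid = [True, False, True, False]

probs = masked_softmax(logits, valid)
wasted = invalid_mass(logits, valid)
print("masked probabilities:", probs)
print("invalid mass without masking:", wasted)
```

Masking before normalization, rather than rejecting invalid samples afterwards, keeps every drawn candidate usable, which matters when the candidate budget is small.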

Key Points

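  • Feasibility masking is necessary in discrete domains: it removes invalid action mass (46%) and improves score by 6.8% and survival by 5.6% over unconstrained sampling.
  • Naive DQN reranking is systematically misaligned with rollout quality, producing high decision regret (mean 17.6, p90 36.6).
  • Shorter planning horizons outperform longer ones under sparse and delayed rewards, which the authors attribute to uncertainty compounding in long imagined rollouts.
  • Compute choices determine the dominant failure mode: small K limits candidate quality, while larger H amplifies misranking and model mismatch.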
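A decision-regret diagnostic of the kind mentioned above can be sketched as follows. The abstract does not give the exact definition, so this sketch assumes that regret for one decision is the best rollout return among the candidates minus the rollout return of the candidate the critic ranked first. All names and numbers are illustrative.

```python
import math

def decision_regret(critic_scores, rollout_returns):
    """Assumed definition: best achievable return minus return of the critic's pick."""
    chosen = max(range(len(critic_scores)), key=critic_scores.__getitem__)
    return max(rollout_returns) - rollout_returns[chosen]

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Toy data (assumed): each decision has per-candidate critic scores and rollout returns.
decisions = [
    ([0.9, 0.2, 0.5], [10, 30, 12]),  # critic picks candidate 0, candidate 1 was best
    ([0.1, 0.8, 0.3], [5, 20, 7]),    # critic picks the best candidate
    ([0.4, 0.6, 0.7], [8, 9, 15]),    # critic picks the best candidate
]

regrets = [decision_regret(scores, returns) for scores, returns in decisions]
mean_regret = sum(regrets) / len(regrets)
p90_regret = percentile(regrets, 90)
print("regrets:", regrets)
print("mean:", mean_regret, "p90:", p90_regret)
```

Reporting a tail statistic such as p90 alongside the mean shows whether misranking is spread evenly across decisions or concentrated in a few costly ones.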

Merits

Strength in analysis

The study isolates three factors (feasibility-constrained sampling, reranking strategy, and compute scaling in candidate count K and planning horizon H) and ties each to a specific failure mode: small K limits candidate quality, while larger H amplifies misranking and model mismatch.

Practical implications

The study offers practical diagnostics for critic integration, such as decision regret for a reranking critic, and quantifies the benefit of feasibility masking in a discrete domain.

Demerits

Limitation in scope

The study covers a single case study (Tetris), so its findings may not generalize to other discrete combinatorial domains.

Expert Commentary

This study contributes to model predictive control in discrete domains by showing where a diffusion planner breaks down. Two findings stand out. First, a pretrained DQN critic cannot simply be dropped in as a reranker, because its rankings are systematically misaligned with rollout quality. Second, spending more compute on a longer horizon can hurt, because errors compound in long imagined rollouts under sparse and delayed rewards. Tetris is a reasonable test bed: it is well understood, combinatorially complex, and has many infeasible placements, which makes the effect of feasibility masking easy to measure. Whether the reported effects hold in other domains remains untested.

Recommendations

  • Future studies should apply Diffusion-MPC to other discrete combinatorial domains to test whether these findings generalize beyond Tetris.
  • Researchers building diffusion planners for discrete domains should mask infeasible actions during sampling and check a critic's rankings against rollout outcomes before relying on it for reranking.
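To make the roles of candidate count K, horizon H, and reranking concrete, here is a minimal sketch of one planning step with a hybrid reranker. It assumes the hybrid score is a convex combination of min-max-normalized heuristic and critic scores; the abstract does not specify how the paper combines them, and the sampler, heuristic, and critic below are toy stand-ins rather than a diffusion denoiser or a DQN.

```python
import random

def min_max(xs):
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def hybrid_rerank(candidates, heuristic, critic, alpha=0.5):
    """Pick the candidate maximizing alpha * heuristic + (1 - alpha) * critic."""
    h = min_max([heuristic(c) for c in candidates])
    q = min_max([critic(c) for c in candidates])
    combined = [alpha * hi + (1 - alpha) * qi for hi, qi in zip(h, q)]
    best = max(range(len(candidates)), key=combined.__getitem__)
    return candidates[best]

def plan_step(sample_sequence, heuristic, critic, k=8, horizon=4, alpha=0.5):
    """Sample k sequences of length horizon, rerank, and return the first action."""
    candidates = [sample_sequence(horizon) for _ in range(k)]
    best = hybrid_rerank(candidates, heuristic, critic, alpha)
    return best[0]  # receding horizon: execute one placement, then replan

# Toy stand-ins (assumed): placements are integers 0..9.
rng = random.Random(0)
sample_sequence = lambda horizon: [rng.randrange(10) for _ in range(horizon)]
heuristic = sum                 # hypothetical heuristic score
critic = lambda seq: -seq[0]    # hypothetical critic that prefers small first actions

fixed = [[1, 1], [3, 9], [0, 5]]
pick_heuristic_only = hybrid_rerank(fixed, heuristic, critic, alpha=1.0)
pick_critic_only = hybrid_rerank(fixed, heuristic, critic, alpha=0.0)
action = plan_step(sample_sequence, heuristic, critic, k=8, horizon=4)
print(pick_heuristic_only, pick_critic_only, action)
```

Normalizing both scores before mixing keeps one scale from dominating the other; setting alpha to 1.0 or 0.0 recovers pure heuristic or pure critic reranking, which is convenient for ablations.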