Draft-Conditioned Constrained Decoding for Structured Generation in LLMs
arXiv:2603.03305v1 (cross-list) Abstract: Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and renormalization, but it can distort generation when the model assigns low probability mass to valid continuations, pushing decoding toward locally valid yet semantically incorrect trajectories. We propose *Draft-Conditioned Constrained Decoding (DCCD)*, a simple two-step, training-free inference procedure that decouples semantic planning from structural enforcement: an unconstrained draft is generated first, and constrained decoding is then applied, conditioned on this draft, to guarantee validity. We analyze DCCD through a KL-projection view, showing that draft conditioning increases feasible mass and reduces the cumulative "projection tax" induced by hard constraints, with an optional best-of-K draft selection. Across structured reasoning benchmarks, DCCD improves strict structured accuracy by up to +24 percentage points over standard constrained decoding (e.g., 15.2% to 39.0% on GSM8K with a 1B model), and enables smaller model pairs to match or exceed much larger constrained baselines, yielding substantial gains in parameter efficiency.
Executive Summary
The article proposes Draft-Conditioned Constrained Decoding (DCCD), a two-step inference procedure that decouples semantic planning from structural enforcement in Large Language Models (LLMs). By generating an unconstrained draft first and then applying constrained decoding conditioned on this draft, DCCD increases feasible mass and reduces the cumulative 'projection tax' induced by hard constraints. The method is shown to improve strict structured accuracy by up to +24 percentage points and enables smaller model pairs to match or exceed much larger constrained baselines, yielding substantial gains in parameter efficiency. The approach provides a promising solution for generating executable outputs, JSON objects, and API calls in LLMs, where a single syntax error can render the output unusable.
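The constrained-decoding step that DCCD builds on can be sketched in a few lines: invalid tokens are masked out and the remainder is renormalized. The snippet below is a minimal illustrative sketch, not the paper's implementation; the toy distribution, token strings, and the `mask_and_renormalize` helper are all hypothetical. It also shows the failure mode the paper targets: when most of the model's mass sits on invalid tokens, renormalization concentrates it on whatever happens to be valid.

```python
def mask_and_renormalize(probs, valid_tokens):
    """Constrained-decoding step: zero out invalid tokens, renormalize.

    `probs` maps candidate tokens to probabilities; `valid_tokens` is the
    set of tokens the grammar permits at this position.
    """
    masked = {t: p for t, p in probs.items() if t in valid_tokens}
    total = sum(masked.values())
    if total == 0:
        raise ValueError("no probability mass on valid continuations")
    return {t: p / total for t, p in masked.items()}

# Toy next-token distribution: the model prefers a grammatically invalid
# token ("answer" without quotes), so only 0.3 of the mass is feasible.
probs = {'"answer"': 0.2, "answer": 0.7, "{": 0.1}
valid = {'"answer"', "{"}  # tokens the JSON grammar allows here
constrained = mask_and_renormalize(probs, valid)
```

After renormalization, `'"answer"'` gets 2/3 of the mass even though the model assigned it only 0.2, which is exactly the kind of distortion that draft conditioning aims to reduce by shifting mass toward valid continuations before the mask is applied.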
Key Points
- DCCD decouples semantic planning from structural enforcement in LLMs
- Draft conditioning increases feasible mass and reduces the cumulative 'projection tax'
- DCCD improves strict structured accuracy by up to +24 percentage points
- Smaller model pairs can match or exceed larger constrained baselines
- Parameter efficiency is substantially improved
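The "projection tax" and best-of-K ideas can be made concrete with a toy metric: sum, over decoding steps, the negative log of the feasible probability mass, and pick the draft whose conditioned pass pays the least. This is an illustrative sketch under assumed semantics; the function names, the step distributions, and the use of `-log(feasible mass)` as the per-step tax are this summary's assumptions, not the paper's code.

```python
import math

def projection_tax(step_probs, valid_sets):
    """Cumulative -log of feasible mass per step: the cost paid when
    renormalizing onto the valid token set (illustrative metric)."""
    tax = 0.0
    for probs, valid in zip(step_probs, valid_sets):
        feasible = sum(p for t, p in probs.items() if t in valid)
        tax += -math.log(max(feasible, 1e-12))
    return tax

def tax(steps):
    # Convenience wrapper over (distribution, valid-set) pairs per step.
    return projection_tax([p for p, _ in steps], [v for _, v in steps])

# Two hypothetical drafts: conditioning on draft_b leaves more mass on
# valid tokens at every step, so its cumulative tax is lower.
steps_a = [({"x": 0.9, "y": 0.1}, {"y"}), ({"x": 0.5, "y": 0.5}, {"y"})]
steps_b = [({"x": 0.2, "y": 0.8}, {"y"}), ({"x": 0.1, "y": 0.9}, {"y"})]

# Best-of-K selection: keep the draft with the smallest projection tax.
best = min([("draft_a", steps_a), ("draft_b", steps_b)],
           key=lambda item: tax(item[1]))[0]
```

Under this toy metric, `best` is `draft_b`, matching the intuition that a draft which steers the model toward valid continuations loses less probability mass to the hard constraint.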
Merits
Strength
DCCD's ability to decouple semantic planning from structural enforcement allows for more flexible and efficient generation of structured outputs.
Demerits
Limitation
The approach assumes that the unconstrained draft is a good representation of the desired output, which may not always be the case.
Expert Commentary
The article presents a well-motivated and innovative approach to constrained decoding in LLMs. The use of draft conditioning to increase feasible mass and reduce the cumulative 'projection tax' is a key insight that has the potential to significantly improve the generation of structured outputs. However, the assumption that the unconstrained draft is a good representation of the desired output may not always hold, and further research is needed to address this limitation. Additionally, the approach may be sensitive to the choice of draft selection method and the hyperparameters of the constrained decoding step. Nonetheless, DCCD is a promising direction for research in LLMs and has the potential to lead to significant advancements in natural language processing and generation.
Recommendations
- Future research should address this limitation by exploring alternative draft selection methods and hyperparameter tuning strategies for the constrained decoding step.
- DCCD should be explored in other applications where structured generation is critical, such as software development and data science.