Grounded Chess Reasoning in Language Models via Master Distillation

arXiv:2603.20510v1 Announce Type: new Abstract: Language models often lack grounded reasoning capabilities in specialized domains where training data is scarce but bespoke systems excel. We introduce a general framework for distilling expert system reasoning into natural language chain-of-thought explanations, enabling compact models to acquire domain expertise and the ability to generate faithful, grounded explanations. Rather than distilling only final outputs, we capture the full reasoning process, transforming opaque expert computations into transparent, step-by-step explanations. We demonstrate this approach in chess, a canonical reasoning domain where language models continue to underperform. Our 4B parameter model, C1, advances from a near-zero baseline to 48.1% accuracy, outperforming all open-source models and most frontier proprietary systems. Notably, C1 surpasses its distillation teacher and generates solutions in two orders of magnitude fewer tokens than baselines. Unlike prior neural chess approaches that predict only best moves, C1 generates explainable solutions revealing strategic reasoning. Our pipeline combines supervised fine-tuning and reinforcement learning with theme-balanced data sampling for comprehensive tactical coverage. Master Distillation demonstrates how to inject expert-level knowledge into compact models for under-optimized domains, offering a recipe for unlocking RLVR where LLMs lack sufficient base capabilities.

Executive Summary

The article proposes Master Distillation, a novel approach for infusing language models with grounded reasoning capabilities in specialized domains. By distilling expert system reasoning into natural language chain-of-thought explanations, the approach enables compact models to acquire domain expertise and generate faithful, grounded explanations. The authors demonstrate the approach in chess, where their 4B-parameter model, C1, rises from a near-zero baseline to 48.1% accuracy, outperforming all open-source models and most frontier proprietary systems. The method combines supervised fine-tuning and reinforcement learning with theme-balanced data sampling for comprehensive tactical coverage. This result has significant implications for unlocking reinforcement learning with verifiable rewards (RLVR) in domains where LLMs lack sufficient base capabilities.
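The paper does not publish its sampling code, so the following is only a rough sketch of what "theme-balanced" data sampling could look like: round-robin draws over per-theme buckets, so that rare tactical motifs are as well represented in training batches as common ones. The field names and theme labels below are hypothetical illustrations, not the authors' actual schema.

```python
import random
from collections import Counter, defaultdict

def theme_balanced_sample(puzzles, n, seed=0):
    """Sample n puzzles so each tactical theme is represented roughly
    equally, instead of mirroring a corpus skewed toward common motifs."""
    rng = random.Random(seed)
    by_theme = defaultdict(list)
    for p in puzzles:
        by_theme[p["theme"]].append(p)
    themes = sorted(by_theme)
    sample = []
    for i in range(n):
        theme = themes[i % len(themes)]  # round-robin over theme buckets
        sample.append(rng.choice(by_theme[theme]))
    return sample

# A toy corpus heavily skewed toward forks.
corpus = ([{"id": i, "theme": "fork"} for i in range(90)]
          + [{"id": i, "theme": "pin"} for i in range(90, 97)]
          + [{"id": i, "theme": "skewer"} for i in range(97, 100)])
batch = theme_balanced_sample(corpus, 30)
print(Counter(p["theme"] for p in batch))  # each theme drawn 10 times
```

Sampling with replacement inside each bucket means small themes are reused rather than exhausted, which is one simple way to get the "comprehensive tactical coverage" the abstract refers to.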

Key Points

  • Master Distillation framework distills expert system reasoning into natural language explanations.
  • The approach enables compact models to acquire domain expertise and generate faithful explanations.
  • The authors demonstrate the effectiveness of Master Distillation in chess: their 4B model C1 reaches 48.1% accuracy, outperforming all open-source models and most frontier proprietary systems.
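The core idea of the first bullet, turning an expert system's opaque computations into step-by-step text, can be sketched minimally: take structured engine-style analysis and render it as a chain-of-thought explanation ending in the chosen move. The input format here (move, evaluation in pawns, motif tag) is an invented stand-in for whatever the authors' actual chess engine emits.

```python
def render_cot(fen, lines):
    """Render engine-style candidate-move analysis as a step-by-step
    natural-language explanation that concludes with the best move."""
    steps = [f"Position: {fen}"]
    best = max(lines, key=lambda l: l["eval"])
    for l in lines:
        verdict = "strong" if l["eval"] > 1.0 else "insufficient"
        steps.append(f"Candidate {l['move']}: {l['motif']}, "
                     f"eval {l['eval']:+.1f} -> {verdict}.")
    steps.append(f"Conclusion: play {best['move']}.")
    return "\n".join(steps)

# Hypothetical analysis of an Italian Game position (illustrative values).
analysis = [
    {"move": "Nxe5", "eval": 0.3,
     "motif": "wins a pawn but concedes the initiative"},
    {"move": "Bxf7+", "eval": 3.1,
     "motif": "deflects the king before a knight fork"},
]
fen = "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 4 4"
print(render_cot(fen, analysis))
```

Pairs of (position, rendered explanation) like this would then serve as supervised fine-tuning targets, so the student model learns the reasoning trace rather than only the final move.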

Merits

Strength

The article presents a novel and effective approach to infusing language models with grounded reasoning capabilities, addressing a critical limitation in current LLMs.

Demerits

Limitation

The approach depends on the existence of a strong expert system for the target domain and on significant computational resources for the SFT and RL pipeline, which may limit its adoption in resource-constrained environments.

Expert Commentary

The article presents a groundbreaking approach to infusing language models with grounded reasoning capabilities. By distilling expert system reasoning into natural language explanations, the authors achieve a substantial performance gain for a compact model in the chess domain, with C1 surpassing its distillation teacher while generating solutions in two orders of magnitude fewer tokens than baselines. This has far-reaching implications for the development of more effective and explainable AI systems. The main practical hurdle is the dependence on a strong domain expert system and the computational cost of the distillation pipeline, which may limit adoption in resource-constrained environments. Nevertheless, the authors' contribution to the growing field of explainable AI is substantial, and their work will likely inspire further research in this area.

Recommendations

  • Future research should focus on adapting the Master Distillation approach to other specialized domains, such as medicine and finance, where grounded reasoning capabilities are critical.
  • The development of more efficient and scalable methods for distilling expert system reasoning would be essential for widespread adoption of the approach.

Sources

Original: arXiv - cs.AI