SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
arXiv:2603.06222v1 Announce Type: new Abstract: Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely truncate what the model says rather than internalize what the model thinks. Latent reasoning offers a promising alternative by performing computation in the hidden space, yet prior methods face two critical challenges. Many existing approaches rely on rigid point-to-point alignment, forcing a latent token to approximate the final representation of a reasoning step, which can be insufficient to capture the dense, variable-length semantics of an entire reasoning segment. Furthermore, these methods often suffer from a lack of interpretability: latent states are commonly produced by unconstrained optimization or embedding mixing, yielding vectors that are difficult to decode or audit under the pretrained language head. We propose SPOT, a flexible framework that compresses explicit CoT into compact latent pause tokens without enforcing a fixed response template. At the core of SPOT is Span-level Semantic Alignment, a Sinkhorn optimal-transport objective that softly matches each pause token to the semantics of an entire reasoning segment, overcoming the rigidity of step-end alignment. To further improve interpretability, SPOT introduces a Frozen-Head Decoding Constraint that keeps latent states directly decodable as token distributions under the frozen pretrained LM head, enabling readable keyword interpretations of latent thoughts. Experiments on reasoning benchmarks demonstrate that SPOT improves accuracy by 2.3 points on average while reducing generated tokens by 37.5% and provides faithful semantic interpretations of the latent reasoning process.
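The abstract does not spell out the Sinkhorn objective behind Span-level Semantic Alignment. As a rough illustration of the general technique (not the paper's implementation), the sketch below computes an entropy-regularized optimal-transport plan that softly matches a few pause-token vectors to reasoning-segment representations; all names, shapes, and the cosine cost are illustrative assumptions:

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=50):
    """Entropy-regularized optimal transport (Sinkhorn iterations).
    Returns a soft transport plan between rows (pause tokens) and
    columns (reasoning-segment states)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform marginal over pause tokens
    b = np.full(m, 1.0 / m)          # uniform marginal over segments
    K = np.exp(-cost / reg)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # alternate row/column scaling
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan P

# Toy data: 2 latent pause tokens, 3 reasoning-segment vectors.
rng = np.random.default_rng(0)
pause = rng.normal(size=(2, 8))
segments = rng.normal(size=(3, 8))

# Cosine-distance cost matrix (an assumed choice, not from the paper).
cost = 1.0 - (pause @ segments.T) / (
    np.linalg.norm(pause, axis=1, keepdims=True)
    * np.linalg.norm(segments, axis=1)[None, :]
)
P = sinkhorn(cost)   # P[i, j]: how much pause token i "covers" segment j
```

Each pause token can then be trained against the P-weighted mixture of segment representations, rather than forced onto a single step-end state, which is how soft span-level matching sidesteps rigid point-to-point alignment.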
Executive Summary
The article introduces SPOT, a framework for efficient and interpretable latent reasoning in large language models. SPOT improves on existing latent-reasoning methods with two components: Span-level Semantic Alignment, which softly matches each latent pause token to the semantics of an entire reasoning segment, and a Frozen-Head Decoding Constraint, which keeps latent states decodable under the pretrained LM head. On reasoning benchmarks, SPOT improves accuracy by 2.3 points on average, reduces generated tokens by 37.5%, and yields faithful semantic interpretations of the latent reasoning process.
Key Points
- ▸ SPOT framework for efficient and interpretable latent reasoning
- ▸ Span-level Semantic Alignment for flexible latent reasoning
- ▸ Frozen-Head Decoding Constraint for improved interpretability
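The Frozen-Head Decoding Constraint keeps each latent state decodable as a token distribution under the frozen pretrained LM head. A minimal sketch of that decoding step is below, with a toy vocabulary and an orthogonal stand-in for the frozen head; none of these names or shapes come from the paper:

```python
import numpy as np

def interpret_latent(latent, lm_head, vocab, top_k=3):
    """Project a latent pause state through a frozen LM head and
    return the top-k vocabulary tokens as a keyword reading."""
    logits = lm_head @ latent                  # (vocab_size,)
    probs = np.exp(logits - logits.max())      # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:top_k]
    return [(vocab[i], float(probs[i])) for i in order]

# Toy stand-ins: a 6-word vocabulary and a fixed ("frozen") head.
vocab = ["add", "carry", "sum", "digit", "ten", "plus"]
lm_head = np.eye(len(vocab), 16) * 5.0        # frozen output embedding

# A latent state that stays near the "sum" direction, plus small noise.
rng = np.random.default_rng(1)
latent = lm_head[2] + 0.1 * rng.normal(size=16)
keywords = interpret_latent(latent, lm_head, vocab)
```

In SPOT this decodability is enforced during training rather than checked afterward, so the top-k tokens give a readable, auditable gloss of each latent thought.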
Merits
Improved Accuracy
SPOT improves accuracy by 2.3 points on average on reasoning benchmarks
Reduced Computational Overhead
SPOT reduces generated tokens by 37.5%, decreasing inference cost
Interpretability
SPOT provides faithful semantic interpretations of the latent reasoning process
Demerits
Complexity
SPOT's Span-level Semantic Alignment (a Sinkhorn optimal-transport objective) and Frozen-Head Decoding Constraint add training-time machinery that may complicate implementation and hyperparameter tuning
Limited Evaluation
The evaluation covers only specific reasoning benchmarks, so the reported gains may not generalize to other tasks
Expert Commentary
SPOT marks a meaningful step toward efficient and interpretable latent reasoning in large language models. By replacing rigid step-end alignment with soft span-level matching and constraining latent states to remain decodable under the frozen LM head, it delivers both higher accuracy and lower inference cost while offering a window into the latent reasoning process. Further research is still needed to probe its limits and to evaluate its applicability to a broader range of tasks.
Recommendations
- ✓ Further evaluation of SPOT on diverse tasks and benchmarks to assess its generalizability
- ✓ Investigation into the potential applications of SPOT in real-world scenarios, such as natural language processing and decision-making systems