The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models
arXiv:2604.06374v1 Announce Type: new Abstract: Latent reasoning via continuous chain-of-thoughts (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating in continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning using latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, we find that the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: i) pretraining on natural language data biases models to commit to a token in the last layers; and ii) capacity has a huge effect on which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.
Executive Summary
This article rigorously investigates the presence and utility of 'superposition' – the simultaneous maintenance of multiple candidate solutions within a single latent representation – in language models employing continuous chain-of-thought (Latent CoT) reasoning. The authors explore three distinct operational regimes: training-free, fine-tuned, and from-scratch. Using Logit Lens and entity-level probing to analyze internal representations, the study finds that only models trained entirely from scratch effectively leverage superposition. In the training-free and fine-tuned regimes, by contrast, superposition either collapses or goes unused, with models discovering shortcut solutions instead. The research attributes this divergence to a pretraining-induced bias towards committing to a single token in the final layers and to the strong influence of model capacity on which solutions a model favors, offering a comprehensive explanation for the conditions under which superposition emerges or fails.
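The training-free regime described above builds a latent thought as a convex combination of token embeddings, so that one continuous vector can carry several candidate tokens at once. The following is a toy numpy sketch of that construction (not the paper's code; the embedding table, sizes, and function name are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: vocabulary of 5 tokens, embedding dimension 4.
vocab_size, d_model = 5, 4
embedding = rng.normal(size=(vocab_size, d_model))

def latent_thought(candidate_ids, weights, embedding):
    """Build a continuous 'thought' as a convex combination of token embeddings.

    Weights must be non-negative and sum to 1, so the result stays inside the
    convex hull of the candidate embeddings -- the property that lets a single
    vector encode several candidate solutions simultaneously.
    """
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return w @ embedding[candidate_ids]

# A latent thought holding two candidate tokens with equal weight.
thought = latent_thought([1, 3], [0.5, 0.5], embedding)
print(thought.shape)  # (4,): one d_model-sized vector carrying both candidates
```

Whether the model's downstream layers actually *use* both components of such a vector, rather than immediately collapsing onto one candidate, is precisely the empirical question the paper investigates.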
Key Points
- ▸ Superposition in Latent CoT reasoning is primarily observed in models trained from scratch, not in training-free or fine-tuned regimes.
- ▸ Pretraining on natural language data biases models to commit to a single token in their final layers, hindering superposition.
- ▸ Model capacity significantly influences whether superposition is utilized or if shortcut solutions are favored.
- ▸ The study uses Logit Lens and entity-level probing for internal representation analysis, providing empirical evidence for its claims.
- ▸ The findings offer a unified explanation for the emergence and collapse of superposition in continuous chain-of-thought reasoning.
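The entity-level probing mentioned in the key points typically means training a simple linear classifier to read a candidate entity off a hidden state: if the probe succeeds, the representation still encodes that candidate. A minimal sketch with synthetic data (not the paper's setup; the 'entity direction' construction and all sizes are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
d_model, n = 16, 400

# Hypothetical setup: half the hidden states have an 'entity direction'
# added on top of noise (label 1), the other half are pure noise (label 0).
entity_dir = rng.normal(size=d_model)
X = rng.normal(size=(n, d_model))
y = np.zeros(n, dtype=int)
y[: n // 2] = 1
X[: n // 2] += entity_dir

# A linear probe: high accuracy means the entity is linearly decodable
# from the representation, i.e. the candidate is still being maintained.
probe = LogisticRegression(max_iter=1000).fit(X, y)
acc = probe.score(X, y)
print(f"probe accuracy: {acc:.2f}")
```

In the paper's setting, high probe accuracy for multiple candidate entities at once would be evidence of superposition; accuracy for only one candidate would indicate collapse.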
Merits
Rigorous Empirical Methodology
The use of three distinct operational regimes (training-free, fine-tuned, from-scratch) provides a comprehensive experimental framework. The application of Logit Lens and entity-level probing offers robust, granular insights into internal model representations, moving beyond speculative claims to empirical validation.
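For readers unfamiliar with the Logit Lens, the idea is to project each layer's residual-stream hidden state through the model's unembedding matrix and inspect the resulting token distribution per layer; watching probability mass concentrate on one token with depth is how "commitment" is detected. A toy numpy sketch, assuming access to per-layer hidden states and the unembedding matrix (all names and sizes are illustrative):

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(hidden_states, unembed):
    """Project each layer's hidden state through the unembedding matrix.

    hidden_states: (num_layers, d_model) residual-stream vectors at one position.
    unembed:       (d_model, vocab_size) output projection.
    Returns one token distribution per layer; a distribution that sharpens
    onto a single token in late layers signals commitment, while sustained
    mass on several tokens is consistent with superposition.
    """
    return softmax(hidden_states @ unembed)

# Toy example: 3 layers, d_model=4, vocabulary of 6 tokens.
rng = np.random.default_rng(1)
hs = rng.normal(size=(3, 4))
W_U = rng.normal(size=(4, 6))
dists = logit_lens(hs, W_U)
print(dists.shape)  # (3, 6): one distribution per layer
```

Real implementations apply the model's final layer norm before the unembedding; that detail is omitted here for brevity.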
Novel Conceptual Clarity
The article effectively deconstructs the 'illusion' of superposition, providing a nuanced understanding of its conditions for emergence versus collapse. This clarifies a previously ambiguous area in latent reasoning, attributing the failure to utilize superposition to specific architectural and training phenomena rather than inherent impossibility.
Unified Explanatory Framework
The identification of two complementary phenomena – pretraining bias and capacity effects – offers a cohesive and compelling explanation for the observed differences across regimes. This moves beyond isolated observations to a more generalizable theoretical understanding of latent reasoning dynamics.
Demerits
Limited Generalizability of 'From-Scratch' Findings
While 'from-scratch' models demonstrate superposition, applying this regime to real-world, large-scale LLM development, where pretraining is foundational, remains a significant hurdle. The conditions under which from-scratch training is feasible for complex tasks are not fully explored.
Scope of 'Shortcut Solutions' Analysis
The article identifies shortcut solutions as an alternative to superposition but does not deeply elaborate on their nature, efficiency, or potential limitations. A more detailed characterization of these shortcuts could further illuminate the trade-offs involved.
Potential for Task-Specificity in Superposition
The analysis might be task-dependent. It remains unclear if the observed patterns of superposition or collapse would hold across a wider range of reasoning tasks, especially those requiring different forms of combinatorial or multi-step inference. The chosen tasks' complexity or specific properties could influence the findings.
Expert Commentary
This article represents a significant contribution to our understanding of the internal mechanics of large language models, particularly concerning their capacity for sophisticated reasoning. The rigorous empirical approach, dissecting superposition across distinct training regimes, offers invaluable clarity where much has been speculative. The identification of pretraining biases and capacity effects as primary drivers for the presence or absence of superposition provides a compelling and unified explanatory framework. From a legal and academic perspective, this work underscores the profound impact of training methodologies on emergent AI capabilities. The 'illusion' of superposition in pre-trained models, yielding to shortcut solutions, has critical implications for the reliability and trustworthiness of AI in complex decision-making contexts. This research is not merely an academic exercise; it directly informs how we should design, evaluate, and ultimately regulate AI systems, particularly in domains requiring transparent, multi-faceted reasoning rather than mere surface-level coherence. It compels us to move beyond output-centric assessments towards a deeper scrutiny of internal cognitive architectures.
Recommendations
- ✓ Future research should explore methods to 'induce' superposition in fine-tuned or pre-trained models, perhaps through novel architectural modifications, regularization techniques, or specialized training objectives that counteract the identified pretraining biases.
- ✓ A more detailed investigation into the nature and limitations of the 'shortcut solutions' discovered by pre-trained models is warranted, including their robustness, generalizability, and potential for catastrophic failure in novel scenarios.
- ✓ Expand the scope of tasks to assess if the observed patterns of superposition are universal or task-specific, particularly for tasks requiring different forms of logical inference, planning, or creative problem-solving.
- ✓ Investigate the computational overhead and scalability challenges associated with 'from-scratch' training for complex tasks, and propose strategies to make this regime more practical for large-scale AI development.
Sources
Original: arXiv - cs.CL