
How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing


Javier Marín

arXiv:2603.13259v1 Abstract: When a language model is fed a wrong answer, what happens inside the network? Current understanding treats truthfulness as a static property of individual-layer representations: a direction to be probed, a feature to be extracted. Less is known about the dynamics: how internal representations diverge across the full depth of the network when the model processes correct versus incorrect continuations. We introduce forced-completion probing, a method that presents identical queries with known correct and incorrect single-token continuations and tracks five geometric measurements across every layer of four decoder-only models (1.5B-13B parameters). We report three findings. First, correct and incorrect paths diverge through rotation, not rescaling: displacement vectors maintain near-identical magnitudes while their angular separation increases, meaning factual selection is encoded in direction on an approximate hypersphere. Second, the model does not passively fail on incorrect input; it actively suppresses the correct answer, driving internal probability away from the right token. Third, both phenomena are entirely absent below a parameter threshold and emerge at 1.6B, suggesting a phase transition in factual processing capability. These results show that factual constraint processing has a specific geometric character (rotational, not scalar; active, not passive) that is invisible to methods based on single-layer probes or magnitude comparisons.

Executive Summary

The paper introduces forced-completion probing, a method that presents identical queries with known correct and incorrect single-token continuations and tracks five geometric measurements across every layer of four decoder-only models (1.5B-13B parameters). The findings show that correct and incorrect paths diverge through rotation rather than rescaling, and that the model actively suppresses the correct answer when given incorrect input. Both phenomena are absent below a parameter threshold and emerge at 1.6B, suggesting a phase transition in factual processing capability.
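The rotation-not-rescaling claim reduces to two per-layer measurements on displacement vectors: their norms (which stay near-identical) and the angle between them (which grows). The sketch below illustrates that geometry with toy 3-d states; all numbers are hypothetical, and in the actual method the states would come from forward passes over the correct and incorrect continuations at each layer.

```python
import math

def displacement(prompt_state, continuation_state):
    """Displacement vector induced by appending a continuation token."""
    return [c - p for p, c in zip(prompt_state, continuation_state)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def angle_deg(u, v):
    """Angle between two vectors, in degrees (clamped for float safety)."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Toy per-layer hidden states (hypothetical values for illustration).
prompt      = [1.0, 0.0, 0.0]
h_correct   = [2.0, 1.0, 0.0]   # state after the correct token
h_incorrect = [2.0, 0.0, 1.0]   # state after the incorrect token

d_c = displacement(prompt, h_correct)
d_i = displacement(prompt, h_incorrect)

# Rotation, not rescaling: magnitudes match while directions separate.
print(round(norm(d_c), 3), round(norm(d_i), 3))  # → 1.414 1.414
print(round(angle_deg(d_c, d_i), 1))             # → 60.0
```

Repeating these two measurements at every layer, as the paper's probing does, is what distinguishes a rotational divergence (constant norms, growing angle) from a scalar one (diverging norms).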

Key Points

  • Correct and incorrect paths diverge through rotation, not rescaling, in the language model's internal representations
  • The model actively suppresses the correct answer when given incorrect input, driving internal probability away from the right token
  • These phenomena emerge at a parameter threshold of 1.6B, indicating a phase transition in factual processing capability
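The second bullet, active suppression, concerns an internal probability assigned to the correct token at intermediate layers. One common way to read such a probability out is a logit-lens-style projection of each hidden state through the unembedding matrix; the sketch below assumes that readout (the paper's exact measure may differ), with a toy 2-d unembedding over a 3-token vocabulary and made-up layer states.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_prob(hidden, unembed, token_id):
    """Project a hidden state through the unembedding; read one token's prob."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    return softmax(logits)[token_id]

# Hypothetical unembedding rows: one per vocabulary token.
W_U = [[2.0, 0.0],   # token 0 = correct answer
       [0.0, 2.0],   # token 1 = incorrect answer
       [0.5, 0.5]]   # token 2 = distractor

correct_id = 0
# Made-up per-layer states while the model processes the *incorrect* token:
layers = [[0.5, 0.5], [0.2, 0.9], [-0.3, 1.5]]

probs = [token_prob(h, W_U, correct_id) for h in layers]
print([round(p, 3) for p in probs])  # probability of the correct token falls
```

A passive failure would leave the correct token's probability roughly flat; the suppression finding is that it is actively driven downward across depth, as the toy trajectory above illustrates.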

Merits

Novel Methodology

Forced-completion probing offers an innovative, layer-by-layer approach to understanding language model internal dynamics, going beyond static single-layer probes

Demerits

Limited Model Scope

The study only examines decoder-only models, which may not be representative of all language model architectures

Expert Commentary

The article's discovery of rotational dynamics in factual constraint processing offers significant insights into the internal workings of language models. The emergence of these phenomena at a specific parameter threshold suggests a complex interplay between model capacity and factual processing capability. Further research is needed to fully understand the implications of these findings and to explore their applications in language model design and optimization.

Recommendations

  • Future studies should investigate the generalizability of these findings to other language model architectures and tasks
  • The development of more advanced methods for analyzing internal language model dynamics is necessary to further our understanding of factual constraint processing
