
Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions

arXiv:2602.19109v1 Abstract: We study three-digit addition in Meta-Llama-3-8B (base) under a one-token readout to characterize how arithmetic answers are finalized after cross-token routing becomes causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17: beyond it, the decoded sum is controlled almost entirely by the last input token and late-layer self-attention is largely dispensable. In this post-routing regime, digit(-sum) direction dictionaries vary with a next-higher-digit context but are well-related by an approximately orthogonal map inside a shared low-rank subspace (low-rank Procrustes alignment). Causal digit editing matches this geometry: naive cross-context transfer fails, while rotating directions through the learned map restores strict counterfactual edits; negative controls do not recover.

Yao Yan


Executive Summary

This article examines how Meta-Llama-3-8B (base) finalizes three-digit addition under a one-token readout, focusing on the regime in which cross-token routing has become causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17: beyond it, the decoded sum is controlled almost entirely by the last input token, and late-layer self-attention is largely dispensable. The study further shows that digit-direction dictionaries, though they vary with the next-higher-digit context, are related by an approximately orthogonal map inside a shared low-rank subspace, and that causal digit edits succeed only when directions are rotated through this learned map. These results sharpen our picture of how large language models commit to arithmetic answers.
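
The layer-sweep localization rests on causal residual patching: cache the last-token residual stream from a clean prompt, splice it into a corrupted run at one layer, and see where the patch starts to control the decoded answer. Below is a minimal sketch of that style of experiment, not the paper's exact protocol; the prompts, layer range, and greedy one-token readout are illustrative, and the hook details assume a recent HuggingFace transformers Llama implementation.

```python
# Minimal residual-patching sketch (assumes a HuggingFace Llama implementation;
# prompts, layer range, and greedy one-token readout are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"   # the base model studied in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

clean = tok("123+456=", return_tensors="pt")     # hypothetical clean prompt
corrupt = tok("321+654=", return_tensors="pt")   # hypothetical corrupted prompt

def run_with_patch(layer_idx, cached):
    """Run the corrupted prompt, overwriting the last-token residual stream
    at the output of layer `layer_idx` with the cached clean activation."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -1, :] = cached            # patch only the final token position
        return output
    handle = model.model.layers[layer_idx].register_forward_hook(hook)
    try:
        logits = model(**corrupt).logits[:, -1, :]
    finally:
        handle.remove()
    return tok.decode(logits.argmax(-1))     # greedy one-token readout

with torch.no_grad():
    clean_out = model(clean.input_ids, output_hidden_states=True)
    for L in range(10, 25):                  # sweep around the reported layer-17 boundary
        cached = clean_out.hidden_states[L + 1][:, -1, :]   # residual after layer L
        print(L, run_with_patch(L, cached))
```

In the paper's picture, patches landing at or past the boundary should control the readout, while the same intervention earlier should not.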

Key Points

  • Causal residual patching localizes a sharp boundary near layer 17, beyond which the decoded sum is controlled almost entirely by the last input token.
  • Late-layer self-attention is largely dispensable in this post-routing regime.
  • Digit-direction dictionaries vary with the next-higher-digit context but are related by an approximately orthogonal map inside a shared low-rank subspace (see the alignment sketch after this list).
  • Naive cross-context transfer of digit directions fails, while rotating them through the learned map restores strict counterfactual edits; negative controls do not recover.
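
The orthogonal-map claim is an instance of the orthogonal Procrustes problem solved inside a low-rank subspace. The toy sketch below shows the mechanics with synthetic matrices standing in for the extracted digit-direction dictionaries; the dimensions, rank, and noise level are illustrative choices, not the paper's.

```python
# Low-rank Procrustes alignment on synthetic stand-ins for the paper's
# digit-direction dictionaries (all sizes here are illustrative).
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
d_model, n_digits = 256, 10            # d_model shrunk from 4096 for this toy

# Context-B directions: an exact rotation of context A's, plus small noise.
D_a = rng.standard_normal((n_digits, d_model))
Q_true, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
D_b = D_a @ Q_true + 0.01 * rng.standard_normal((n_digits, d_model))

# Shared low-rank subspace from the stacked dictionaries (rank chosen to
# cover both spans in this toy; the paper's rank is low relative to d_model).
_, _, Vt = np.linalg.svd(np.vstack([D_a, D_b]), full_matrices=False)
P = Vt[:2 * n_digits]                  # (rank, d_model), orthonormal rows

A, B = D_a @ P.T, D_b @ P.T            # coordinates inside the subspace
R, _ = orthogonal_procrustes(A, B)     # orthogonal R minimizing ||A @ R - B||_F

print("relative residual:", np.linalg.norm(A @ R - B) / np.linalg.norm(B))
```

With the rotation in hand, mapping a context-A direction into context B's frame is just `D_a[i] @ P.T @ R @ P`, which is the operation the paper's counterfactual edits rely on.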

Merits

Strength in Methodology

The study takes a systematic causal approach, combining residual patching with cumulative attention ablations so that two independent interventions converge on the same boundary near layer 17.
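
The dispensability claim can be probed with cumulative attention ablations: zero every self-attention output from a chosen layer onward and check whether the one-token readout survives. The sketch below reuses `model` and `tok` from the patching sketch above; the starting layers and prompt are illustrative, and the hook assumes the HF Llama attention module returns its output tensor first.

```python
import torch

def zero_attn(module, inputs, output):
    # LlamaAttention returns a tuple whose first element is the attention output.
    return (torch.zeros_like(output[0]),) + tuple(output[1:])

def decode_with_attn_ablated(prompt, start):
    """Zero self-attention outputs from layer `start` onward, greedy-decode one token."""
    handles = [layer.self_attn.register_forward_hook(zero_attn)
               for layer in model.model.layers[start:]]
    try:
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[:, -1, :]
    finally:
        for h in handles:
            h.remove()
    return tok.decode(logits.argmax(-1))

for start in (12, 17, 22):   # sweep around the reported boundary near layer 17
    print(start, decode_with_attn_ablated("123+456=", start))
```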

Insight into Large Language Models

The geometric account is causally grounded: the same rotation that aligns the context-dependent digit-direction dictionaries also mediates successful counterfactual edits, linking representation geometry directly to model behavior.
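
The editing result can be read as a targeted residual-stream intervention at the boundary: subtract the resident digit direction at the last token, add a replacement, and, when the replacement comes from another context, route it through the learned rotation first. The sketch below is hypothetical glue code around that idea; the random `D_a`, `D_b`, `P`, and `R` are placeholder stand-ins for dictionaries and an alignment fitted on real activations (it reuses `model` and `tok` from above, and the layer-17, last-token edit site follows the paper).

```python
import numpy as np
import torch

# Placeholder stand-ins for quantities the paper extracts from real activations.
d_model, rank = model.config.hidden_size, 8
P = np.linalg.qr(np.random.randn(d_model, rank))[0].T   # (rank, d_model) subspace basis
R = np.linalg.qr(np.random.randn(rank, rank))[0]        # (rank, rank) orthogonal map
D_a = np.random.randn(10, d_model)                      # digit directions, context A
D_b = np.random.randn(10, d_model)                      # digit directions, context B

# Rotate context-A's direction for digit 7 into context B's frame, then swap it
# in for context B's direction for digit 3 (digits chosen arbitrarily).
old_vec = torch.as_tensor(D_b[3] @ P.T @ P, dtype=torch.bfloat16)
new_vec = torch.as_tensor(D_a[7] @ P.T @ R @ P, dtype=torch.bfloat16)

def digit_edit(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1, :] += new_vec - old_vec   # counterfactual swap at the last token
    return output

ids = tok("123+456=", return_tensors="pt").input_ids
handle = model.model.layers[17].register_forward_hook(digit_edit)  # post-routing boundary
try:
    with torch.no_grad():
        print(tok.decode(model(ids).logits[:, -1, :].argmax(-1)))
finally:
    handle.remove()
```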

Demerits

Limitation in Generalizability

The findings are specific to Meta-Llama-3-8B (base) and three-digit addition under a one-token readout; whether the layer-17 boundary and the rotation-structured direction geometry carry over to other models, scales, or arithmetic tasks remains open.

Technical Complexity

The analysis presumes familiarity with causal interventions on transformer internals (activation patching, ablations, subspace alignment), which may limit its accessibility to a broader audience.

Expert Commentary

The study offers a rigorous causal analysis of post-routing arithmetic in Meta-Llama-3-8B, and its central finding is clean: past roughly layer 17, answer formation is local to the last token, and the digit-direction geometry across contexts is related by a learnable rotation inside a shared low-rank subspace. That the rotation is also what makes counterfactual edits succeed, where naive transfer fails and negative controls do not recover, is the strongest evidence that the geometry is causally meaningful rather than incidental. While the scope is narrow, the methodology is a useful template for interpretability work on how models finalize structured outputs, and the study is a valuable contribution to the ongoing discussion of large language models' arithmetic behavior.

Recommendations

  • Replicate the findings in other large language models and on longer or different arithmetic tasks.
  • Investigate whether the rotation-structured direction geometry generalizes to other domains and structured-output tasks.

Sources

  • arXiv:2602.19109v1 — Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions.