Logit Distance Bounds Representational Similarity
arXiv:2602.15438v1 Announce Type: new Abstract: For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. (2025) that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models' identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice. As a consequence, KL-based distillation can match a teacher's predictions while failing to preserve linear representational properties, such as linear-probe recoverability of human-interpretable concepts. In distillation experiments on synthetic and image datasets, logit-distance distillation yields students with higher linear representational similarity and better preservation of the teacher's linearly recoverable concepts.
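The abstract does not spell out the formal definition of the logit distance, but since softmax is invariant to adding a constant to every logit, a natural instance compares logits after per-example centering. The sketch below contrasts mean KL divergence with such a distance; the centering convention and the averaged L2 norm are assumptions for illustration, not the paper's definition. The demo perturbs a class that both models already assign low probability, mirroring the Nielsen et al. (2025) observation that small KL divergence need not imply small logit differences:

```python
# Minimal sketch: KL divergence vs. a logit-difference distance.
# Assumption: the logit distance is taken as the mean L2 norm of the
# difference between per-example mean-centered logit vectors; the
# paper's exact definition is not stated in the abstract.
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kl(logits_p: np.ndarray, logits_q: np.ndarray) -> float:
    """Mean KL(p || q) over examples; logits have shape (n, num_classes)."""
    lp, lq = log_softmax(logits_p), log_softmax(logits_q)
    return float((np.exp(lp) * (lp - lq)).sum(axis=-1).mean())

def logit_distance(logits_p: np.ndarray, logits_q: np.ndarray) -> float:
    """Mean L2 distance between centered logits (softmax is shift-invariant)."""
    cp = logits_p - logits_p.mean(axis=-1, keepdims=True)
    cq = logits_q - logits_q.mean(axis=-1, keepdims=True)
    return float(np.linalg.norm(cp - cq, axis=-1).mean())

rng = np.random.default_rng(0)
a = rng.normal(size=(128, 10))
a[:, 0] -= 4.0          # class 0 gets low probability under both models
b = a.copy()
b[:, 0] -= 6.0          # large logit change hidden on that low-mass class
print(f"KL: {mean_kl(a, b):.4f}  logit distance: {logit_distance(a, b):.4f}")
```

The KL divergence here stays near zero because the perturbed class carries almost no probability mass, while the logit distance registers the full discrepancy.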
Executive Summary
This article clarifies the relationship between distributional similarity and linear representational similarity in discriminative models. The authors study a distributional distance based on logit differences and prove that closeness in this distance bounds a representational dissimilarity measure derived from the models' identifiability class, a guarantee that closeness in KL divergence does not deliver in practice. This finding has direct implications for distillation: logit-distance distillation preserves linear representational properties, such as linear-probe recoverability of human-interpretable concepts, better than KL-based distillation, and experiments on synthetic and image datasets bear this out. The research deepens our understanding of the representational properties of discriminative models, enabling more informed design and training strategies.
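"Linear representational similarity" here means agreement up to an invertible linear transformation. The paper's dissimilarity measure is defined via the models' identifiability class, which the abstract does not detail; as an illustration only, a common stand-in fits the best affine map between two representation matrices and reports the fraction of variance explained:

```python
# Illustration only: a familiar stand-in for linear representational
# similarity. Assumption: this least-squares R^2 is NOT the paper's
# identifiability-class measure, just a standard way to quantify
# "equal up to a linear map".
import numpy as np

def linear_r2(H_a: np.ndarray, H_b: np.ndarray) -> float:
    """R^2 of the best affine map H_a -> H_b; 1.0 means H_b is an exact
    affine function of H_a, 0.0 means no better than predicting the mean."""
    # Append a bias column so affine (not just linear) maps are allowed.
    X = np.hstack([H_a, np.ones((H_a.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, H_b, rcond=None)
    resid = H_b - X @ W
    total = ((H_b - H_b.mean(axis=0)) ** 2).sum()
    return float(1.0 - (resid ** 2).sum() / total)

rng = np.random.default_rng(0)
H = rng.normal(size=(512, 32))       # stand-in "teacher" representations
M = rng.normal(size=(32, 32))        # invertible with high probability
print(linear_r2(H, H @ M))           # ~1.0: exact linear image of H
print(linear_r2(H, np.tanh(H @ M)))  # <1.0: nonlinear distortion
```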
Key Points
- ▸ Closeness in a distributional distance based on logit differences provably bounds a representational dissimilarity measure defined via the models' identifiability class, yielding linear similarity guarantees
- ▸ When model probabilities are bounded away from zero, KL divergence upper-bounds logit distance, but the resulting bound gives no nontrivial control in practice
- ▸ Logit-distance distillation outperforms KL-based distillation at preserving linear representational properties (see the loss sketch below)
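The abstract does not specify the exact training objectives used in the distillation experiments. The PyTorch sketch below shows the standard KL-based distillation loss next to one plausible logit-distance loss; the centered squared-difference form is an assumption, not the paper's stated objective:

```python
# Sketch of the two distillation objectives being compared. The KL term is
# the standard Hinton-style distillation loss; the logit-distance term is an
# assumed instantiation (centered squared logit differences), since the
# abstract does not give the paper's exact objective.
import torch
import torch.nn.functional as F

def kl_distill_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (t * t)

def logit_distill_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor) -> torch.Tensor:
    """Penalize centered logit differences directly (shift-invariant)."""
    s = student_logits - student_logits.mean(dim=-1, keepdim=True)
    u = teacher_logits - teacher_logits.mean(dim=-1, keepdim=True)
    return (s - u).pow(2).sum(dim=-1).mean()
```

Because the logit loss constrains the logits themselves rather than only the resulting probabilities, it rules out the large logit discrepancies on low-probability classes that KL matching leaves invisible.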
Merits
Strength in theoretical contribution
The article provides a novel theoretical framework linking distributional closeness to linear representational similarity, extending exact identifiability results for discriminative models to the approximate regime.
Demerits
Limitation in experimental design
The experiments are limited to synthetic and image datasets; although the theory covers a family that includes autoregressive language models, no language-model results are reported, which limits the generalizability of the findings to real-world applications.
Expert Commentary
The article's contribution to the theoretical understanding of discriminative models is substantial, but the experimental design could be strengthened with more diverse and complex datasets, ideally including the autoregressive language models that motivate the theory. The practical implications are significant: logit-distance distillation offers a principled way to preserve linear representational properties rather than predictive agreement alone. Broader implications remain speculative, however, and further research is needed to understand the impact on how discriminative models are developed and distilled.
Recommendations
- ✓ Future research should focus on extending the theoretical framework to other types of models and exploring the generalizability of the findings across different domains.
- ✓ The development of more robust and efficient logit-distance distillation methods is crucial for practical applications, particularly in real-world scenarios where computational resources are limited.