Causality ≠ Invariance: Function and Concept Vectors in LLMs
arXiv:2602.22424v1
Abstract: Do large language models (LLMs) represent concepts abstractly, i.e., independent of input format? We revisit Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance. Across multiple LLMs, we show that FVs are not fully invariant: FVs are nearly orthogonal when extracted from different input formats (e.g., open-ended vs. multiple-choice), even if both target the same concept. We identify Concept Vectors (CVs), which carry more stable concept representations. Like FVs, CVs are composed of attention head outputs; however, unlike FVs, the constituent heads are selected using Representational Similarity Analysis (RSA) based on whether they encode concepts consistently across input formats. While these heads emerge in similar layers to FV-related heads, the two sets are largely distinct, suggesting different underlying mechanisms. Steering experiments reveal that FVs excel in-distribution, when extraction and application formats match (e.g., both open-ended in English), while CVs generalize better out-of-distribution across both question types (open-ended vs. multiple-choice) and languages. Our results show that LLMs do contain abstract concept representations, but these differ from those that drive ICL performance.
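To make the near-orthogonality claim concrete, here is a minimal sketch of how one might compare Function Vectors extracted under two input formats. It assumes per-head mean outputs are already available; the array shapes, the head indices, and the random placeholder data are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: mean attention-head outputs for one ICL task,
# extracted under two input formats (e.g., open-ended vs. multiple-choice).
# Shape: (n_heads, d_model). In practice these come from running the model
# on format-specific ICL prompts; here they are random placeholders.
n_heads, d_model = 32, 4096
mean_head_out_open = rng.normal(size=(n_heads, d_model))
mean_head_out_mc = rng.normal(size=(n_heads, d_model))

# Hypothetical set of causally important heads (in the FV literature these
# are found via causal mediation / patching; the indices here are made up).
fv_heads = [3, 7, 12, 21]

def function_vector(mean_head_out, head_ids):
    """Sum the mean outputs of the selected heads into a single vector."""
    return mean_head_out[head_ids].sum(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

fv_open = function_vector(mean_head_out_open, fv_heads)
fv_mc = function_vector(mean_head_out_mc, fv_heads)

# The paper's finding is that, for real models, this similarity is close
# to 0: FVs from different formats of the same concept are nearly orthogonal.
print("cos(FV_open, FV_mc) =", cosine(fv_open, fv_mc))
```

With real activations, a cosine near zero between the two FVs is what the abstract describes as a lack of format invariance.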
Executive Summary
The article 'Causality ≠ Invariance: Function and Concept Vectors in LLMs' asks whether large language models (LLMs) represent concepts abstractly, independent of input format. The study revisits Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance, and shows that FVs are not fully invariant across input formats, even when they target the same concept. The authors introduce Concept Vectors (CVs), more stable concept representations built from attention head outputs, where the contributing heads are selected using Representational Similarity Analysis (RSA). Steering experiments show that FVs excel in-distribution, when extraction and application formats match, while CVs generalize better out-of-distribution across question types and languages, indicating that LLMs contain abstract concept representations distinct from those that drive ICL performance.
Key Points
- ▸ Function Vectors (FVs) are not fully invariant across different input formats.
- ▸ Concept Vectors (CVs) carry more stable concept representations; like FVs they are built from attention head outputs, but the heads are selected via Representational Similarity Analysis (RSA) for cross-format consistency.
- ▸ In steering experiments, FVs excel in-distribution, while CVs generalize better out-of-distribution across question types and languages (see the steering sketch after this list).
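Steering, as referenced in the last point, amounts to adding the extracted vector (an FV or CV) to the model's hidden states at a chosen layer during the forward pass. Below is a minimal, hedged sketch using a PyTorch forward hook on a stand-in layer; the layer, the steering strength, and the random vectors are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64

# Toy stand-in for one transformer layer; in practice this would be a layer
# of the actual LLM (names and sizes here are illustrative).
layer = nn.Linear(d_model, d_model)

# A steering vector (an FV or CV) extracted elsewhere; random placeholder here.
steering_vec = torch.randn(d_model)
alpha = 2.0  # steering strength, a tunable assumption

def add_steering(module, inputs, output):
    # Add the scaled vector to the layer's output, i.e., to the residual
    # stream at the chosen layer and token positions.
    return output + alpha * steering_vec

handle = layer.register_forward_hook(add_steering)

hidden = torch.randn(1, 5, d_model)   # (batch, seq, d_model)
steered = layer(hidden)               # hook modifies the output
handle.remove()

# Sanity check: the steered output equals the unsteered output plus the vector.
print(torch.allclose(steered, layer(hidden) + alpha * steering_vec))
```

The in-distribution vs. out-of-distribution contrast in the paper comes from whether the format (and language) used to extract the vector matches the format it is applied to at steering time.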
Merits
Innovative Approach
The article introduces a novel use of Representational Similarity Analysis (RSA) to select attention heads that encode concepts consistently across input formats, yielding Concept Vectors (CVs) and a new perspective on how LLMs represent concepts.
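As a rough illustration of RSA-based head selection, the sketch below builds a representational dissimilarity matrix (RDM) over concepts for each head under two formats and keeps the heads whose concept geometry correlates most strongly across formats. The activation tensors, the Spearman-correlation criterion, and the number of selected heads are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Illustrative stand-ins: per-head activations for a set of concepts under two
# input formats. Shapes and names are assumptions, not the authors' pipeline.
n_heads, n_concepts, d_head = 16, 20, 128
acts_open = rng.normal(size=(n_heads, n_concepts, d_head))
acts_mc = rng.normal(size=(n_heads, n_concepts, d_head))

def rdm(x):
    """Representational dissimilarity matrix over concepts (condensed form)."""
    return pdist(x, metric="correlation")

# RSA score per head: how similar is the concept geometry across formats?
scores = np.array([
    spearmanr(rdm(acts_open[h]), rdm(acts_mc[h]))[0]
    for h in range(n_heads)
])

# Keep the heads whose concept geometry is most format-invariant; summing their
# mean outputs would then give a Concept Vector (the cutoff of 4 is arbitrary).
cv_heads = np.argsort(scores)[-4:]
print("selected heads:", cv_heads, "scores:", scores[cv_heads].round(2))
```

The contrast with FV heads is that selection here rewards cross-format consistency of representations rather than causal impact on task outputs, which is why the two head sets can land in similar layers yet remain largely distinct.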
Empirical Rigor
The study conducts rigorous empirical analysis across multiple LLMs, demonstrating the distinct roles of FVs and CVs in task performance and generalization.
Demerits
Limited Scope
The findings are based on a specific set of LLMs and input formats, which may not be generalizable to all models and contexts.
Complexity
The methodology and analysis are complex, which may limit the accessibility of the findings to a broader audience.
Expert Commentary
The article 'Causality ≠ Invariance: Function and Concept Vectors in LLMs' presents a significant advancement in the understanding of concept representation in large language models. By distinguishing between Function Vectors (FVs) and Concept Vectors (CVs), the authors provide a nuanced view of how LLMs process and generalize information. The introduction of Representational Similarity Analysis (RSA) as a tool to identify CVs is particularly noteworthy, as it offers a method to uncover more stable and abstract concept representations. The empirical findings, demonstrating the superior generalization of CVs out-of-distribution, have important implications for both the development of AI models and their practical applications. However, the study's limitations, such as its focus on a specific set of models and formats, highlight the need for further research to validate and expand these findings. Overall, this work contributes valuable insights to the field of AI and machine learning, paving the way for more sophisticated and generalizable models.
Recommendations
- ✓ Future research should explore the generalizability of these findings across a broader range of LLMs and input formats.
- ✓ Developers should consider incorporating the insights from this study into the design of AI models to enhance their robustness and generalization capabilities.