When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
arXiv:2604.03877v1 Announce Type: new Abstract: Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle in cases where an analogy is not apparent on the surface but requires latent information, suggesting limitations in abstraction and generalisation. In this paper we compare a model's probed representations with its prompted performance at detecting narrative analogies, revealing an asymmetry: for rhetorical analogies, probing significantly outperforms prompting in open-source models, while for narrative analogies, they achieve a similar (low) performance. This suggests that the relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information.
Executive Summary
This study examines the limits of analogical reasoning in Large Language Models (LLMs) by comparing what probes recover from a model's internal representations with the model's prompted performance at detecting narrative analogies. The authors find an asymmetry: probing substantially outperforms prompting on rhetorical analogies, while both approaches perform similarly poorly on narrative analogies. This indicates that the relationship between internal representations and prompted behaviour is task-dependent, and that prompting may fail to surface information the model already encodes. The findings bear on the development of more robust and generalisable LLMs, particularly for applications requiring narrative understanding and abstraction.
Key Points
- ▸ LLMs perform well in analogical reasoning when surface and structural cues align, but struggle when analogies require latent information
- ▸ Probing representations outperforms prompting at detecting rhetorical analogies in open-source models, but both achieve similarly low performance on narrative analogies
- ▸ The relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information
Merits
Insight into LLM limitations
The study pinpoints where LLM analogical reasoning breaks down, namely analogies that depend on latent rather than surface information, underscoring the need for more robust and generalisable models.
Task-dependent analysis
By showing that probing and prompting diverge on one task but not another, the study demonstrates that LLM behaviour must be analysed per task, especially in applications requiring narrative understanding and abstraction.
Demerits
Limited scope
The evaluation covers only open-source models, so the findings may not generalise to proprietary models or real-world applications.
Methodological limitations
Probing and prompting are two specific measurement lenses, and together they may not capture the full range of LLM behaviour in analogical reasoning.
Expert Commentary
The findings are significant on two fronts: they document a concrete failure mode of LLM analogical reasoning, and they show that the gap between what a model encodes and what it reports under prompting is task-dependent, so prompted evaluations alone can understate a model's internal knowledge. The results have implications for building more robust and generalisable LLMs for narrative understanding and abstraction, and they strengthen the case for transparent, interpretable models, which may in turn inform policy on AI accountability and responsibility.
Recommendations
- ✓ Future studies should investigate the effectiveness of probing representations in detecting analogies in different types of LLMs, including proprietary models and real-world applications.
- ✓ The development of more robust and generalizable LLMs requires a better understanding of their limitations in analogical reasoning, which may inform the design of more effective training protocols and evaluation metrics.
Sources
Original: arXiv - cs.CL