When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
arXiv:2604.03877v1 Announce Type: new Abstract: Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle in cases where an analogy is not apparent on the surface but requires latent information, suggesting limitations in abstraction and generalisation. In this paper we compare a model's probed representations with its prompted performance at detecting narrative analogies, revealing an asymmetry: for rhetorical analogies, probing significantly outperforms prompting in open-source models, while for narrative analogies, they achieve a similar (low) performance. This suggests that the relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information.
Executive Summary
This study examines the limits of analogical reasoning in Large Language Models (LLMs) by comparing what probes recover from a model's internal representations with the model's prompted performance at detecting narrative analogies. The authors find an asymmetry: probing substantially outperforms prompting on rhetorical analogies, while both approaches perform similarly poorly on narrative analogies. This indicates that the relationship between internal representations and prompted behaviour is task-dependent, and that prompting may fail to surface information the model already encodes. The findings bear on the development of more robust and generalisable LLMs, particularly for applications requiring narrative understanding and abstraction.
Key Points
- ▸ LLMs perform well in analogical reasoning when surface and structural cues align, but struggle when analogies require latent information
- ▸ Probing representations outperforms prompting at detecting rhetorical analogies in open-source models, but both achieve similarly low performance on narrative analogies
- ▸ The relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information
Merits
Insight into LLM limitations
The study pinpoints where LLM analogical reasoning breaks down, namely analogies that depend on latent rather than surface information, underscoring the need for more robust and generalisable models.
Task-dependent analysis
By showing that probing and prompting diverge on one task but not another, the study demonstrates that LLM behaviour must be analysed per task, especially in applications requiring narrative understanding and abstraction.
Demerits
Limited scope
The evaluation covers only open-source models, so the findings may not generalise to proprietary models or real-world applications.
Methodological limitations
Probing and prompting are two specific measurement lenses, and together they may not capture the full range of LLM behaviour in analogical reasoning.
Expert Commentary
The findings are significant on two fronts: they document a concrete failure mode of LLM analogical reasoning, and they show that the gap between what a model encodes and what it reports under prompting is task-dependent, so prompted evaluations alone can understate a model's internal knowledge. The results have implications for building more robust and generalisable LLMs for narrative understanding and abstraction, and they strengthen the case for transparent, interpretable models, which may in turn inform policy on AI accountability and responsibility.
Recommendations
- ✓ Future studies should investigate the effectiveness of probing representations in detecting analogies in different types of LLMs, including proprietary models and real-world applications.
- ✓ The development of more robust and generalizable LLMs requires a better understanding of their limitations in analogical reasoning, which may inform the design of more effective training protocols and evaluation metrics.
Sources
Original: arXiv - cs.CL