To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times
arXiv:2603.12105v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. Meanwhile, for other norms such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationship between multiple words in a sentence-level context. Our results show that via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM-prompting as a proxy for human cognitive measures.
Executive Summary
This study explores the capability of Large Language Models (LLMs) to estimate psycholinguistic norms at the sentence level, focusing on memorability and reading times. The authors investigate whether LLMs can produce reliable estimates of these norms through zero-shot or few-shot prompting, as opposed to requiring supervised fine-tuning. The results indicate that while fine-tuned LLMs yield estimates that correlate with human-derived norms, zero-shot and few-shot performance remains mixed. This underscores the importance of exercising caution when using LLMs as proxies for human cognitive measures. The study advances our understanding of the strengths and limitations of LLMs in estimating sentence-level features, with implications for applications such as language teaching and therapy.
Key Points
- ▸ LLMs can estimate sentence-level psycholinguistic norms after supervised fine-tuning, but zero-shot and few-shot prompting yields very mixed results.
- ▸ Fine-tuned LLMs show promise in estimating sentence memorability and reading times, exceeding the predictive power of interpretable baseline predictors.
- ▸ The study highlights the need for caution when relying on LLMs as proxies for human cognitive measures.
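Exceeding interpretable baselines is a meaningful bar: a simple predictor such as sentence length already correlates strongly with reading times, so a fine-tuned LLM must beat it to add value. A minimal sketch of such a baseline check, using invented toy numbers purely for illustration:

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented toy data: sentence length in words vs. mean human reading time in ms.
lengths = [5.0, 8.0, 12.0, 15.0, 20.0]
reading_times = [900.0, 1300.0, 1900.0, 2400.0, 3100.0]
baseline_r = pearson(lengths, reading_times)  # length alone correlates strongly
```

A model's estimates would be scored the same way, the paper's claim being that a fine-tuned LLM's correlation with human norms exceeds that of such interpretable predictors.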
Merits
Strength of fine-tuned LLMs
The study demonstrates the ability of LLMs to estimate sentence-level psycholinguistic norms through fine-tuning, providing valuable insights into their capabilities and potential applications.
Contribution to understanding of LLMs
The study contributes to our understanding of the strengths and limitations of LLMs in estimating sentence-level features, highlighting the importance of fine-tuning in achieving accurate results.
Demerits
Limitation of zero-shot and few-shot performance
The study highlights the limitations of zero-shot and few-shot prompting with LLMs, underscoring the need for caution in using these models as proxies for human cognitive measures.
Need for further research
The study underscores the need for further research into the capabilities and limitations of LLMs in estimating sentence-level psycholinguistic norms, particularly in the context of fine-tuning.
Expert Commentary
This study provides valuable insights into the capabilities and limitations of LLMs in estimating sentence-level psycholinguistic norms. The findings highlight the importance of fine-tuning for accurate results and underscore the need for caution when relying on LLM prompting as a proxy for human cognitive measures. Its contributions to our understanding of LLMs, and their potential applications in language teaching and therapy, make it a significant addition to the field, though further research is needed to fully map the capabilities and limitations of LLMs in these contexts.
Recommendations
- ✓ Future studies should investigate the capabilities and limitations of LLMs in estimating sentence-level psycholinguistic norms in various languages and cultural contexts.
- ✓ Developers and users of LLMs should exercise caution when relying on these models as proxies for human cognitive measures, particularly in applications such as language teaching and therapy.