To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times
arXiv:2603.12105v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. Meanwhile, for other norms such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationship between multiple words in a sentence-level context. Our results show that via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM-prompting as a proxy for human cognitive measures.
Executive Summary
This study explores the capability of Large Language Models (LLMs) to estimate psycholinguistic norms at the sentence level, focusing on memorability and reading times. The authors investigate whether LLMs can produce reliable estimates of these norms through zero-shot or few-shot prompting, as opposed to requiring supervised fine-tuning. The results indicate that while fine-tuned LLMs yield estimates that correlate with human-derived norms, zero-shot and few-shot performance remains mixed. This underscores the importance of exercising caution when using LLMs as proxies for human cognitive measures. The study advances our understanding of the strengths and limitations of LLMs in estimating sentence-level features, with implications for applications such as language teaching and therapy.
Key Points
- ▸ LLMs can estimate sentence-level psycholinguistic norms after supervised fine-tuning, but zero-shot and few-shot prompting yields very mixed results.
- ▸ Fine-tuned LLMs show promise in estimating sentence memorability and reading times, exceeding the predictive power of interpretable baseline predictors.
- ▸ The study highlights the need for caution when relying on LLMs as proxies for human cognitive measures.
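Exceeding interpretable baselines is a meaningful bar: a simple predictor such as sentence length already correlates strongly with reading times, so a fine-tuned LLM must beat it to add value. A minimal sketch of such a baseline check, using invented toy numbers purely for illustration:

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented toy data: sentence length in words vs. mean human reading time in ms.
lengths = [5.0, 8.0, 12.0, 15.0, 20.0]
reading_times = [900.0, 1300.0, 1900.0, 2400.0, 3100.0]
baseline_r = pearson(lengths, reading_times)  # length alone correlates strongly
```

A model's estimates would be scored the same way, the paper's claim being that a fine-tuned LLM's correlation with human norms exceeds that of such interpretable predictors.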
Merits
Strength of fine-tuned LLMs
The study demonstrates the ability of LLMs to estimate sentence-level psycholinguistic norms through fine-tuning, providing valuable insights into their capabilities and potential applications.
Contribution to understanding of LLMs
The study contributes to our understanding of the strengths and limitations of LLMs in estimating sentence-level features, highlighting the importance of fine-tuning in achieving accurate results.
Demerits
Limitation of zero-shot and few-shot performance
The study highlights the limitations of zero-shot and few-shot prompting with LLMs, underscoring the need for caution in using these models as proxies for human cognitive measures.
Need for further research
The study underscores the need for further research into the capabilities and limitations of LLMs in estimating sentence-level psycholinguistic norms, particularly in the context of fine-tuning.
Expert Commentary
This study provides valuable insights into the capabilities and limitations of LLMs in estimating sentence-level psycholinguistic norms. The findings highlight the importance of fine-tuning for accurate results and underscore the need for caution when relying on LLM prompting as a proxy for human cognitive measures. Its contributions to our understanding of LLMs, and their potential applications in language teaching and therapy, make it a significant addition to the field, though further research is needed to fully map the capabilities and limitations of LLMs in these contexts.
Recommendations
- ✓ Future studies should investigate the capabilities and limitations of LLMs in estimating sentence-level psycholinguistic norms in various languages and cultural contexts.
- ✓ Developers and users of LLMs should exercise caution when relying on these models as proxies for human cognitive measures, particularly in applications such as language teaching and therapy.