
On the scaling relationship between cloze probabilities and language model next-token prediction

arXiv:2602.17848v1

Abstract: Recent work has shown that larger language models have better predictive power for eye movement and reading time data. While even the best models under-allocate probability mass to human responses, larger models assign higher-quality estimates of next tokens and their likelihood of production in cloze data because they are less sensitive to lexical co-occurrence statistics while being better aligned semantically to human cloze responses. The results provide support for the claim that the greater memorization capacity of larger models helps them guess more semantically appropriate words, but makes them less sensitive to low-level information that is relevant for word recognition.

Cassandra L. Jacobs, Morgan Grobol


Executive Summary

This article explores the relationship between cloze probabilities and next-token prediction in language models, highlighting that larger models exhibit superior predictive power for eye movement and reading time data. While these models still under-allocate probability mass to human responses, they provide higher-quality estimates of next tokens and their likelihood of production in cloze data. The study suggests that larger models are less sensitive to lexical co-occurrence statistics but are better aligned semantically with human responses, supporting the claim that increased memorization capacity enhances semantic appropriateness but reduces sensitivity to low-level word recognition information.

Key Points

  • Larger language models show better predictive power for eye movement and reading time data.
  • These models assign higher-quality estimates of next tokens and their likelihood in cloze data.
  • Larger models are less sensitive to lexical co-occurrence statistics but are better aligned semantically with human responses.
  • Increased memorization capacity in larger models helps guess more semantically appropriate words but reduces sensitivity to low-level word recognition information.
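The key points above can be made concrete with a small sketch of the comparison the paper studies: how much of a model's next-token probability mass falls on the completions that human participants actually produced in a cloze task. All words, counts, and logits below are invented for illustration; a real analysis would use actual cloze norms and a trained language model.

```python
import math

# Hypothetical cloze data for a context such as "The children went outside to ___":
# counts of completions produced by human participants (illustrative numbers only).
cloze_counts = {"play": 62, "swim": 14, "run": 10, "eat": 4}
total = sum(cloze_counts.values())
cloze_probs = {w: c / total for w, c in cloze_counts.items()}

# Hypothetical next-token logits from a language model over a toy vocabulary.
logits = {"play": 5.1, "swim": 2.3, "run": 2.0, "eat": 0.5, "sleep": 1.2}
z = sum(math.exp(v) for v in logits.values())
model_probs = {w: math.exp(v) / z for w, v in logits.items()}

# Probability mass the model allocates to words humans actually produced.
# The abstract notes that even the best models under-allocate this mass.
human_mass = sum(model_probs[w] for w in cloze_counts)
print(f"mass on human responses: {human_mass:.3f}")
```

In an actual study the logits would come from querying a language model at the cloze context, and the mass-on-human-responses statistic would be aggregated over many contexts; the toy numbers here only illustrate the shape of the computation.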

Merits

Empirical Evidence

The article provides empirical evidence supporting the claim that larger language models have better predictive power for human reading behaviors, which is a significant contribution to the field.

Theoretical Insight

The study offers theoretical insights into how larger models handle lexical co-occurrence statistics and semantic alignment, which is crucial for understanding model performance.

Demerits

Limited Scope

The study focuses primarily on cloze probabilities and next-token prediction, which may not fully capture the complexity of human language processing.

Under-allocation Issue

The models still under-allocate probability mass to human responses, indicating that there is room for improvement in aligning model predictions with human behavior.

Expert Commentary

This article presents a rigorous analysis of the relationship between cloze probabilities and next-token prediction in language models, offering valuable insight into the strengths and limitations of larger models. The empirical evidence that larger models better predict eye movement and reading time data is particularly noteworthy. At the same time, the focus on cloze probabilities and next-token prediction may not capture the full complexity of human language processing, and the persistent under-allocation of probability mass to human responses shows that model predictions remain imperfectly aligned with human behavior. The theoretical account of how larger models trade sensitivity to lexical co-occurrence statistics for better semantic alignment is crucial for understanding model performance and can inform the design of more effective human-computer interaction systems. Overall, the study contributes meaningfully to the field of language modeling and lays a foundation for future research.

Recommendations

  • Further research should explore the broader implications of these findings for language model training and deployment.
  • Future studies should investigate the under-allocation issue and develop methods to better align model predictions with human responses.
