
What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance


William Watson, Nicole Cho, Sumitra Ganesh, Manuela Veloso

Abstract (arXiv:2602.20300v1): Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decoding strategy. Drawing on classical linguistics, we argue that a query's form can also shape a listener's (and model's) response. We operationalize this insight by constructing a 22-dimension query feature vector covering clause complexity, lexical rarity, anaphora, negation, answerability, and intention grounding, all known to affect human comprehension. Using 369,837 real-world queries, we ask: Are there certain types of queries that make hallucination more likely? A large-scale analysis reveals a consistent "risk landscape": certain features such as deep clause nesting and underspecification align with higher hallucination propensity. In contrast, clear intention grounding and answerability align with lower hallucination rates. Others, including domain specificity, show mixed, dataset- and model-dependent effects. Thus, these findings establish an empirically observable query-feature representation correlated with hallucination risk, paving the way for guided query rewriting and future intervention studies.

Executive Summary

This article examines how linguistic features known to confuse human readers affect Large Language Model (LLM) performance, specifically hallucination. The authors construct a 22-dimension query feature vector and analyze 369,837 real-world queries to identify features associated with hallucination risk. The study reveals a consistent "risk landscape": features such as deep clause nesting and underspecification align with higher hallucination propensity, while clear intention grounding and answerability align with lower rates. The findings pave the way for guided query rewriting and future intervention studies.

Key Points

  • LLM hallucinations can be influenced by the linguistic form of a query, not only by the model or its decoding strategy
  • A 22-dimension query feature vector was constructed to characterize query complexity (a hedged sketch of such features follows this list)
  • Features such as deep clause nesting and underspecification are associated with higher hallucination risk, while clear intention grounding and answerability are associated with lower risk
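
As a concrete illustration of what such a feature vector might contain, below is a minimal sketch that computes rough analogues of four of the abstract's features: clause nesting depth, negation, anaphora (via a pronoun count), and lexical rarity. This is not the paper's implementation; the dependency labels, the `wordfreq` rarity cutoff, and the feature names are assumptions for demonstration only.

```python
# Illustrative sketch (not the paper's implementation): compute a handful of
# query features analogous to those named in the abstract, using spaCy and
# wordfreq. Labels, cutoffs, and feature names are assumptions.
import spacy
from wordfreq import zipf_frequency

nlp = spacy.load("en_core_web_sm")

# Dependency labels that introduce subordinate clauses in spaCy's English models.
CLAUSE_DEPS = {"ccomp", "xcomp", "advcl", "acl", "relcl", "csubj"}

def clause_nesting_depth(token, depth=0):
    """Depth of the deepest chain of clausal dependents under `token`."""
    depths = [
        clause_nesting_depth(child, depth + 1)
        for child in token.children
        if child.dep_ in CLAUSE_DEPS
    ]
    return max(depths, default=depth)

def query_features(text):
    doc = nlp(text)
    n_alpha = sum(t.is_alpha for t in doc)
    return {
        # Deep clause nesting: aligned with higher risk per the abstract.
        "clause_depth": max(
            (clause_nesting_depth(s.root) for s in doc.sents), default=0
        ),
        # Negation markers such as "not", "never" (dependency label "neg").
        "negation_count": sum(t.dep_ == "neg" for t in doc),
        # Anaphora proxy: pronouns whose antecedent may lie outside the query.
        "pronoun_count": sum(t.pos_ == "PRON" for t in doc),
        # Lexical rarity proxy: share of words below a Zipf frequency of 3.0.
        "rare_word_ratio": (
            sum(zipf_frequency(t.text.lower(), "en") < 3.0
                for t in doc if t.is_alpha)
            / max(n_alpha, 1)
        ),
    }

print(query_features(
    "Why didn't the report that the committee he chaired rejected get published?"
))
```

Scores like these could feed a per-query risk estimate; the paper's actual 22 dimensions and their precise definitions would need to come from the full text.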

Merits

Comprehensive Analysis

The study provides a thorough examination of the relationship between query features and LLM hallucinations, shedding light on the complexities of language model performance.

Demerits

Limited Generalizability

The study's findings may not generalize to all LLMs or datasets: some features, such as domain specificity, show mixed, dataset- and model-dependent effects.

Expert Commentary

This study advances our understanding of the interplay between query features and LLM performance. The identification of a consistent risk landscape has practical implications: query design and rewriting become levers for mitigating hallucination risk alongside model training and decoding choices. As LLMs become increasingly ubiquitous, this line of research can inform the development of more accurate, reliable, and trustworthy language technologies.

Recommendations

  • Future studies should investigate the generalizability of the findings to other LLMs and datasets
  • Developers of LLMs should consider incorporating guided query rewriting and other strategies to reduce hallucination risk (a hedged sketch of such a gate follows this list)
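
To make the rewriting recommendation concrete, here is a hypothetical gate that builds on the `query_features()` sketch above: queries whose feature scores cross assumed thresholds are routed through an LLM rewriting step before being answered. The thresholds, the prompt wording, and the `rewrite_fn` interface are all illustrative assumptions, not values from the paper.

```python
# Hypothetical guided-rewriting gate, building on query_features() above.
# Thresholds and prompt wording are illustrative assumptions.
RISK_THRESHOLDS = {"clause_depth": 2, "negation_count": 2, "pronoun_count": 3}

REWRITE_PROMPT = (
    "Rewrite the following question so it is self-contained, uses simple "
    "clause structure, resolves all pronouns, and states its intent "
    "explicitly.\n\nQuestion: {query}"
)

def maybe_rewrite(query: str, rewrite_fn) -> str:
    """Route risky queries through a rewriting step before answering.

    `rewrite_fn` is any callable that sends a prompt to an LLM and returns
    the rewritten text; it is deliberately left abstract here.
    """
    features = query_features(query)  # from the sketch above
    risky = any(
        features.get(name, 0) > limit
        for name, limit in RISK_THRESHOLDS.items()
    )
    return rewrite_fn(REWRITE_PROMPT.format(query=query)) if risky else query
```

A gate like this treats the feature vector as a cheap pre-flight check, reserving the extra LLM rewriting call for queries whose form suggests elevated hallucination risk.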

Sources