
A Geometric Taxonomy of Hallucinations in LLMs


Javier Marín

arXiv:2602.13224v1

Abstract: The term "hallucination" in large language models conflates distinct phenomena with different geometric signatures in embedding space. We propose a taxonomy identifying three types: unfaithfulness (failure to engage with provided context), confabulation (invention of semantically foreign content), and factual error (incorrect claims within correct conceptual frames). We observe a striking asymmetry. On standard benchmarks where hallucinations are LLM-generated, detection is domain-local: AUROC 0.76-0.99 within domains, but 0.50 (chance level) across domains. Discriminative directions are approximately orthogonal between domains (mean cosine similarity -0.07). On human-crafted confabulations - invented institutions, redefined terminology, fabricated mechanisms - a single global direction achieves 0.96 AUROC with 3.8% cross-domain degradation. We interpret this divergence as follows: benchmarks capture generation artifacts (stylistic signatures of prompted fabrication), while human-crafted confabulations capture genuine topical drift. The geometric structure differs because the underlying phenomena differ. Type III errors show 0.478 AUROC - indistinguishable from chance. This reflects a theoretical constraint: embeddings encode distributional co-occurrence, not correspondence to external reality. Statements with identical contextual patterns occupy similar embedding regions regardless of truth value. The contribution is a geometric taxonomy clarifying the scope of embedding-based detection: Types I and II are detectable; Type III requires external verification mechanisms.
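The quantities the abstract reports (a per-domain discriminative direction, within- versus cross-domain AUROC, and cosine similarity between directions) can be sketched with a simple difference-of-means linear probe on synthetic embeddings. This is an illustrative reconstruction, not the paper's pipeline: the `make_domain` generator, the embedding dimensionality, and the noise model are all invented for the demo, which merely makes the reported asymmetry concrete.

```python
# Hedged sketch (not the paper's exact method): a linear "discriminative
# direction" probe. For each synthetic domain we take the difference of
# class means in embedding space as the direction, score examples by
# projection onto it, and compare within- vs. cross-domain AUROC.
import numpy as np

rng = np.random.default_rng(0)

def auroc(scores, labels):
    """Rank-based AUROC: P(score of a positive > score of a negative)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Pairwise comparison; fine for small synthetic data.
    return (pos[:, None] > neg[None, :]).mean()

def make_domain(direction, n=200, dim=32, noise=1.0):
    """Synthetic domain: 'hallucinated' examples shifted along `direction`."""
    labels = rng.integers(0, 2, n)
    X = rng.normal(0, noise, (n, dim)) + labels[:, None] * direction
    return X, labels

def fit_direction(X, y):
    """Difference-of-means direction (a minimal linear probe)."""
    w = X[y == 1].mean(0) - X[y == 0].mean(0)
    return w / np.linalg.norm(w)

dim = 32
# Two domains whose true discriminative directions are orthogonal,
# mimicking the near-zero cross-domain cosine similarity in the abstract.
d_a = np.zeros(dim); d_a[0] = 2.0
d_b = np.zeros(dim); d_b[1] = 2.0

Xa, ya = make_domain(d_a)
Xb, yb = make_domain(d_b)

wa = fit_direction(Xa, ya)
wb = fit_direction(Xb, yb)

within = auroc(Xa @ wa, ya)  # high: domain-local detection works
cross = auroc(Xb @ wa, yb)   # near 0.5: the direction does not transfer
cos = float(wa @ wb)         # near 0: directions nearly orthogonal
print(f"within {within:.2f}, cross {cross:.2f}, cosine {cos:.2f}")
```

Under this toy construction the within-domain AUROC is high while the transferred probe scores near chance, reproducing the qualitative pattern the paper reports for LLM-generated benchmark hallucinations.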

Executive Summary

The article 'A Geometric Taxonomy of Hallucinations in LLMs' introduces a framework for categorizing hallucinations in large language models (LLMs) into three distinct types: unfaithfulness, confabulation, and factual error. The study reveals a significant asymmetry in detectability: domain-local detection performs well, while cross-domain detection fails at chance level. The authors argue that this asymmetry arises because benchmarks capture stylistic artifacts of prompted fabrication, while human-crafted confabulations reflect genuine topical drift. The geometric structure of the embeddings differs by hallucination type, and Type III (factual) errors are particularly hard to detect because embeddings encode distributional co-occurrence rather than correspondence to truth.

Key Points

  • Introduction of a geometric taxonomy for hallucinations in LLMs.
  • Identification of three types of hallucinations: unfaithfulness, confabulation, and factual error.
  • Detection asymmetry observed between domain-local and cross-domain settings.
  • Human-crafted confabulations admit a single global detection direction, unlike LLM-generated benchmark hallucinations, which are only detectable within their domain.
  • Detection of Type III errors (factual errors) performs at chance; they require external verification.

Merits

Comprehensive Taxonomy

The proposed taxonomy provides a clear and structured way to understand different types of hallucinations in LLMs, which is crucial for developing targeted detection and mitigation strategies.

Empirical Rigor

The study employs rigorous empirical methods to validate the taxonomy and detectability of hallucinations, providing robust evidence for its claims.

Practical Implications

The findings have immediate practical implications for improving the reliability and accuracy of LLMs, particularly in applications requiring high levels of factual correctness.

Demerits

Limited Scope of Detection

The study highlights the limitations of current detection methods, particularly for cross-domain hallucinations and Type III errors, which may necessitate the development of new verification mechanisms.

Theoretical Constraints

The theoretical constraint that embeddings encode distributional co-occurrence rather than truth correspondence places a hard ceiling on embedding-based detection and limits how far the findings generalize to other error types or models.

Benchmark Dependence

The reliance on specific benchmarks for validation may introduce biases or limitations that could affect the broader applicability of the taxonomy.

Expert Commentary

The article presents a significant advancement in the understanding of hallucinations in LLMs by introducing a geometric taxonomy that differentiates between unfaithfulness, confabulation, and factual error. The empirical findings on the detectability of these hallucinations are particularly noteworthy, highlighting the challenges in cross-domain detection and the limitations of current embedding-based methods. The study's emphasis on the need for external verification mechanisms for Type III errors underscores the importance of integrating human oversight and additional verification processes in AI systems. The theoretical insights provided by the authors offer a nuanced perspective on the nature of hallucinations and their geometric signatures, contributing valuable knowledge to the field. However, the study's reliance on specific benchmarks and the theoretical constraints of embeddings warrant further exploration to ensure the broader applicability of the findings. Overall, this work sets a strong foundation for future research and practical applications aimed at improving the reliability and accuracy of LLMs.

Recommendations

  • Further research to develop advanced detection methods for cross-domain hallucinations and Type III errors.
  • Integration of external verification mechanisms in AI systems to enhance the accuracy and reliability of factual information.
