
HART: Data-Driven Hallucination Attribution and Evidence-Based Tracing for Large Language Models


Shize Liang, Hongzhi Wang

arXiv:2603.05828v1. Abstract: Large language models (LLMs) have demonstrated remarkable performance in text generation and knowledge-intensive question answering. Nevertheless, they are prone to producing hallucinated content, which severely undermines their reliability in high-stakes application domains. Existing hallucination attribution approaches, based on either external knowledge retrieval or internal model mechanisms, primarily focus on semantic similarity matching or representation-level discrimination. As a result, they have difficulty establishing structured correspondences at the span level between hallucination types, underlying error generation mechanisms, and external factual evidence, thereby limiting the interpretability of hallucinated fragments and the traceability of supporting or opposing evidence. To address these limitations, we propose HART, a fine-grained hallucination attribution and evidence retrieval framework for large language models. HART formalizes hallucination tracing as a structured modeling task comprising four stages: span localization, mechanism attribution, evidence retrieval, and causal tracing. Based upon this formulation, we develop the first structured dataset tailored for hallucination tracing, in which hallucination types, error mechanisms, and sets of counterfactual evidence are jointly annotated to enable causal-level interpretability evaluation. Experimental results on the proposed dataset demonstrate that HART substantially outperforms strong retrieval baselines, including BM25 and DPR, validating the effectiveness and generalization capability of the proposed tracing paradigm for hallucination analysis and evidence alignment.

Executive Summary

The paper proposes HART, a framework for hallucination attribution and evidence-based tracing in large language models. HART formalizes hallucination tracing as a structured modeling task comprising four stages: span localization, mechanism attribution, evidence retrieval, and causal tracing. On a purpose-built annotated dataset, HART outperforms strong retrieval baselines (BM25 and DPR), supporting its effectiveness for hallucination analysis and evidence alignment. These results have practical implications for improving the reliability of large language models in high-stakes application domains.
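The four-stage task can be pictured as a simple data flow. The sketch below is a minimal Python illustration of that structure, not the paper's implementation: the schema, field names, and stage interfaces (`localizer`, `attributor`, `retriever`, `tracer`) are all assumptions for the purpose of the example.

```python
from dataclasses import dataclass, field

@dataclass
class HallucinationTrace:
    """One traced example. All field names are hypothetical, not from the paper."""
    response: str                       # model output under analysis
    span: tuple[int, int]               # stage 1: offsets of the hallucinated span
    hallucination_type: str             # e.g. an entity- or relation-level error
    mechanism: str                      # stage 2: attributed error mechanism
    evidence: list[str] = field(default_factory=list)  # stage 3: retrieved passages
    stance: list[str] = field(default_factory=list)    # stage 4: per-passage stance

def trace(response, localizer, attributor, retriever, tracer):
    """Run the four stages in order; each callable stands in for a model component."""
    start, end = localizer(response)                        # 1. span localization
    h_type, mechanism = attributor(response, (start, end))  # 2. mechanism attribution
    passages = retriever(response[start:end])               # 3. evidence retrieval
    stances = tracer(response, (start, end), passages)      # 4. causal tracing
    return HallucinationTrace(response, (start, end), h_type,
                              mechanism, passages, stances)
```

Keeping each stage behind a plain callable makes it easy to swap in different retrievers (e.g. BM25 or DPR) when reproducing the paper's baseline comparison.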

Key Points

  • HART is a fine-grained hallucination attribution and evidence retrieval framework
  • The framework formalizes hallucination tracing as a structured modeling task with four stages
  • HART outperforms strong retrieval baselines, including BM25 and DPR, in experimental results
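For context on the baselines named above, BM25 is a classic lexical retrieval scoring function. The sketch below is a standard, self-contained Okapi BM25 implementation over pre-tokenized documents; it illustrates the kind of baseline HART is compared against, not anything specific to the paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against the tokenized `query`
    with Okapi BM25 (higher score = better lexical match)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average document length
    df = Counter()                                  # document frequency per term
    for d in docs:
        df.update(set(d))
    # Smoothed IDF; the +1 inside the log keeps all weights non-negative.
    idf = {t: math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5)) for t in df}
    scores = []
    for d in docs:
        tf = Counter(d)                             # term frequency in this doc
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf.get(t, 0.0) * num / den
        scores.append(s)
    return scores
```

DPR, by contrast, scores query-passage pairs with dense embeddings from learned encoders, so it can match paraphrases that share no surface tokens with the query.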

Merits

Improved Hallucination Attribution

HART provides a more accurate and structured approach to hallucination attribution, enabling better understanding of hallucinated content

Enhanced Evidence Retrieval

The framework facilitates the retrieval of relevant evidence to support or oppose hallucinated content, increasing the reliability of large language models

Demerits

Complexity of Implementation

The proposed framework may require significant computational resources and expertise to implement, potentially limiting its adoption

Dependence on High-Quality Training Data

The effectiveness of HART relies on the availability of high-quality training data, which can be time-consuming and costly to annotate

Expert Commentary

The proposed HART framework represents a significant advancement in hallucination attribution and evidence-based tracing for large language models. By formalizing hallucination tracing as a structured modeling task, HART provides a more comprehensive and accurate approach to understanding hallucinated content. The framework's ability to retrieve relevant evidence to support or oppose hallucinated content is particularly noteworthy, as it has the potential to increase the reliability of large language models in high-stakes application domains. However, the complexity of implementation and dependence on high-quality training data may limit the widespread adoption of HART.

Recommendations

  • Further research on the scalability and efficiency of HART, exploring ways to reduce computational resources and expertise required for implementation
  • Investigation into the application of HART in various domains, including healthcare, finance, and education, to evaluate its effectiveness and potential impact
