
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models


Zeyang Ding, Xinglin Hu, Jicong Fan

arXiv:2603.22303v1. Abstract: Hallucinations in large language models (LLMs) remain a central obstacle to trustworthy deployment, motivating detectors that are accurate, lightweight, and broadly applicable. Since an LLM with a prompt defines a conditional distribution, we argue that the complexity of the distribution is an indicator of hallucination. However, the density of the distribution is unknown and the samples (i.e., responses generated for the prompt) are discrete distributions, which leads to a significant challenge in quantifying the complexity of the distribution. We propose to compute the optimal-transport distances between the sets of token embeddings of pairwise samples, which yields a Wasserstein distance matrix measuring the costs of transforming between the samples. This Wasserstein distance matrix provides a means to quantify the complexity of the distribution defined by the LLM with the prompt. Based on the Wasserstein distance matrix, we derive two complementary signals: AvgWD, measuring the average cost, and EigenWD, measuring the cost complexity. This leads to a training-free detector for hallucinations in LLMs. We further extend the framework to black-box LLMs via teacher forcing with an accessible teacher model. Experiments show that AvgWD and EigenWD are competitive with strong uncertainty baselines and provide complementary behavior across models and datasets, highlighting distribution complexity as an effective signal for LLM truthfulness.

Executive Summary

This article proposes a novel, training-free method for detecting hallucinations in large language models (LLMs), a central obstacle to trustworthy deployment. The authors compute optimal-transport distances between the token-embedding sets of pairwise sampled responses, yielding a Wasserstein distance matrix from which they derive two complementary signals: AvgWD, the average transport cost, and EigenWD, the complexity of those costs. Together these signals form a training-free hallucination detector, and experiments show they are competitive with strong uncertainty baselines. The framework is also extended to black-box LLMs via teacher forcing with an accessible teacher model. This research has practical implications for the development and deployment of trustworthy LLMs, particularly in high-stakes applications such as healthcare and finance.

Key Points

  • Proposes a training-free method for detecting hallucinations in LLMs
  • Uses a Wasserstein distance matrix to quantify the complexity of the distribution defined by the LLM and prompt
  • Derives two complementary signals, AvgWD and EigenWD, measuring the average transport cost and the cost complexity, respectively
  • Experiments demonstrate the effectiveness and competitiveness of the proposed method
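
As a concrete illustration of the distance-matrix step, the sketch below computes pairwise Wasserstein distances between toy token-embedding sets. The data and all names here are illustrative, not from the paper; it relies on the fact that, for equal-size point sets with uniform weights, exact optimal transport reduces to a minimum-cost assignment, which is brute-forced for toy sizes (a real implementation would use an OT solver such as the POT library).

```python
# Hedged sketch of the pairwise-Wasserstein step; toy data, illustrative names.
from itertools import permutations
from math import dist

def wasserstein_uniform(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size point sets with
    uniform weights, via brute-force minimum-cost assignment."""
    assert len(xs) == len(ys)
    n = len(xs)
    best = min(
        sum(dist(xs[i], ys[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n

# Three hypothetical "responses", each a small set of 2-D token embeddings.
# The first two are near-duplicates; the third is far away in embedding space.
samples = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)],
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
]

# Wasserstein distance matrix W[i][j] between pairwise samples.
n = len(samples)
W = [[wasserstein_uniform(samples[i], samples[j]) for j in range(n)]
     for i in range(n)]
```

The resulting matrix is symmetric with a zero diagonal: near-duplicate responses get a small transport cost, while the outlier response is far from both.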

Merits

Strengths in Methodology

The proposed method is novel, training-free, and grounded in a sound theoretical premise: an LLM with a prompt defines a conditional distribution, and the complexity of that distribution signals hallucination. Using a Wasserstein distance matrix to quantify this complexity is a significant contribution to the field of LLMs.
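
To make the two signals concrete, here is a hedged sketch of how they might be computed from a symmetric, zero-diagonal Wasserstein distance matrix. AvgWD follows the abstract directly (the mean off-diagonal cost). The abstract does not spell out EigenWD's formula, so the spectral quantity below, top-eigenvalue dominance of the double-centered Gram matrix computed by power iteration, is an illustrative proxy and not necessarily the authors' exact definition.

```python
# Hedged sketch: two signals from a symmetric, zero-diagonal distance matrix W.
# avg_wd follows the abstract; eigen_wd_proxy is an assumed spectral proxy.
def avg_wd(W):
    """AvgWD: mean off-diagonal transport cost."""
    n = len(W)
    return sum(W[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def gram_from_distances(W):
    """Double-center the squared distances (classical MDS):
    G = -0.5 * J @ W2 @ J, with J the centering matrix."""
    n = len(W)
    W2 = [[W[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(W2[i]) / n for i in range(n)]
    tot = sum(row) / n
    return [[-0.5 * (W2[i][j] - row[i] - row[j] + tot) for j in range(n)]
            for i in range(n)]

def top_eigenvalue(G, iters=200):
    """Dominant eigenvalue of a symmetric PSD matrix via power iteration."""
    n = len(G)
    v = [1.0 / (i + 1) for i in range(n)]  # deterministic start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

def eigen_wd_proxy(W):
    """Illustrative EigenWD-style proxy: 1 - (top eigenvalue / trace) of the
    Gram matrix. Near 0 when one direction dominates (low complexity), larger
    when transport costs spread across many directions."""
    G = gram_from_distances(W)
    tr = sum(G[i][i] for i in range(len(G)))
    if tr <= 0.0:
        return 0.0
    return 1.0 - top_eigenvalue(G) / tr
```

The intuition from the abstract: a higher average cost and higher cost complexity indicate a more scattered response distribution and thus greater hallucination risk, while tightly clustered responses yield small, low-complexity matrices.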

Demerits

Limitations in Generalizability

The proposed method may not generalize to all types of LLMs, particularly those with complex or high-dimensional input spaces. It may also miss subtle hallucinations, or hallucinations that do not manifest as differences in the Wasserstein distance matrix.

Expert Commentary

The proposed method is a significant contribution to the field of LLMs, and its implications are far-reaching: quantifying distribution complexity via a Wasserstein distance matrix is a novel and effective approach to hallucination detection. However, the method may not generalize to all types of LLMs, and its effectiveness may be limited in certain scenarios. Nevertheless, the research has clear implications for the development and deployment of trustworthy LLMs in high-stakes applications.

Recommendations

  • Further research is needed to investigate the generalizability of the proposed method to different types of LLMs.
  • The method should be extended to detect subtle hallucinations, including those that do not manifest as differences in the Wasserstein distance matrix.

Sources

Original: arXiv - cs.LG