
LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment

Zhe Yu, Wenpeng Xing, Meng Han

arXiv:2604.05358v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) mitigates hallucination but does not eliminate it: a deployed system must still decide, at inference time, whether its answer is actually supported by the retrieved evidence. We introduce LatentAudit, a white-box auditor that pools mid-to-late residual-stream activations from an open-weight generator and measures their Mahalanobis distance to the evidence representation. The resulting quadratic rule requires no auxiliary judge model, runs at generation time, and is simple enough to calibrate on a small held-out set. We show that residual-stream geometry carries a usable faithfulness signal, that this signal survives architecture changes and realistic retrieval failures, and that the same rule remains amenable to public verification. On PubMedQA with Llama-3-8B, LatentAudit reaches 0.942 AUROC with 0.77 ms overhead. Across three QA benchmarks and five model families (Llama-2/3, Qwen-2.5/3, Mistral), the monitor remains stable; under a four-way stress test with contradictions, retrieval misses, and partial-support noise, it reaches 0.9566–0.9815 AUROC on PubMedQA and 0.9142–0.9315 on HotpotQA. At 16-bit fixed-point precision, the audit rule preserves 99.8% of the FP16 AUROC, enabling Groth16-based public verification without revealing model weights or activations. Together, these results position residual-stream geometry as a practical basis for real-time RAG faithfulness monitoring and optional verifiable deployment.

Executive Summary

The paper introduces *LatentAudit*, a novel white-box auditor for Retrieval-Augmented Generation (RAG) systems that monitors the faithfulness of generated outputs in real time. By leveraging residual-stream activations from open-weight generators and measuring their Mahalanobis distance to evidence representations, the method avoids auxiliary judge models and operates efficiently at inference time. The approach demonstrates high accuracy (AUROC up to 0.9815) across multiple benchmarks and model architectures, including stress tests with contradictions and retrieval failures. Notably, it supports verifiable deployment via fixed-point precision and zero-knowledge proofs (Groth16), preserving 99.8% of FP16 performance. The work positions residual-stream geometry as a practical foundation for faithfulness monitoring in RAG systems.

Key Points

  • LatentAudit is a white-box auditor that uses residual-stream activations to monitor RAG faithfulness in real time without auxiliary models.
  • It employs Mahalanobis distance to measure alignment between generator activations and retrieved evidence, achieving high AUROC across diverse benchmarks and model families.
  • The method supports verifiable deployment through fixed-point precision and Groth16-based public verification, ensuring privacy-preserving audits.
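The core audit rule described above can be sketched compactly. The paper's exact pooling scheme and covariance estimation are not detailed in the abstract, so the following is a minimal NumPy illustration, assuming a pooled activation vector and an evidence representation summarized by a mean and inverse covariance (the names `mahalanobis_score`, `evidence_mean`, and `evidence_cov_inv` are illustrative, not from the paper):

```python
import numpy as np

def mahalanobis_score(pooled_act, evidence_mean, evidence_cov_inv):
    """Quadratic audit rule: squared Mahalanobis distance from a pooled
    residual-stream activation to the evidence representation.
    Larger scores suggest the answer strays from the evidence."""
    diff = pooled_act - evidence_mean
    return float(diff @ evidence_cov_inv @ diff)

# Toy example with a hypothetical 4-dimensional hidden state.
rng = np.random.default_rng(0)
evidence_acts = rng.normal(size=(100, 4))  # stand-in evidence activations
mean = evidence_acts.mean(axis=0)
# Small ridge term keeps the covariance invertible.
cov_inv = np.linalg.inv(np.cov(evidence_acts, rowvar=False) + 1e-6 * np.eye(4))

faithful = mahalanobis_score(mean + 0.1, mean, cov_inv)   # near the evidence
drifted = mahalanobis_score(mean + 3.0, mean, cov_inv)    # far from it
```

Because the rule is a single quadratic form, scoring one answer costs two matrix-vector products, which is consistent with the sub-millisecond overhead the paper reports.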

Merits

Novelty and Theoretical Rigor

The paper introduces a groundbreaking approach to RAG faithfulness monitoring by leveraging residual-stream geometry, a previously underexplored signal in this context. The use of Mahalanobis distance for activation alignment is both theoretically justified and empirically validated.

Practical Efficiency and Scalability

LatentAudit operates in real time with minimal overhead (0.77 ms) and demonstrates stability across multiple model architectures and benchmarks, including stress tests with contradictions and retrieval failures. This efficiency is critical for deployment in real-world systems.

Verifiable and Privacy-Preserving Deployment

The system’s compatibility with fixed-point precision and Groth16-based zero-knowledge proofs enables public verification without exposing model weights or activations. This addresses key concerns in trustworthy AI, particularly in sensitive domains like healthcare (e.g., PubMedQA).

Broad Applicability

The method’s performance across five model families (Llama-2/3, Qwen-2.5/3, Mistral) and three QA benchmarks suggests generalizability. The stress tests further validate robustness under realistic failure modes.

Demerits

Dependence on Open-Weight Models

LatentAudit’s reliance on open-weight generators may limit its applicability to proprietary or closed-source models, which are common in industry settings. This could hinder widespread adoption unless adaptations for black-box models are explored.

Calibration Requirements

The method requires calibration on a small held-out set, which may introduce variability in performance across domains or datasets. The robustness of this calibration process across diverse contexts remains an open question.
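To make the calibration concern concrete: a typical procedure is to fix the decision threshold at a quantile of audit scores on held-out faithful examples, so the false-positive rate is controlled by construction. The paper does not specify its calibration recipe; this is one plausible sketch (function name and `target_fpr` parameter are hypothetical):

```python
import numpy as np

def calibrate_threshold(scores, labels, target_fpr=0.05):
    """Pick a threshold on held-out audit scores so that at most
    target_fpr of faithful answers (label 0) are flagged."""
    faithful_scores = np.sort(scores[labels == 0])
    idx = int(np.ceil((1 - target_fpr) * len(faithful_scores))) - 1
    return faithful_scores[min(idx, len(faithful_scores) - 1)]

# Synthetic held-out set: faithful scores low, unfaithful scores high.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 0.3, 200), rng.normal(3.0, 0.5, 200)])
labels = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

tau = calibrate_threshold(scores, labels)
fpr = float(np.mean(scores[labels == 0] > tau))  # held at the target rate
```

The open question raised above is whether a threshold calibrated this way transfers when the score distribution shifts with the domain; re-running the quantile step on in-domain data is cheap, but requires labeled faithful examples.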

Fixed-Point Precision Overhead

While the system preserves 99.8% of FP16 AUROC at 16-bit precision, the trade-off between precision and computational efficiency (e.g., in resource-constrained environments) warrants further investigation, particularly in latency-sensitive applications.
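The fixed-point step is what makes the rule provable in a Groth16 circuit, since zk-SNARK arithmetic works over integers rather than floats. The paper's exact format is not given here; the sketch below assumes a hypothetical 16-bit signed format with 10 fractional bits, purely to illustrate why quantization error stays small enough to preserve the ranking that AUROC depends on:

```python
import numpy as np

FRAC_BITS = 10  # hypothetical fractional bits of a 16-bit fixed-point format

def to_fixed(x, frac_bits=FRAC_BITS):
    """Round to the nearest representable 16-bit fixed-point value."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * (1 << frac_bits))
    return np.clip(scaled, -(1 << 15), (1 << 15) - 1).astype(np.int32)

def from_fixed(q, frac_bits=FRAC_BITS):
    return q.astype(np.float64) / (1 << frac_bits)

# Quantize a batch of audit scores and measure the round-trip error.
rng = np.random.default_rng(2)
scores_fp = rng.normal(0.0, 2.0, 1000)
scores_q = from_fixed(to_fixed(scores_fp))
max_err = float(np.max(np.abs(scores_fp - scores_q)))
```

With error bounded by half a quantization step, only score pairs closer than that bound can swap order, which is one intuition for the reported 99.8% AUROC retention; the latency cost of proof generation, as the section notes, is a separate question.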

Limited to QA Benchmarks

The evaluation focuses primarily on QA benchmarks (e.g., PubMedQA, HotpotQA), which may not fully capture the nuances of other RAG applications, such as summarization or code generation. Extending the method to these domains is a critical next step.

Expert Commentary

LatentAudit represents a significant advancement in the monitoring of RAG systems, addressing a critical gap in real-time faithfulness verification. The use of residual-stream geometry as a signal for alignment is both innovative and theoretically grounded, leveraging the geometric properties of model internals to infer output reliability. This approach avoids the pitfalls of black-box auditors, which often struggle with interpretability and computational overhead. The empirical results are impressive, demonstrating robustness across multiple benchmarks and model architectures, including stress tests with contradictions and retrieval failures. The integration of verifiable computation (Groth16) is particularly noteworthy, as it addresses the growing demand for privacy-preserving audits in regulated industries. However, the method’s reliance on open-weight models and the need for calibration on held-out data may limit its immediate applicability in proprietary or domain-specific settings. Future work should explore adaptations for closed-source models and extend evaluations to non-QA applications, such as summarization or code generation. Overall, LatentAudit sets a new benchmark for real-time RAG faithfulness monitoring and verifiable deployment, with far-reaching implications for trustworthy AI.

Recommendations

  • Explore adaptations of LatentAudit for closed-source or proprietary RAG models, potentially by developing proxy-based methods to approximate residual-stream geometry without direct access to model internals.
  • Extend the evaluation to non-QA applications (e.g., summarization, code generation) to assess the generalizability of residual-stream geometry as a faithfulness signal across diverse RAG use cases.
  • Investigate the robustness of the calibration process across diverse domains and datasets, with a focus on minimizing variability in performance due to domain shift or dataset bias.
  • Collaborate with regulators and industry stakeholders to pilot LatentAudit in high-stakes domains (e.g., healthcare, finance) to assess its practical impact and alignment with compliance requirements.
  • Develop open-source toolkits or libraries to facilitate adoption of LatentAudit, including pre-trained calibration models and integration guides for major RAG frameworks (e.g., LangChain, Haystack).

Sources

Original: arXiv - cs.AI