
A Bayesian Information-Theoretic Approach to Data Attribution


Dharmesh Tailor, Nicolò Felicioni, Kamil Ciosek

arXiv:2604.03858v1. Abstract: Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce, that is, the entropy increase at a query when they are removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments demonstrate competitive performance on counterfactual sensitivity, ground-truth retrieval, and coreset selection, showing that our method scales to modern architectures while bridging principled measures with practice.

Executive Summary

This article presents a Bayesian information-theoretic approach to training data attribution (TDA), a task central to model interpretability and safety. Subsets of training data are scored by the information loss they induce: the entropy increase at a query prediction when they are removed. This criterion credits examples for resolving predictive uncertainty rather than for fitting label noise. To scale to modern networks, the authors approximate information loss with a Gaussian Process surrogate built from tangent features, and for large-scale retrieval they relax the objective to a variance-corrected information gain suitable for vector databases. The method performs competitively on counterfactual sensitivity, ground-truth retrieval, and coreset selection, bridging principled information-theoretic measures with practical, scalable attribution.
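The scoring rule can be sketched in the simplest setting: a Bayesian linear model over fixed features, which is the linear-kernel special case of the GP surrogate the abstract describes. Here the information loss of a subset is half the log-ratio of the query's predictive variance without versus with that subset. The function names, unit noise/prior variances, and random stand-in features below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def posterior_variance(Phi, phi_q, noise_var=1.0, prior_var=1.0):
    """Predictive variance at a query under Bayesian linear regression
    (equivalently, a GP with a linear kernel on the features in Phi)."""
    d = Phi.shape[1]
    # Posterior precision over the weights.
    A = Phi.T @ Phi / noise_var + np.eye(d) / prior_var
    return float(phi_q @ np.linalg.solve(A, phi_q))

def information_loss(Phi, phi_q, subset_idx, noise_var=1.0, prior_var=1.0):
    """Entropy increase (in nats) at the query when subset_idx is removed:
    H(q | D \\ S) - H(q | D) = 0.5 * log(var_without / var_with)."""
    var_full = posterior_variance(Phi, phi_q, noise_var, prior_var)
    keep = np.setdiff1d(np.arange(Phi.shape[0]), subset_idx)
    var_removed = posterior_variance(Phi[keep], phi_q, noise_var, prior_var)
    return 0.5 * np.log(var_removed / var_full)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))   # stand-in for tangent features of 50 examples
phi_q = rng.normal(size=5)       # stand-in query features
score = information_loss(Phi, phi_q, [3])
assert score >= 0.0  # removing data can only raise predictive entropy
```

Note that the score is always non-negative: deleting observations weakens the posterior precision, so the query's predictive variance, and hence its entropy, can only grow.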

Key Points

  • Formulation of TDA as a Bayesian information-theoretic problem
  • Use of information loss as a criterion for attributing influence to training examples
  • Approximation of information loss using a Gaussian Process surrogate
  • Demonstration of competitive performance on various tasks
  • Scalability of the method to modern neural networks
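The abstract notes that the subset criterion promotes diversity: a second copy of an already-selected example barely changes the query's predictive variance, so a greedy maximizer naturally spreads its picks. A minimal greedy sketch under the same Bayesian-linear-model assumptions as above (the helper names and unit variances are illustrative, not the paper's algorithm):

```python
import numpy as np

def predictive_var(Phi, phi_q, noise_var=1.0, prior_var=1.0):
    """Query predictive variance under Bayesian linear regression."""
    d = Phi.shape[1]
    A = Phi.T @ Phi / noise_var + np.eye(d) / prior_var
    return float(phi_q @ np.linalg.solve(A, phi_q))

def greedy_attribution(Phi, phi_q, k):
    """Greedily pick k examples whose joint removal most inflates the
    query's predictive variance; redundant near-duplicates add little,
    so the selected subset tends to be diverse."""
    n = Phi.shape[0]
    removed = []
    for _ in range(k):
        best, best_var = None, -np.inf
        for i in range(n):
            if i in removed:
                continue
            keep = [j for j in range(n) if j not in removed + [i]]
            v = predictive_var(Phi[keep], phi_q)
            if v > best_var:
                best, best_var = i, v
        removed.append(best)
    return removed

rng = np.random.default_rng(1)
Phi = rng.normal(size=(30, 4))
phi_q = rng.normal(size=4)
top = greedy_attribution(Phi, phi_q, k=3)
```

The quadratic loop over candidates is purely for clarity; the paper's information-gain relaxation exists precisely to avoid this cost at vector-database scale.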

Merits

Strength in Resolving Predictive Uncertainty

By crediting examples for resolving predictive uncertainty rather than for fitting label noise, the method distinguishes genuinely informative training data from mislabeled or noisy examples, yielding more reliable attributions of influence.

Scalability to Modern Neural Networks

By approximating information loss with a tangent-feature Gaussian Process surrogate, and further relaxing to a variance-corrected information-gain objective for retrieval in vector databases, the method scales to modern large-scale networks.

Competitive Performance on Various Tasks

The proposed method demonstrates competitive performance on counterfactual sensitivity, ground-truth retrieval, and coreset selection, highlighting its effectiveness in different scenarios.

Demerits

Potential Overreliance on Gaussian Process Surrogate

The method approximates information loss through a Gaussian Process surrogate built on tangent features; this linearized view of the network may be loose for models whose training departs strongly from the tangent-kernel regime, introducing error into the resulting attributions.

Limited Exploration of Alternative Approaches

The article focuses primarily on the proposed method, with limited discussion of alternative approaches or the trade-offs between different methods.

Expert Commentary

The proposed Bayesian information-theoretic approach to TDA is a significant contribution, offering a principled and scalable way to attribute influence to training examples. Scoring by information loss is the method's most innovative aspect: it credits examples for resolving predictive uncertainty rather than for fitting label noise. The reliance on a Gaussian Process surrogate may introduce approximation error, but the demonstrated performance on counterfactual sensitivity, ground-truth retrieval, and coreset selection suggests the approximation holds up in practice, and the method's scalability to modern architectures makes it promising for deployment.

Recommendations

  • Future research should explore alternative approaches to approximating information loss and the trade-offs between different methods.
  • The proposed method should be applied to a wider range of applications and scenarios to further demonstrate its effectiveness and limitations.

Sources

Original: arXiv - cs.LG