First-Mover Bias in Gradient Boosting Explanations: Mechanism, Detection, and Resolution
arXiv:2603.22346v1 Announce Type: new Abstract: We isolate and empirically characterize first-mover bias -- a path-dependent concentration of feature importance caused by sequential residual fitting in gradient boosting -- as a specific mechanistic cause of the well-known instability of SHAP-based feature rankings under multicollinearity. When correlated features compete for early splits, gradient boosting creates a self-reinforcing advantage for whichever feature is selected first: subsequent trees inherit modified residuals that favor the incumbent, concentrating SHAP importance on an arbitrary feature rather than distributing it across the correlated group. Scaling up a single model amplifies this effect -- a Large Single Model with the same total tree count as our method produces the worst explanations of any approach tested. We demonstrate that model independence is sufficient to resolve first-mover bias in the linear regime, and remains the most effective mitigation under nonlinear data-generating processes. Both our proposed method, DASH (Diversified Aggregation of SHAP), and simple seed-averaging (Stochastic Retrain) restore stability by breaking the sequential dependency chain, confirming that the operative mechanism is independence between explained models. At rho=0.9, both achieve stability=0.977, while the single-best workflow degrades to 0.958 and the Large Single Model to 0.938. On the Breast Cancer dataset, DASH improves stability from 0.32 to 0.93 (+0.61) against a tree-count-matched baseline. DASH additionally provides two diagnostic tools -- the Feature Stability Index (FSI) and Importance-Stability (IS) Plot -- that detect first-mover bias without ground truth, enabling practitioners to audit explanation reliability before acting on feature rankings. Software and reproducible benchmarks are available at https://github.com/DrakeCaraker/dash-shap.
Executive Summary
This article introduces 'first-mover bias' in gradient boosting explanations: a path-dependent concentration of feature importance that arises from sequential residual fitting. The authors empirically characterize the bias and show how it destabilizes SHAP-based feature rankings under multicollinearity. They propose two mitigations, DASH and Stochastic Retrain, both of which restore explanation stability by breaking the sequential dependency chain between trees. The results indicate that independence between the explained models is sufficient to resolve the bias, and that both methods outperform single-model workflows on stability. The authors also introduce diagnostic tools that detect first-mover bias without ground truth. The findings have significant implications for interpreting machine learning models and for building more reliable explanation methods.
Key Points
- ▸ First-mover bias is a path-dependent concentration of feature importance in gradient boosting explanations
- ▸ The bias arises from sequential residual fitting and is exposed when features are multicollinear
- ▸ Independence between explained models is sufficient to resolve the bias
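The Stochastic Retrain idea from the abstract (seed-averaging over independently trained models) can be sketched with a toy simulation; `toy_explain` and its numbers are invented stand-ins for training a GBM and computing mean |SHAP|, not the paper's implementation:

```python
import numpy as np

def toy_explain(seed):
    # Invented stand-in for "retrain a GBM with this seed and return
    # mean |SHAP| per feature": features 0 and 1 are a correlated pair
    # sharing 0.7 of the credit; first-mover bias hands nearly all of
    # it to a randomly chosen incumbent. Feature 2 is independent.
    rng = np.random.default_rng(seed)
    winner = rng.integers(2)
    imp = np.array([0.05, 0.05, 0.20])
    imp[winner] += 0.7
    return imp

def stochastic_retrain(explain, seeds):
    # Seed-averaging: mean importance over independently seeded
    # retrains, which breaks the sequential dependency chain.
    return np.mean([explain(s) for s in seeds], axis=0)

avg = stochastic_retrain(toy_explain, range(400))
```

Each single retrain hands almost all of the pair's credit to one arbitrary incumbent, but the average spreads it across the correlated group, which is the behavior the abstract attributes to independence between the explained models.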
Merits
Strengths in Methodology
The authors employ a rigorous empirical approach to characterize and mitigate first-mover bias, evaluating stability across correlation levels (up to rho=0.9), on real data (the Breast Cancer dataset), and against tree-count-matched baselines.
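For readers who want to reproduce the multicollinearity setting, one standard construction (an assumption about the benchmark, not the paper's verbatim code) generates a feature pair with target correlation rho by Gaussian mixing:

```python
import numpy as np

def correlated_pair(n, rho, seed=0):
    # Build x2 = rho * x1 + sqrt(1 - rho^2) * fresh_noise, so that
    # corr(x1, x2) ≈ rho for standard-normal x1.
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return x1, x2

x1, x2 = correlated_pair(100_000, 0.9)
r = np.corrcoef(x1, x2)[0, 1]
```

At rho=0.9 either feature can win the early splits, which is the regime where the abstract reports single-model stability degrading to 0.958 and below.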
Diagnostic Tools
The proposed Feature Stability Index (FSI) and Importance-Stability (IS) Plot enable practitioners to audit explanation reliability without ground truth.
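The paper's exact FSI formula is not reproduced here; one plausible reading, sketched under that assumption, scores stability as the mean pairwise Spearman rank correlation of per-retrain importance vectors (1.0 = identical rankings, -1.0 = fully reversed):

```python
import itertools
import numpy as np

def feature_stability_index(importance_runs):
    """Hypothetical FSI: mean pairwise Spearman rank correlation of
    importance vectors across retrains (assumes untied importances)."""
    imp = np.asarray(importance_runs, dtype=float)
    # Rank features within each run: 0 = most important.
    ranks = np.argsort(np.argsort(-imp, axis=1), axis=1)
    n = imp.shape[1]
    rhos = [1 - 6 * np.sum((a - b) ** 2) / (n * (n ** 2 - 1))
            for a, b in itertools.combinations(ranks, 2)]
    return float(np.mean(rhos))

stable = feature_stability_index([[0.50, 0.30, 0.20],
                                  [0.55, 0.28, 0.17]])   # → 1.0
flipped = feature_stability_index([[0.50, 0.30, 0.20],
                                   [0.20, 0.30, 0.50]])  # → -1.0
```

A low score on real retrains would flag the kind of rank instability the paper attributes to first-mover bias, without needing ground-truth importances.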
Demerits
Limitation in Generalizability
The study focuses on gradient boosting explanations and may not generalize to other machine learning models or explanation methods.
Assumption of Independence
The proposed methods rely on training multiple independent models, which adds retraining cost and may not be feasible in all deployment scenarios.
Expert Commentary
The article makes a significant contribution to the field of machine learning explainability by identifying and addressing a major limitation in existing methods. The proposed methods and diagnostic tools demonstrate a clear understanding of the underlying mechanisms and provide a practical solution to the problem. However, the study's focus on gradient boosting explanations may limit its generalizability to other models and explanation methods. Additionally, the assumption of independence between explained models requires further exploration in future research.
Recommendations
- ✓ Future studies should investigate the applicability of the proposed methods to other machine learning models and explanation methods.
- ✓ Researchers should explore alternative approaches to mitigating first-mover bias that do not rely on independence between explained models.
Sources
Original: arXiv - cs.LG