ERP-RiskBench: Leakage-Safe Ensemble Learning for Financial Risk
arXiv:2603.06671v1 Abstract: Financial risk detection in Enterprise Resource Planning (ERP) systems is an important but underexplored application of machine learning. Published studies in this area tend to suffer from vague dataset descriptions, leakage-prone pipelines, and evaluation practices that inflate reported performance. This paper presents a rebuilt experimental framework for ERP financial risk detection using ensemble machine learning. The risk definition is hybrid, covering both procurement compliance anomalies and transactional fraud. A composite benchmark called ERP-RiskBench is assembled from public procurement event logs, labeled fraud data, and a new synthetic ERP dataset with rule-injected risk typologies and conditional tabular GAN augmentation. Nested cross-validation with time-aware and group-aware splitting enforces leakage prevention throughout the pipeline. The primary model is a stacking ensemble of gradient boosting methods, tested alongside linear baselines, deep tabular architectures, and an interpretable glassbox alternative. Performance is measured through Matthews Correlation Coefficient, area under the precision-recall curve, and cost-sensitive decision analysis using calibrated probabilities. Across multiple dataset configurations and a structured ablation study, the stacking ensemble achieves the best detection results. Leakage-safe protocols reduce previously inflated accuracy estimates by a notable margin. SHAP-based explanations and feature stability analysis show that procurement control features, especially three-way matching discrepancies, rank as the most informative predictors. The resulting framework provides a reproducible, operationally grounded blueprint for machine learning deployment in ERP audit and governance settings.
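Two of the evaluation ideas named in the abstract, the Matthews Correlation Coefficient and cost-sensitive decisions on calibrated probabilities, can be illustrated with a minimal sketch. The abstract gives no cost values or decision rule, so everything here is an assumption: the function names, the cost parameters `cost_fp` and `cost_fn`, and the convention that correct decisions cost nothing. Under that convention, flagging a transaction is worthwhile once its calibrated risk probability reaches `cost_fp / (cost_fp + cost_fn)`.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def cost_optimal_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected cost.

    Assumes calibrated probabilities and zero cost for correct decisions:
    flag when p * cost_fn >= (1 - p) * cost_fp.
    """
    return cost_fp / (cost_fp + cost_fn)


def total_cost(probs, labels, threshold, cost_fp, cost_fn):
    """Sum misclassification costs when flagging every p >= threshold."""
    cost = 0.0
    for p, y in zip(probs, labels):
        flagged = p >= threshold
        if flagged and y == 0:
            cost += cost_fp  # false alarm: an auditor reviews a clean record
        elif not flagged and y == 1:
            cost += cost_fn  # missed risk: a risky record goes unreviewed
    return cost


# Illustrative numbers only, not taken from the paper.
probs = [0.05, 0.2, 0.6, 0.9]
labels = [0, 1, 0, 1]
t = cost_optimal_threshold(cost_fp=1, cost_fn=9)
cost_at_t = total_cost(probs, labels, t, cost_fp=1, cost_fn=9)
cost_at_half = total_cost(probs, labels, 0.5, cost_fp=1, cost_fn=9)
```

When a missed risk is assumed to be much more expensive than a false alarm, the cost-based threshold sits well below 0.5, which is why calibrated probabilities matter more than raw accuracy in this setting.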
Executive Summary
This study presents a rebuilt experimental framework for ERP financial risk detection using ensemble machine learning. Its composite benchmark, ERP-RiskBench, combines public procurement event logs, labeled fraud data, and a new synthetic ERP dataset. The framework targets two problems in earlier work: leakage-prone pipelines and inflated performance estimates. The primary model is a stacking ensemble of gradient boosting methods, evaluated across multiple dataset configurations against linear baselines, deep tabular architectures, and an interpretable glassbox alternative. The stacking ensemble achieves the best detection results, while the leakage-safe protocols lower the accuracy estimates that leakage had previously inflated. SHAP-based explanations and feature stability analysis identify procurement control features, especially three-way matching discrepancies, as the most informative predictors. The resulting framework offers a reproducible and operationally grounded blueprint for machine learning deployment in ERP audit and governance settings.
Key Points
- ▸ The study presents a rebuilt experimental framework for ERP financial risk detection, built around a composite benchmark called ERP-RiskBench.
- ▸ The framework addresses concerns with leakage-prone pipelines and overestimated performance in previous studies.
- ▸ The authors introduce a stacked ensemble of gradient boosting methods and evaluate its performance across multiple dataset configurations.
Merits
Strength in Addressing Leakage Concerns
The study employs nested cross-validation with time-aware and group-aware splitting to prevent leakage throughout the pipeline, which directly targets the leakage-prone pipelines the authors identify in earlier studies.
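What time-aware and group-aware splitting can look like is sketched below. This is not the paper's protocol, which the abstract does not spell out. It is one simple way to combine the two constraints: training folds grow forward in time, and test records whose group (for example a vendor ID, a hypothetical choice) already appears in training are dropped. The function name `time_group_splits` and its parameters are illustrative.

```python
def time_group_splits(timestamps, groups, n_splits=3):
    """Yield (train_idx, test_idx) pairs respecting time order and group boundaries.

    Records are sorted by timestamp and cut into n_splits + 1 blocks. Fold k
    trains on the first k blocks and tests on the next block, after removing
    test records whose group was already seen in training.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    block = len(order) // (n_splits + 1)
    if block == 0:
        raise ValueError("not enough records for the requested number of splits")
    for k in range(1, n_splits + 1):
        train = order[: k * block]
        # The last fold absorbs any leftover records at the end.
        end = (k + 1) * block if k < n_splits else len(order)
        candidates = order[k * block : end]
        seen_groups = {groups[i] for i in train}
        test = [i for i in candidates if groups[i] not in seen_groups]
        yield train, test


# Illustrative data: eight records in time order, with hypothetical vendor groups.
timestamps = [1, 2, 3, 4, 5, 6, 7, 8]
groups = ["a", "a", "b", "b", "c", "a", "d", "d"]
splits = list(time_group_splits(timestamps, groups, n_splits=3))
```

For nested cross-validation, the same splitter would be applied a second time to each training portion so that hyperparameters are tuned only on data that precedes, and shares no group with, the outer test fold.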
Demerits
Limited Generalizability
The study's results may not generalize to other ERP systems or risk detection tasks, because they depend on the specific dataset configurations and risk typologies used, including a synthetic dataset with rule-injected risks.
Expert Commentary
The study makes a substantive contribution to machine learning for financial risk detection in ERP systems. Its main strengths are the explicit leakage controls and a stacking ensemble of gradient boosting methods that is benchmarked against several alternative model families. The limited generalizability of the results should be kept in mind when building on this work. The findings are directly relevant to the development of machine learning-based risk detection systems in ERP audit and governance settings.
Recommendations
- ✓ Future studies should aim to increase the generalizability of the results by using a wider range of dataset configurations and risk typologies.
- ✓ Researchers should continue to explore the use of explainable AI techniques, such as SHAP-based explanations and feature stability analysis, to provide insights into feature importance and model behavior.
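One simple way to quantify feature stability is to compare the top-ranked features across cross-validation folds. The sketch below uses the mean pairwise Jaccard overlap of the top-k sets. This measure is an assumption chosen for illustration, since the summary does not say how the paper measures stability. The feature names and importance values are made up and are not results from the study.

```python
from itertools import combinations


def top_k(importances, k):
    """Return the set of the k features with the largest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def mean_topk_jaccard(fold_importances, k):
    """Average Jaccard overlap of top-k feature sets over all pairs of folds."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sets = [top_k(imp, k) for imp in fold_importances]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two folds to measure stability")
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical per-fold importances (e.g. mean absolute SHAP values).
folds = [
    {"three_way_match_gap": 0.9, "invoice_amount": 0.5, "approval_delay": 0.2},
    {"three_way_match_gap": 0.8, "invoice_amount": 0.6, "approval_delay": 0.3},
    {"three_way_match_gap": 0.7, "invoice_amount": 0.1, "approval_delay": 0.4},
]
stability = mean_topk_jaccard(folds, k=2)
```

A score of 1 means every fold agrees on the top-k set, and values near 0 mean the ranking changes substantially from fold to fold.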