X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
arXiv:2602.15298v1. Abstract: Misclassifications in spam and phishing detection are very harmful, as false negatives expose users to attacks while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least two times larger divergence than correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. When used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. These results demonstrate X-MAP's effectiveness and interpretability for improving spam and phishing detection.
Executive Summary
This study presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework for spam and phishing detection. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. On SMS and phishing datasets, misclassified messages show at least twice the divergence of correctly classified ones. Used as a misclassification detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. Used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. The findings suggest that explanation-based profiling can both flag detector errors and help correct them.
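The deviation measure named above can be illustrated with a minimal sketch. It assumes that each message and each class profile is represented as a vector of non-negative topic weights; the vectors and the names `spam_profile`, `typical_msg`, and `unusual_msg` are hypothetical and not taken from the paper. Only the Jensen-Shannon divergence itself is standard.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    p, q = normalize(p), normalize(q)
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical topic weights (illustrative numbers, not from the paper).
spam_profile = [0.70, 0.20, 0.10]
typical_msg = [0.65, 0.25, 0.10]
unusual_msg = [0.05, 0.15, 0.80]

print("typical:", js_divergence(typical_msg, spam_profile))
print("unusual:", js_divergence(unusual_msg, spam_profile))
```

A message whose topic weights sit far from the profile of its predicted class receives a larger score, which is the kind of signal the study reports for misclassified messages.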
Key Points
- ▸ X-MAP is an eXplainable Misclassification Analysis and Profiling framework for spam and phishing detection.
- ▸ X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles.
- ▸ On SMS and phishing datasets, misclassified messages show at least twice the divergence of correctly classified ones, and X-MAP reaches up to 0.98 AUROC as a misclassification detector.
Merits
Strength in Explainability
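X-MAP builds interpretable topic profiles and measures each message's deviation from them using Jensen-Shannon divergence, so a flagged message can be traced to topic-level semantic patterns rather than to an opaque uncertainty score. This makes it a valuable tool for understanding and improving spam and phishing detection.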
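One way such topic profiles could be built is sketched below. This is not the authors' implementation. It assumes the SHAP attributions are available as a messages-by-tokens matrix of absolute values (`abs_shap`, made-up numbers), factors it with a plain Lee-Seung multiplicative-update NMF, and averages the normalized topic weights of messages assumed to be reliably classified. All function names are hypothetical.

```python
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x k) and h (k x m)
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def topic_profile(w, rows):
    """Average the row-normalized topic weights of the selected messages."""
    k = len(w[0])
    profile = [0.0] * k
    for r in rows:
        total = sum(w[r]) or 1.0
        for t in range(k):
            profile[t] += w[r][t] / total / len(rows)
    return profile

# Hypothetical |SHAP| attributions: 4 messages x 4 tokens (made-up numbers).
abs_shap = [
    [0.9, 0.8, 0.0, 0.1],
    [0.7, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.6, 0.9],
    [0.1, 0.0, 0.8, 0.7],
]
w, h = nmf(abs_shap, k=2)
# Assumption: messages 0 and 1 are the reliably classified spam messages.
spam_profile = topic_profile(w, rows=[0, 1])
print("reconstruction error:", frobenius_error(abs_shap, w, h))
print("spam topic profile:", spam_profile)
```

Each row of `h` can be read as a topic (a weighting over tokens) and each row of `w` as a message's topic mixture. The profile is then a distribution that individual messages can be compared against.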
Improved Detection Accuracy
As a misclassification detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions, which indicates that its divergence score separates misclassified from correctly classified messages well.
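The two metrics can be computed from per-message scores as in the sketch below. The text here does not define TRR. The sketch assumes it denotes the true-rejection rate (the share of misclassified predictions that are rejected) and that the false-rejection rate is the share of correct predictions rejected at the same threshold. The scores and labels are made up for illustration.

```python
import math

def auroc(scores, is_error):
    """Probability that a random misclassified item scores higher than a
    random correctly classified one (ties count as one half)."""
    pos = [s for s, e in zip(scores, is_error) if e]
    neg = [s for s, e in zip(scores, is_error) if not e]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def frr_at_trr(scores, is_error, target_trr=0.95):
    """False-rejection rate at the highest threshold that still rejects
    at least target_trr of the misclassified items (assumed definition)."""
    pos = sorted((s for s, e in zip(scores, is_error) if e), reverse=True)
    neg = [s for s, e in zip(scores, is_error) if not e]
    needed = math.ceil(target_trr * len(pos))
    threshold = pos[needed - 1]          # reject when score >= threshold
    return sum(n >= threshold for n in neg) / len(neg)

# Hypothetical divergence scores; True marks a misclassified prediction.
scores = [0.05, 0.10, 0.12, 0.30, 0.25, 0.40, 0.55]
is_error = [False, False, False, False, True, True, True]

print("AUROC:", auroc(scores, is_error))
print("FRR at 95% TRR:", frr_at_trr(scores, is_error))
```

Fixing the rejection rate on errors first and then reading off the cost on correct predictions makes detectors comparable at the same level of protection.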
Robustness to False Rejections
When used as a repair layer on base detectors, X-MAP recovers up to 97% of falsely rejected correct predictions with moderate leakage, which suggests it can reduce the cost of over-cautious rejection and improve the robustness of spam and phishing detection systems.
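A repair layer of this kind might look like the following sketch, which rests on several assumptions not stated here. It assumes a base detector has already rejected some predictions, and that the repair step re-accepts those whose divergence falls at or below a threshold. It takes recovery to be the share of falsely rejected correct predictions that are re-accepted, and leakage to be the share of rejected misclassifications that slip back in. The `Prediction` class, `repair_rates` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    correct: bool      # ground truth: was the base prediction right?
    rejected: bool     # did the base detector reject this prediction?
    divergence: float  # deviation from the predicted class's topic profile

def repair_rates(preds, max_divergence):
    """Return (recovery, leakage) after re-accepting rejected predictions
    whose divergence is at most max_divergence (assumed repair rule)."""
    rejected = [p for p in preds if p.rejected]
    reaccepted = [p for p in rejected if p.divergence <= max_divergence]
    false_rej = [p for p in rejected if p.correct]
    true_rej = [p for p in rejected if not p.correct]
    recovery = sum(p.correct for p in reaccepted) / len(false_rej) if false_rej else 0.0
    leakage = sum(not p.correct for p in reaccepted) / len(true_rej) if true_rej else 0.0
    return recovery, leakage

# Made-up predictions: six rejected by the base detector, two accepted.
preds = [
    Prediction(True, True, 0.05), Prediction(True, True, 0.08),
    Prediction(True, True, 0.40), Prediction(False, True, 0.10),
    Prediction(False, True, 0.55), Prediction(False, True, 0.60),
    Prediction(True, False, 0.03), Prediction(False, False, 0.50),
]

recovery, leakage = repair_rates(preds, max_divergence=0.15)
print("recovery:", recovery, "leakage:", leakage)
```

Raising the threshold recovers more correct predictions but also lets more errors through, so the two rates have to be read together.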
Demerits
Limited Dataset Consideration
The study's reliance on SMS and phishing datasets may limit the generalizability of X-MAP's results to other types of spam and phishing detection scenarios.
Computational Complexity
The abstract reports no runtime figures. Computing SHAP attributions, non-negative matrix factorization, and per-message Jensen-Shannon divergence may become a bottleneck in large-scale spam and phishing detection applications.
Expert Commentary
The results have practical implications for building and improving spam and phishing detection systems. By pairing interpretable topic profiles with a Jensen-Shannon divergence score, X-MAP shows how interpretability can be used to detect errors directly, not only to explain predictions after the fact. The detection and repair results suggest that explainability can improve both the accuracy and the robustness of these systems. The main caveat is generalizability: the evaluation covers SMS and phishing datasets, so performance in other spam and phishing scenarios remains to be shown.
Recommendations
- ✓ Future studies should investigate the generalizability of X-MAP's results to other types of spam and phishing detection scenarios.
- ✓ Developers should consider incorporating X-MAP into existing spam and phishing detection systems to improve their accuracy and robustness.