
Towards Accurate and Calibrated Classification: Regularizing Cross-Entropy From A Generative Perspective


Qipeng Zhan, Zhuoping Zhou, Li Shen

arXiv:2604.06689v1 Announce Type: new Abstract: Accurate classification requires not only high predictive accuracy but also well-calibrated confidence estimates. Yet, modern deep neural networks (DNNs) are often overconfident, primarily due to overfitting on the negative log-likelihood (NLL). While focal loss variants alleviate this issue, they typically reduce accuracy, revealing a persistent trade-off between calibration and predictive performance. Motivated by the complementary strengths of generative and discriminative classifiers, we propose Generative Cross-Entropy (GCE), which maximizes $p(x|y)$ and is equivalent to cross-entropy augmented with a class-level confidence regularizer. Under mild conditions, GCE is strictly proper. Across CIFAR-10/100, Tiny-ImageNet, and a medical imaging benchmark, GCE improves both accuracy and calibration over cross-entropy, especially in the long-tailed scenario. Combined with adaptive piecewise temperature scaling (ATS), GCE attains calibration competitive with focal-loss variants without sacrificing accuracy.

Executive Summary

This article introduces Generative Cross-Entropy (GCE), a regularization approach designed to improve both predictive accuracy and calibration in deep neural networks. GCE reinterprets cross-entropy from a generative perspective, maximizing the class-conditional likelihood $p(x|y)$, which implicitly adds a class-level confidence regularizer to the standard cross-entropy objective. The authors show that GCE mitigates the overconfidence common in modern DNNs, largely driven by overfitting on the negative log-likelihood, without the accuracy loss typically incurred by focal-loss variants. Empirical results across diverse benchmarks, including long-tailed scenarios and medical imaging, show GCE improving both metrics. When combined with adaptive piecewise temperature scaling (ATS), GCE achieves calibration competitive with focal-loss variants while maintaining high accuracy.
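The generative reading can be made concrete with Bayes' rule (a standard identity; how the paper parameterizes and optimizes each term is not specified in this summary):

$$\log p(x \mid y) = \log p(y \mid x) + \log p(x) - \log p(y),$$

so minimizing $-\log p(x \mid y)$ equals the usual cross-entropy term $-\log p(y \mid x)$ plus $\log p(y) - \log p(x)$. The class-marginal term $\log p(y)$ is what plausibly acts as the class-level confidence regularizer, which would also explain the reported gains in long-tailed settings where class marginals are highly skewed.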

Key Points

  • Modern DNNs often suffer from overconfidence due to overfitting on NLL, leading to a trade-off between accuracy and calibration.
  • GCE reinterprets cross-entropy from a generative perspective, maximizing $p(x|y)$, which introduces an inherent class-level confidence regularizer.
  • GCE is demonstrated to be strictly proper under mild conditions, a desirable theoretical property for loss functions.
  • Empirical evaluation shows GCE improves both accuracy and calibration across various benchmarks, particularly in long-tailed data distributions.
  • When combined with post-hoc calibration techniques like ATS, GCE achieves competitive calibration with focal loss variants without sacrificing predictive performance.
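The calibration claims above are typically quantified with Expected Calibration Error (ECE). The following is the standard equal-width binning estimator of ECE, not code from the paper; the bin count `n_bins=15` is a common default, not the paper's setting:

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # half-open bins (lo, hi]; the top bin captures confidence 1.0
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight * accuracy-confidence gap
    return ece
```

A perfectly calibrated, always-correct classifier with confidence 1.0 scores an ECE of 0; a classifier that is always wrong at confidence 1.0 scores 1.0.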

Merits

Novel Theoretical Foundation

The generative reinterpretation of cross-entropy provides a fresh theoretical lens, grounding the regularization in a probabilistic framework rather than purely heuristic adjustments.

Dual Improvement

GCE improves accuracy and calibration simultaneously, addressing the long-standing trade-off between the two; this is a significant advance over prior methods such as focal loss, which typically buy calibration at the cost of accuracy.
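For reference, the focal-loss family that GCE is compared against downweights already-confident examples via a modulating factor. This is the standard definition (with focusing parameter $\gamma$), not the paper's code:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """FL = mean of -(1 - p_t)^gamma * log(p_t), where p_t is the
    predicted probability of the true class. gamma = 0 recovers
    plain cross-entropy."""
    p_t = np.asarray(probs)[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + 1e-12)))
```

The `(1 - p_t)^gamma` factor shrinks the loss on confident correct predictions, which tempers overconfidence but also weakens the training signal, one intuition for the accuracy drop the paper attributes to focal-loss variants.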

Strictly Proper Loss

The demonstration that GCE is strictly proper under mild conditions adds significant theoretical rigor: it guarantees that the expected loss is uniquely minimized by the true conditional distribution.
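For readers unfamiliar with the term, a loss $L(p, y)$ on predicted distributions $p$ is strictly proper when, for every true distribution $q$ (standard definition):

$$\mathbb{E}_{y \sim q}\big[L(p, y)\big] > \mathbb{E}_{y \sim q}\big[L(q, y)\big] \quad \text{for all } p \neq q,$$

i.e., reporting the true conditional distribution is the unique minimizer of expected loss. Cross-entropy is strictly proper; many confidence-penalizing modifications are not, which is why the paper's claim is notable.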

Robustness to Long-Tailed Data

Its effectiveness in long-tailed scenarios is particularly valuable, as imbalanced datasets are common in real-world applications and often exacerbate calibration issues.

Demerits

Complexity of Implementation

While conceptually elegant, the practical implementation might introduce additional computational overhead or hyperparameter tuning compared to vanilla cross-entropy.

Generalizability of 'Mild Conditions'

The 'mild conditions' under which GCE is strictly proper warrant further scrutiny to understand their practical implications and whether they hold universally across diverse DNN architectures and datasets.

Reliance on Post-Hoc Calibration

While GCE improves calibration intrinsically, its best performance is achieved when combined with ATS, suggesting it doesn't fully eliminate the need for post-hoc adjustments.
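For context, plain single-temperature scaling, the post-hoc baseline that ATS generalizes, is sketched below; ATS applies different temperatures across confidence regions, and its exact piecewise rule is not specified in this summary. The grid-search fitting here is a simple stand-in for the usual NLL optimization:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    probs = softmax(logits / T)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the scalar T minimizing validation NLL via a coarse grid search."""
    return min(grid, key=lambda T: nll(logits, labels, T))
```

For an overconfident model (large logits with some high-confidence mistakes), the fitted temperature exceeds 1, softening the predicted distribution without changing the argmax, and hence without changing accuracy.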

Limited Generative Capacity

The 'generative perspective' primarily serves as motivation for the regularizer; GCE does not itself perform generative modeling or produce samples, which may mislead readers expecting a full generative model.

Expert Commentary

This paper presents a sophisticated and timely contribution to the field of trustworthy AI. The reinterpretation of cross-entropy through a generative lens is not merely an elegant theoretical exercise but yields a practical, demonstrably superior loss function. The ability of GCE to simultaneously enhance both accuracy and calibration, particularly in challenging long-tailed distributions, addresses a critical limitation of previous approaches. The theoretical grounding as a strictly proper loss function adds significant weight, elevating GCE beyond a mere heuristic. While the need to combine GCE with ATS for peak calibration suggests intrinsic calibration remains an elusive ideal, GCE significantly narrows the gap. This work is likely to spur further research into hybrid generative-discriminative loss functions and may become a standard component in the deep learning toolkit for applications demanding high-fidelity uncertainty estimates. Its implications for regulatory compliance and responsible AI development are notable, offering a pathway to more reliable and accountable AI systems.

Recommendations

  • Conduct further theoretical analysis to precisely delineate the 'mild conditions' for strict properness and explore their robustness across a wider range of data distributions and model architectures.
  • Investigate the computational overhead and memory footprint of GCE compared to standard cross-entropy, especially for very large models and datasets, to assess its scalability.
  • Explore the integration of GCE with other regularization techniques (e.g., dropout, batch normalization variants) to understand synergistic effects on calibration and accuracy.
  • Evaluate GCE's performance in a broader array of real-world, high-stakes applications beyond medical imaging, such as financial risk assessment or legal document analysis, where calibrated confidence is critical.

Sources

Original: arXiv - cs.LG