I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation
arXiv:2604.03904v1 Announce Type: new Abstract: Large language models (LLMs) frequently produce confident but incorrect answers, partly because common binary scoring conventions reward answering over honestly expressing uncertainty. We study whether prompt-only interventions -- explicitly announcing reward schemes for answer-versus-abstain decisions plus humility-oriented normative principles -- can reduce hallucination risk without modifying the model. Our focus is epistemic abstention on factual questions with a verifiable answer, where current LLMs often fail to abstain despite being uncertain about their answers. We first assess self-reported verbal confidence as a usable uncertainty signal, showing stability under prompt paraphrasing and reasonable calibration against a token-probability baseline. We then study I-CALM, a prompt-based framework that (i) elicits verbal confidence, (ii) partially rewards abstention through explicit reward schemes, and (iii) adds lightweight normative principles emphasizing truthfulness, humility, and responsibility. Using GPT-5 mini on PopQA as the main setting, we find that confidence-eliciting, abstention-rewarding prompts, especially with norms, reduce the false-answer rate on answered cases mainly by identifying and shifting error-prone cases to abstention and re-calibrating their confidence. This trades coverage for reliability while leaving forced-answer performance largely unchanged. Varying the abstention reward yields a clear abstention-hallucination frontier. Overall, results show the framework can improve selective answering on factual questions without retraining, with the magnitude of effect varying across models and datasets. Code is available at https://github.com/binzeli/hallucinationControl.
Executive Summary
This article proposes a novel framework, I-CALM, to mitigate hallucinations in large language models (LLMs) by incentivizing confidence-aware abstention. The framework consists of three components: confidence-eliciting prompts, abstention-rewarding prompts, and normative principles emphasizing truthfulness, humility, and responsibility. The authors demonstrate that I-CALM can reduce the false-answer rate on answered factual questions while leaving the model's forced-answer performance largely unchanged. They further show that varying the abstention reward traces an abstention-hallucination frontier, making explicit the trade-off between coverage and reliability.
Key Points
- ▸ I-CALM is a prompt-based framework that elicits verbal confidence and rewards abstention in LLMs.
- ▸ The framework consists of confidence-eliciting prompts, abstention-rewarding prompts, and lightweight normative principles.
- ▸ I-CALM can improve selective answering on factual questions without retraining the model.
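The mechanism behind these points can be made concrete with a small sketch. Under an announced scoring rule where a correct answer earns 1, a wrong answer earns 0, and abstaining earns a partial reward r, a reward-maximizing model should answer only when its confidence p satisfies p ≥ r. The prompt wording and function names below are illustrative assumptions, not the paper's exact prompts or code:

```python
# Illustrative sketch of a confidence-aware abstention decision under an
# announced reward scheme (not the paper's exact prompt or implementation).
# Scoring: +1 if the answer is correct, 0 if wrong, +r for abstaining.
# Expected score of answering equals the confidence p, so the rational
# policy is: answer iff p >= r.

ABSTAIN_REWARD = 0.7  # r: partial credit announced for abstaining

# Hypothetical prompt template announcing the reward scheme up front.
PROMPT_TEMPLATE = (
    "Answer only if you are confident.\n"
    "Scoring: +1 for a correct answer, 0 for a wrong answer, "
    "+{r} for responding 'I don't know'.\n"
    "First state your confidence in [0, 1], then answer or abstain.\n"
    "Question: {question}"
)

def should_answer(confidence: float, abstain_reward: float = ABSTAIN_REWARD) -> bool:
    """Answer when the expected score of answering (the confidence)
    is at least the guaranteed abstention reward."""
    return confidence >= abstain_reward

print(should_answer(0.9))  # True  -> answer
print(should_answer(0.4))  # False -> abstain
```

Raising r makes the policy more conservative (more abstention, fewer wrong answers); lowering it trades reliability back for coverage.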
Merits
Strength in Addressing Hallucinations
I-CALM directly addresses hallucination in LLMs by incentivizing abstention, a meaningful improvement over current scoring conventions under which models are rewarded for producing confident but incorrect answers rather than expressing uncertainty.
Flexibility in Design
The framework is flexible and can be adapted to various models and datasets, making it a potential solution for the broader AI community.
Demerits
Limited Generalizability
The study's results might not be generalizable to other models or datasets, as the authors conducted experiments on a single model, GPT-5 mini, and a specific dataset, PopQA.
Abstention-Reward Trade-off
The authors found a trade-off between coverage and reliability: raising the abstention reward lowers the false-answer rate but shrinks the fraction of questions answered, which may limit I-CALM's applicability in settings where high coverage is required.
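The trade-off can be illustrated with synthetic data (these numbers are invented for exposition, not the paper's results). Sweeping the abstention reward r and applying the answer-iff-confidence ≥ r rule traces a frontier: higher r pushes low-confidence, error-prone cases into abstention, lowering the false-answer rate on the cases that are still answered, at the cost of coverage:

```python
# Toy frontier sweep over a synthetic pool of (confidence, correct) cases.
# Illustrative only: the data and threshold rule are assumptions, not the
# paper's evaluation protocol.

cases = [  # (model confidence, whether the forced answer would be correct)
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.50, False), (0.40, False), (0.20, False),
]

def frontier_point(abstain_reward):
    """Coverage and false-answer rate when answering iff p >= r."""
    answered = [(p, ok) for p, ok in cases if p >= abstain_reward]
    coverage = len(answered) / len(cases)
    false_rate = (sum(not ok for _, ok in answered) / len(answered)
                  if answered else 0.0)
    return coverage, false_rate

for r in (0.3, 0.5, 0.7, 0.9):
    cov, err = frontier_point(r)
    print(f"r={r:.1f}: coverage={cov:.2f}, false-answer rate={err:.2f}")
```

On this toy pool, moving r from 0.3 to 0.9 drops the false-answer rate from about 0.43 to 0.0 while coverage falls from 0.88 to 0.25, mirroring the abstention-hallucination frontier reported in the abstract.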
Expert Commentary
While I-CALM is a promising solution for hallucination mitigation, its limitations, such as the trade-off between coverage and reliability, need to be carefully considered. Furthermore, the study's focus on a single model and dataset might limit its generalizability. Nevertheless, the framework's flexibility and adaptability make it a valuable contribution to the field of AI research. As AI continues to play an increasingly important role in our lives, it is essential to develop more robust and reliable AI systems, and I-CALM is a step in the right direction.
Recommendations
- ✓ Recommendation 1: Future studies should investigate the applicability of I-CALM to other models and datasets to improve its generalizability.
- ✓ Recommendation 2: The AI community should prioritize the development of more robust and reliable AI systems that can adapt to various scenarios and applications.
Sources
Original: arXiv - cs.CL