Certainty robustness: Evaluating LLM stability under self-challenging prompts
arXiv:2603.03330v1
Abstract: Large language models (LLMs) often present answers with high apparent confidence despite lacking an explicit mechanism for reasoning about certainty …
Mohammadreza Saadat, Steve Nemzer