DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models
arXiv:2604.02733v1 Announce Type: new Abstract: Reasoning benchmarks typically evaluate whether a model derives the correct answer from a fixed premise set, but they under-measure a …
Amit Dhanda
3 views