Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
arXiv:2603.03323v1 Announce Type: cross Abstract: Large language models (LLMs) aligned for safety often suffer from over-refusal, the tendency to reject seemingly toxic or benign prompts …
Yuxiao Lu, Lin Xu, Yang Sun, Wenjun Li, Jie Shi
4 views