When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
arXiv:2603.05659v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) and Rubrics as Rewards (RaR) have driven strong gains in domains with clear correctness signals, and even in subjective domains, by synthesizing evaluation criteria from ideal reference answers. But many real-world tasks admit multiple valid outputs and lack the single ideal answer that rubric generation depends on. We identify this reference-free setting as a gap in current post-training methods and propose Implicit Error Counting (IEC) to fill it. Instead of checking what a response gets right against a rubric, IEC enumerates what it gets wrong, applying severity-weighted scores across task-relevant axes and converting them into calibrated per-aspect rewards. We show that naïve explicit enumeration is too noisy for stable optimization, and that two design choices, implicit score emission and group calibration, are necessary to make error counting a reliable reward. As a case study, we validate IEC on virtual try-on (VTO), a domain that is simultaneously too constrained for holistic scoring and too permissive for rubric-based evaluation: subtle garment errors are unacceptable, yet many output variations are correct. We introduce Cascaded Error Counting (CEC) as an evaluation metric, which tracks human preferences well (60% top-1 vs. 30% for others), and curate Mismatch-DressCode (MDressBench), a benchmark with maximal attribute mismatch to stress-test reward designs. On MDressBench, IEC outperforms RaR across all metrics (CEC, lower is better: 5.31 vs. 5.60 on flat references; 5.20 vs. 5.53 on non-flat). On VITON-HD and DressCode, IEC matches or surpasses six baselines on 6 of 8 perceptual metrics. These results suggest that when ideal answers are unavailable, counting errors provides a stronger signal than constructing rubrics.
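The core recipe the abstract describes, severity-weighted error counting turned into group-calibrated rewards, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the severity weights, error axes, and the exact calibration used by IEC are assumptions here, and `group_calibrated_rewards` is a hypothetical helper.

```python
from statistics import mean, pstdev

# Hypothetical severity weights per error class; the paper's actual
# weights and task-relevant axes are not specified in the abstract.
SEVERITY = {"minor": 1.0, "moderate": 2.0, "severe": 4.0}

def raw_penalty(errors):
    """Sum severity-weighted error counts for one response on one axis."""
    return sum(SEVERITY[sev] for sev in errors)

def group_calibrated_rewards(error_lists):
    """Turn per-response error penalties into rewards calibrated within
    the sampled group (higher reward = fewer / less severe errors)."""
    penalties = [raw_penalty(e) for e in error_lists]
    mu = mean(penalties)
    sigma = pstdev(penalties) or 1.0  # guard against all-tied groups
    # Negate and standardize: a response is rewarded relative to its
    # group, which damps the noise of absolute error counts.
    return [(mu - p) / sigma for p in penalties]
```

For a group of three responses with errors `[["minor"], ["severe", "minor"], []]`, the error-free response gets the highest reward and the severely flawed one the lowest, with the group mean reward at zero by construction.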
Executive Summary
The article proposes Implicit Error Counting (IEC) as a post-training method for reference-free reinforcement learning, where ideal answers are unavailable. IEC enumerates errors in responses and converts them into calibrated rewards, outperforming traditional rubric-based methods in virtual try-on tasks. The approach is validated through a case study and benchmarking, demonstrating its effectiveness in domains with multiple valid outputs and subtle errors.
Key Points
- ▸ IEC is a novel approach to reference-free reinforcement learning
- ▸ Error enumeration is used to generate rewards instead of traditional rubric-based methods
- ▸ IEC outperforms traditional methods in virtual try-on tasks with multiple valid outputs
Merits
Effective in reference-free settings: IEC can handle tasks with multiple valid outputs and no single ideal answer.
Improved performance: IEC outperforms traditional rubric-based methods on virtual try-on tasks.
Demerits
Limited to specific domains: IEC may not be applicable everywhere, particularly in domains with clear correctness signals where verifiable rewards already suffice.
Requires careful calibration: IEC depends on careful calibration of error counting and reward generation.
Expert Commentary
The article presents a significant contribution to reinforcement learning post-training, particularly in reference-free settings. The proposed IEC approach demonstrates improved performance on virtual try-on tasks, highlighting its potential for domains with multiple valid outputs. However, further research is needed to establish how well IEC transfers beyond this case study. The article's thorough evaluation and benchmarking provide a solid foundation for future work in this area.
Recommendations
- ✓ Further research is needed to explore the applicability of IEC to other domains
- ✓ Careful calibration of error counting and reward generation is crucial for effective implementation of IEC