RuCL: Stratified Rubric-Based Curriculum Learning for Multimodal Large Language Model Reasoning

arXiv:2602.21628v1 Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a prevailing paradigm for enhancing reasoning in Multimodal Large Language Models (MLLMs). However, relying solely on outcome supervision risks reward hacking, where models learn spurious reasoning patterns to satisfy final answer checks. While recent rubric-based approaches offer fine-grained supervision signals, they suffer from high computational costs of instance-level generation and inefficient training dynamics caused by treating all rubrics as equally learnable. In this paper, we propose Stratified Rubric-based Curriculum Learning (RuCL), a novel framework that reformulates curriculum learning by shifting the focus from data selection to reward design. RuCL generates generalized rubrics for broad applicability and stratifies them based on the model's competence. By dynamically adjusting rubric weights during training, RuCL guides the model from mastering foundational perception to tackling advanced logical reasoning. Extensive experiments on various visual reasoning benchmarks show that RuCL yields a remarkable +7.83% average improvement over the Qwen2.5-VL-7B model, achieving a state-of-the-art accuracy of 60.06%.

Executive Summary

This article introduces Stratified Rubric-Based Curriculum Learning (RuCL), a framework that enhances reasoning in Multimodal Large Language Models (MLLMs) by reformulating curriculum learning around reward design rather than data selection. RuCL generates generalized rubrics for broad applicability, stratifies them according to the model's competence, and dynamically adjusts rubric weights during training. Experiments on visual reasoning benchmarks show a +7.83% average improvement over the Qwen2.5-VL-7B model, reaching a state-of-the-art accuracy of 60.06%. These results suggest RuCL can mitigate the reward hacking that arises when MLLMs are trained with outcome supervision alone.

Key Points

  • RuCL reformulates curriculum learning through reward design to enhance reasoning in MLLMs
  • RuCL generates generalized rubrics for broad applicability and stratifies them based on the model's competence
  • RuCL yields a +7.83% average improvement over the Qwen2.5-VL-7B model on visual reasoning benchmarks, with a state-of-the-art accuracy of 60.06%
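The core mechanism, as described in the abstract, is a curriculum expressed through reward weights: rubrics are grouped into strata (foundational perception vs. advanced reasoning), and the weight on each stratum shifts as training progresses. The sketch below is an illustrative reconstruction of that idea; the function names, strata labels, and linear schedule are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of RuCL-style dynamic rubric weighting.
# Strata names and the linear schedule are illustrative assumptions.

def rubric_weights(progress: float) -> dict[str, float]:
    """Shift reward weight from foundational to advanced rubric strata
    as training progresses (progress in [0, 1])."""
    assert 0.0 <= progress <= 1.0
    w_foundational = 1.0 - progress   # e.g. visual perception rubrics
    w_advanced = progress             # e.g. logical reasoning rubrics
    total = w_foundational + w_advanced
    return {"perception": w_foundational / total,
            "reasoning": w_advanced / total}

def stratified_reward(scores: dict[str, float], progress: float) -> float:
    """Weighted sum of per-stratum rubric scores under the current schedule."""
    weights = rubric_weights(progress)
    return sum(weights[s] * scores[s] for s in weights)
```

Early in training (`progress` near 0) the reward is dominated by perception rubrics, so the model is pushed to master grounding first; later the reasoning stratum dominates. The actual weight schedule in the paper is competence-based rather than a fixed linear ramp.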

Merits

Strength in Addressing Reward Hacking

RuCL tackles the issue of reward hacking by generating generalized rubrics and stratifying them based on the model's competence, guiding the model from foundational perception to advanced logical reasoning.
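Reward hacking arises when the only training signal is whether the final answer matches; blending that outcome check with fine-grained rubric scores removes the shortcut. The snippet below is a minimal sketch of such a blended reward, assuming a simple averaging of rubric grades and a mixing coefficient `alpha` that are not specified in the article.

```python
# Illustrative sketch (not the paper's code): blend a verifiable outcome
# reward with fine-grained rubric rewards, so a response cannot score
# highly by matching the final answer through spurious reasoning alone.

def combined_reward(answer_correct: bool,
                    rubric_scores: list[float],
                    alpha: float = 0.5) -> float:
    """Mix outcome and process supervision.

    alpha is an assumed mixing coefficient; rubric_scores are per-rubric
    grades in [0, 1] assessing the quality of the reasoning trace."""
    outcome = 1.0 if answer_correct else 0.0
    process = (sum(rubric_scores) / len(rubric_scores)
               if rubric_scores else 0.0)
    return alpha * outcome + (1.0 - alpha) * process
```

Under this scheme a correct answer reached via a poorly graded reasoning trace earns only a partial reward, which is the failure mode outcome-only RLVR cannot distinguish.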

Improved Accuracy on Visual Reasoning Benchmarks

RuCL achieves a state-of-the-art accuracy of 60.06% on visual reasoning benchmarks, demonstrating its potential to overcome the limitations of outcome supervision and reward hacking in MLLMs.

Demerits

Residual Computational Costs of Rubric Generation

Although RuCL's generalized rubrics are designed to avoid the instance-level generation costs that burden prior rubric-based approaches, generating, stratifying, and dynamically reweighting rubrics still adds overhead, which may hinder practical application at the scale of large MLLMs.

Expert Commentary

RuCL represents a meaningful advance in MLLM training, addressing two known weaknesses of outcome-only supervision: reward hacking and the treatment of all rubrics as equally learnable. The stratified, competence-based curriculum lets the model progress from foundational perception to advanced logical reasoning, and the reported state-of-the-art accuracy on visual reasoning benchmarks supports the approach. The shift from data selection to reward design may also generalize to other RLVR settings. That said, the computational overhead of rubric generation and stratification remains a concern and warrants further investigation before RuCL can be considered practical at scale.

Recommendations

  • Further research is needed to quantify the computational overhead of rubric generation and stratification, and to develop strategies that mitigate it.
  • The development of more advanced and adaptable MLLMs, enabled by RuCL, should be prioritized to ensure their practical application in various domains.