Preventing Curriculum Collapse in Self-Evolving Reasoning Systems

Vaibhav Mishra

arXiv:2603.13309v1

Abstract: Self-evolving reasoning frameworks let LLMs improve their reasoning capabilities by iteratively generating and solving problems without external supervision, using verifiable rewards. Ideally, such systems are expected to explore a diverse problem space and propose new challenges of high learning value. While prior work has largely focused on solver-side optimisation and verification, recent evidence suggests that self-evolving systems can exhibit diversity collapse in posing new problems after just a few iterations, even when surface-level variation is preserved. We introduce Prism, a question-centric self-evolution method that directly tackles this collapse. Prism defines a persistent diversity signal over an embedding-induced semantic partition of mathematical problems and uses it to encourage balanced exploration of underrepresented regions across iterations. This coverage signal is combined with a Zone-of-Proximal-Development (ZPD) gate to preserve edge-of-solvability difficulty. Evaluated on seven widely used mathematical reasoning benchmarks against five self-evolving baselines, Prism achieves the highest accuracy on six out of seven tasks, achieving gains of +3.98 absolute points over R-Zero on AMC and +3.68 on Minerva Math. Prism also generates semantically diverse and challenging questions across iterations, resulting in the construction of the Prism-Math dataset comprising 100k mathematical questions. These results demonstrate that cross-iteration semantic coverage is a high-leverage and under-explored axis for building more capable self-evolving reasoners. We release the code, dataset, and models to facilitate further research.

Executive Summary

This article presents Prism, a self-evolving reasoning framework designed to prevent curriculum collapse: the loss of diversity in the problems an LLM poses for itself over successive iterations. Prism takes a question-centric approach. A persistent diversity signal, defined over an embedding-induced semantic partition of mathematical problems, steers generation toward underrepresented regions, while a Zone-of-Proximal-Development (ZPD) gate keeps questions at edge-of-solvability difficulty. Evaluated on seven mathematical reasoning benchmarks against five self-evolving baselines, Prism achieves the highest accuracy on six of the seven, with reported gains of +3.98 absolute points over R-Zero on AMC and +3.68 on Minerva Math. The questions generated across iterations form the Prism-Math dataset of 100k mathematical questions. The authors conclude that cross-iteration semantic coverage is a high-leverage, under-explored axis for building capable self-evolving reasoners, and they release the code, dataset, and models.

Key Points

  • Prism introduces a question-centric self-evolution method to prevent curriculum collapse in LLMs.
  • The framework uses a persistent diversity signal to encourage balanced exploration and a Zone-of-Proximal-Development gate to keep questions at edge-of-solvability difficulty.
  • Prism outperforms five self-evolving baselines on six out of seven mathematical reasoning benchmarks.
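
The persistent diversity signal can be illustrated with a small sketch. The source does not give Prism's actual formula, so everything here is an assumption: the fixed centroids standing in for the semantic partition, the `CoverageTracker` class, and the inverse-square-root bonus are hypothetical choices made purely for illustration.

```python
import math
from collections import Counter


def nearest_cell(embedding, centroids):
    """Return the index of the centroid closest to the embedding (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((e - c) ** 2 for e, c in zip(embedding, centroids[i])),
    )


class CoverageTracker:
    """Keeps per-cell question counts that persist across iterations (hypothetical helper)."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.counts = Counter()

    def bonus(self, embedding):
        """Assumed bonus shape: larger for cells that have been visited less often."""
        cell = nearest_cell(embedding, self.centroids)
        return 1.0 / math.sqrt(1.0 + self.counts[cell])

    def update(self, embedding):
        """Record an accepted question so later iterations see the updated coverage."""
        self.counts[nearest_cell(embedding, self.centroids)] += 1


# Toy 2-D "embeddings" with two semantic cells.
tracker = CoverageTracker(centroids=[(0.0, 0.0), (10.0, 10.0)])
for _ in range(3):
    tracker.update((0.5, -0.2))  # cell 0 is visited three times

bonus_seen = tracker.bonus((0.1, 0.1))   # falls in the crowded cell
bonus_new = tracker.bonus((9.0, 11.0))   # falls in the unvisited cell
```

In this toy run the crowded cell receives a bonus of 0.5 and the unvisited cell a bonus of 1.0, so a generator rewarded this way would be nudged toward the less-covered region.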

Merits

Strength in Addressing Curriculum Collapse

Prism targets curriculum collapse directly, a failure mode that prior work, focused largely on solver-side optimisation and verification, has left mostly unaddressed. Because its diversity signal persists across iterations and operates on a semantic partition rather than on surface form, it is positioned to counter collapse that surface-level variation would otherwise mask.
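
One way to make collapse visible, assuming each generated question has already been assigned to a cell of a semantic partition, is to track the normalized entropy of cell occupancy per iteration. This is an illustrative metric, not one the source attributes to Prism; the function name and the toy cell ids are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(cell_ids, num_cells):
    """Shannon entropy of cell occupancy divided by log(num_cells), giving a value in [0, 1]."""
    if num_cells < 2 or not cell_ids:
        return 0.0
    counts = Counter(cell_ids)
    total = len(cell_ids)
    # p * log(1/p) summed over occupied cells; empty cells contribute nothing.
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    return entropy / math.log(num_cells)


# Questions spread evenly over four cells versus all questions in a single cell.
balanced = normalized_entropy([0, 1, 2, 3], num_cells=4)
collapsed = normalized_entropy([2, 2, 2, 2], num_cells=4)
```

A value near 1 indicates even coverage of the partition, while a value near 0 indicates that generation has concentrated in a few cells, even if the question wording still varies.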

Improved Performance on Mathematical Reasoning Benchmarks

Prism achieves the highest accuracy on six of seven mathematical reasoning benchmarks against five self-evolving baselines, with reported gains of +3.98 absolute points over R-Zero on AMC and +3.68 on Minerva Math. These results support the claim that question-side diversity contributes to building more capable self-evolving reasoners.

Demerits

Limited Evaluation on Diverse Problem Spaces

The reported evaluation covers only mathematical reasoning benchmarks, and the diversity signal itself is defined over a partition of mathematical problems. Further research is needed to determine how well the framework adapts to other domains and problem spaces.

Potential Overreliance on Zone-of-Proximal-Development Gate

The Zone-of-Proximal-Development gate may bias problem selection: by filtering for edge-of-solvability difficulty, it could exclude problems that are novel but currently too easy or too hard for the solver, limiting exploration of new regions of the problem space. Further investigation is necessary to assess the gate's impact on Prism's performance.
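
The concern can be made concrete with a minimal sketch of a difficulty gate. The source does not specify how Prism's ZPD gate is computed, so this is an assumption: the gate is modeled as a band on the solver's empirical solve rate, and the function names and the 0.2 and 0.8 thresholds are hypothetical.

```python
def solve_rate(outcomes):
    """Fraction of solver attempts that were verified correct."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def zpd_gate(outcomes, low=0.2, high=0.8):
    """Assumed gate: keep a question only if its solve rate lies within [low, high]."""
    return low <= solve_rate(outcomes) <= high


# Each candidate question maps to the verified outcomes of four solver attempts.
candidates = {
    "too_easy": [True, True, True, True],
    "edge": [True, False, True, False],
    "too_hard": [False, False, False, False],
}
kept = [name for name, outcomes in candidates.items() if zpd_gate(outcomes)]
```

Under these assumptions only the `edge` question passes. A question from an unexplored region that the solver never answers correctly would be discarded in the same way as `too_hard`, which is the kind of selection bias described above.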

Expert Commentary

Prism addresses a limitation that work on self-evolving reasoning systems has largely overlooked: the collapse of diversity on the question-generation side. Its benchmark results are strong, and its ability to preserve diversity across iterations is a notable strength. However, the evaluation is confined to mathematical reasoning, and the influence of the Zone-of-Proximal-Development gate on problem selection requires further investigation. If the approach generalizes, it may find applications in education, assessment, and AI system design.

Recommendations

  • Researchers should investigate the adaptability of Prism to diverse problem spaces and domains to assess its generalizability.
  • Future work should explore the intersection of Prism's approach with adversarial training techniques to improve the robustness of reasoning systems.

Sources