Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
arXiv:2602.17686v1 (cross-list)
Abstract: Distilling Chain-of-Thought (CoT) reasoning from large language models into compact student models presents a fundamental challenge: teacher rationales are often …
Bowen Yu, Maolin Wang, Sheng Zhang, Binhao Wang, Yi Wen, Jingtong Gao, Bowen Liu, Zimo Zhao, Wanyu Wang, Xiangyu Zhao