Reinforcement Learning-based Knowledge Distillation with LLM-as-a-Judge
arXiv:2604.02621v1 Announce Type: new Abstract: Reinforcement Learning (RL) has been shown to substantially improve the reasoning capability of small and large language models (LLMs), but …
Yiyang Shen, Lifu Tu, Weiran Wang
6 views