UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
arXiv:2602.22296v1
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large language models (LLMs) on mathematics and programming …
Devan Shah, Owen Yang, Daniel Yang, Chongyi Zheng, Benjamin Eysenbach