GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning
arXiv:2602.21492v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a central post-training paradigm for large language models (LLMs), but its performance is highly sensitive …