BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
arXiv:2603.04124v1 Announce Type: new Abstract: Can reinforcement learning with hard, verifiable rewards teach a compact language model to reason about physics, or does it primarily …
Tarjei Paule Hage, Markus J. Buehler
18 views