MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment
arXiv:2603.08987v1. Abstract: Recent advances in medical large language models have explored Test-Time Reinforcement Learning (TTRL) to enhance reasoning. However, standard TTRL often …
Kailong Fan, Anqi Pu, Yichen Wu, Wanhua Li, Yicong Li, Hanspeter Pfister, Huafeng Liu, Xiang Li, Quanzheng Li, Ning Guo