
Kejian Shi, Yixin Liu, Peifeng Wang, Alexander R. Fabbri, Shafiq Joty, Arman Cohan

Articles by Kejian Shi, Yixin Liu, Peifeng Wang, Alexander R. Fabbri, Shafiq Joty, Arman Cohan

Academic · 1 min

References Improve LLM Alignment in Non-Verifiable Domains

arXiv:2602.16802v1

Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has shown strong effectiveness in reasoning tasks, it cannot be directly applied to …

4 views Feb 22

