Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity
arXiv:2602.15894v1. Abstract: Recent research indicates that while alignment methods significantly improve the quality of large language model (LLM) outputs, they simultaneously reduce the …
Haihui Pan, Yuzhong Hong, Shaoke Lv, Junwei Bao, Hongfei Jiang, Yang Song