QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models
arXiv:2603.13691v1. Abstract: While Large Language Models (LLMs) excel on standardized medical exams, high scores often fail to translate to high-quality responses for …
Yao Wu, Kangping Yin, Liang Dong, Zhenxin Ma, Shuting Xu, Xuehai Wang, Yuxuan Jiang, Tingting Yu, Yunqing Hong, Jiayi Liu, Rianzhe Huang, Shuxin Zhao, Haiping Hu, Wen Shang, Jian Xu, Guanjun Jiang