EduResearchBench: A Hierarchical Atomic Task Decomposition Benchmark for Full-Lifecycle Educational Research
arXiv:2602.15034v1 (announce type: cross)
Abstract: While Large Language Models (LLMs) are reshaping the paradigm of AI for Social Science (AI4SS), rigorously evaluating their capabilities in …
Houping Yue, Zixiang Di, Mei Jiang, Bingdong Li, Hao Hao, Yu Song, Bo Jiang, Aimin Zhou