DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
arXiv:2602.24288v1 Announce Type: new Abstract: The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need …
Fan Shu, Yite Wang, Ruofan Wu, Boyi Liu, Zhewei Yao, Yuxiong He, Feng Yan
5 views