Academic · 1 min

TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

arXiv:2604.05364v1. Abstract: We introduce TFRBench, the first benchmark designed to evaluate the reasoning capabilities of forecasting systems. Traditionally, time-series forecasting has been …

Md Atik Ahamed, Mihir Parmar, Palash Goyal, Yiwen Song, Long T. Le, Qiang Cheng, Chun-Liang Li, Hamid Palangi, Jinsung Yoon, Tomas Pfister
Academic · 1 min

ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

arXiv:2604.05355v1. Abstract: Chain-of-thought (CoT) reasoning improves large language model performance on complex tasks, but often produces excessively long and inefficient reasoning traces. …

Xuan Xiong, Huan Liu, Li Gu, Zhixiang Chi, Yue Qiu, Yuanhao Yu, Yang Wang
Academic · 1 min

TRACE: Capability-Targeted Agentic Training

arXiv:2604.05336v1. Abstract: Large Language Models (LLMs) deployed in agentic environments must exercise multiple capabilities across different task instances, where a capability is …

Hangoo Kang, Tarun Suresh, Jon Saad-Falcon, Azalia Mirhoseini