Academic

Academic · 1 min

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

arXiv:2604.02967v1 Announce Type: new Abstract: Recent Large Reasoning Models (LRMs) like DeepSeek-R1 have demonstrated remarkable success in complex reasoning tasks, exhibiting human-like patterns in exploring …

Kehan Jiang, Haonan Dong, Zhaolu Kang, Zhengzhou Zhu, Guojie Song

7 views Apr 6

Academic · 1 min

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

arXiv:2604.02947v1 Announce Type: new Abstract: Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, …

Yunhao Feng, Yifan Ding, Yingshui Tan, Xingjun Ma, Yige Li, Yutao Wu, Yifeng Gao, Kun Zhai, Yanming Guo

7 views Apr 6

Academic · 1 min

Analysis of Optimality of Large Language Models on Planning Problems

arXiv:2604.02910v1 Announce Type: new Abstract: Classic AI planning problems have been revisited in the Large Language Model (LLM) era, with a focus of recent benchmarks …

Bernd Bohnet, Michael C. Mozer, Kevin Swersky, Wil Cunningham, Aaron Parisi, Kathleen Kenealy, Noah Fiedel

6 views Apr 6

Academic · 1 min

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration

arXiv:2604.02869v1 Announce Type: new Abstract: Training tool-calling agents with reinforcement learning on multi-turn tasks remains challenging due to sparse outcome rewards and difficult credit assignment …

Wachiravit Modecrua, Krittanon Kaewtawee, Krittin Pachtrachai, Touchapon Kraisingkorn

5 views Apr 6

Academic · 1 min

EMS: Multi-Agent Voting via Efficient Majority-then-Stopping

arXiv:2604.02863v1 Announce Type: new Abstract: Majority voting is the standard for aggregating multi-agent responses into a final decision. However, traditional methods typically require all agents …

Yiqing Liu, Hantao Yao, Wu Liu, Yongdong Zhang

4 views Apr 6

Academic · 1 min

ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents

arXiv:2604.02834v1 Announce Type: new Abstract: Longitudinal health agents must reason across multi-source trajectories that combine continuous device streams, sparse clinical exams, and episodic life events …

Chao Li, Cailiang Liu, Ang Gao, Kexin Deng, Shu Zhang, Langping Xu, Xiaotong Shi, Xionghao Ding, Jian Pei, Xun Jiang

5 views Apr 6

Academic · 1 min

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

arXiv:2604.02794v1 Announce Type: new Abstract: Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large …

Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu

4 views Apr 6

Academic · 1 min

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

arXiv:2604.02770v1 Announce Type: new Abstract: In large language model (LLM)-driven multi-agent systems, disobey role specification (failure to adhere to the defined responsibilities and constraints of …

Guoling Zhou, Wenpei Han, Fengqin Yang, Li Wang, Yingcong Zhou, Zhiguo Fu

5 views Apr 6

Academic · 1 min

Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

arXiv:2604.02734v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated strong potential in long-horizon decision-making tasks, such as embodied manipulation and web interaction. However, …

Bin Wen, Ruoxuan Zhang, Yang Chen, Hongxia Xie, Lan-Zhe Guo

7 views Apr 6

Academic · 1 min

DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

arXiv:2604.02733v1 Announce Type: new Abstract: Reasoning benchmarks typically evaluate whether a model derives the correct answer from a fixed premise set, but they under-measure a …

Amit Dhanda

6 views Apr 6

Academic · 1 min

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

arXiv:2604.02721v1 Announce Type: new Abstract: Competitive programming remains one of the last few human strongholds in coding against AI. The best AI system to date …

DeepReinforce Team, Xiaoya Li, Xiaofei Sun, Guoyin Wang, Songqiao Su, Chris Shum, Jiwei Li

6 views Apr 6

Academic · 1 min

Let's Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization

arXiv:2604.02666v1 Announce Type: new Abstract: Optimization is as much about modeling the right problem as solving it. Identifying the right objectives, constraints, and trade-offs demands …

Joshua Drossman, Alexandre Jacquillat, Sébastien Martin

20 views Apr 6

