TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

Shu-Xun Yang, Cunxiang Wang, Haoke Zhang, Wenbo Yu, Lindong Wu, Jiayi Gui, Dayong Yang, Yukuo Cen, Zhuoer Feng, Bosi Wen, Yidong Wang, Lucen Zhong, Jiamin Ren, Linfeng Zhang, Jie Tang · March 7, 2026

#cs.AI #cs.CL

arXiv:2603.00623v1

Abstract: Agentic systems augment large language models with external tools and iterative decision making, enabling complex tasks such as deep research, function calling, and coding. However, their long and intricate execution traces make failure diagnosis and root cause analysis extremely challenging. Manual inspection does not scale, while directly applying LLMs to raw traces is hindered by input length limits and unreliable reasoning. Focusing solely on final task outcomes further discards critical behavioral information required for accurate issue localization. To address these issues, we propose TraceSIR, a multi-agent framework for structured analysis and reporting of agentic execution traces. TraceSIR coordinates three specialized agents: (1) StructureAgent, which introduces a novel abstraction format, TraceFormat, to compress execution traces while preserving essential behavioral information; (2) InsightAgent, which performs fine-grained diagnosis including issue localization, root cause analysis, and optimization suggestions; (3) ReportAgent, which aggregates insights across task instances and generates comprehensive analysis reports. To evaluate TraceSIR, we construct TraceBench, covering three real-world agentic scenarios, and introduce ReportEval, an evaluation protocol for assessing the quality and usability of analysis reports aligned with industry needs. Experiments show that TraceSIR consistently produces coherent, informative, and actionable reports, significantly outperforming existing approaches across all evaluation dimensions. Our project and video are publicly available at https://github.com/SHU-XUN/TraceSIR.

Executive Summary

This article introduces TraceSIR, a multi-agent framework designed to analyze and report on the complex execution traces of agentic systems. The framework consists of three specialized agents, StructureAgent, InsightAgent, and ReportAgent, which respectively compress execution traces, perform fine-grained diagnosis, and generate comprehensive analysis reports. The authors evaluate TraceSIR on the TraceBench dataset with the ReportEval protocol and report that it produces coherent, informative, and actionable reports, outperforming existing approaches across all evaluation dimensions. While the article makes a valuable contribution to failure diagnosis and root cause analysis for agentic systems, TraceBench covers three agentic scenarios, so applicability beyond them remains to be shown.

Key Points

▸ TraceSIR is a multi-agent framework for analyzing and reporting agentic execution traces.
▸ The framework consists of three specialized agents: StructureAgent, InsightAgent, and ReportAgent.
▸ The authors report that TraceSIR significantly outperforms existing approaches across all ReportEval evaluation dimensions.
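
The three-stage flow described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the `Step` and `Insight` records, the function names, and the sample traces are hypothetical, the compact record is not the paper's TraceFormat, and simple rules stand in for the LLM-driven agents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One raw trace event (hypothetical schema, not the paper's TraceFormat)."""
    tool: str
    output: str
    ok: bool


@dataclass
class Insight:
    """Per-task diagnosis produced by the middle stage (hypothetical)."""
    task_id: str
    failed_step: Optional[int]  # index of the first failing step, None if none
    tool: Optional[str]         # tool used at the failing step
    root_cause: str


def structure(steps, max_chars=40):
    """Stand-in for StructureAgent: shrink each step to a compact record."""
    return [
        {"i": i, "tool": s.tool, "ok": s.ok, "summary": s.output[:max_chars]}
        for i, s in enumerate(steps)
    ]


def diagnose(task_id, compact):
    """Stand-in for InsightAgent: locate the first failing step by rule."""
    for rec in compact:
        if not rec["ok"]:
            return Insight(task_id, rec["i"], rec["tool"], rec["summary"])
    return Insight(task_id, None, None, "no failure observed")


def report(insights):
    """Stand-in for ReportAgent: aggregate insights across task instances."""
    failed = [x for x in insights if x.failed_step is not None]
    by_tool = Counter(x.tool for x in failed)
    lines = [f"{len(failed)}/{len(insights)} tasks failed"]
    lines += [f"- {tool}: {n}" for tool, n in by_tool.most_common()]
    return "\n".join(lines)


# Made-up traces, for illustration only.
traces = {
    "t1": [Step("search", "3 results", True),
           Step("fetch", "HTTP 500 from upstream", False)],
    "t2": [Step("search", "5 results", True)],
    "t3": [Step("fetch", "timeout after 30s", False)],
}
insights = [diagnose(tid, structure(steps)) for tid, steps in traces.items()]
summary = report(insights)
print(summary)
```

The point of the sketch is the data flow: diagnosis runs on the compressed records, not the raw trace, and the report works on per-task insights, not on traces.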

Merits

Novel Solution

TraceSIR addresses significant challenges in failure diagnosis and root cause analysis for agentic systems, providing a novel solution that outperforms existing approaches.

Comprehensive Evaluation

The authors evaluate TraceSIR using the TraceBench dataset and ReportEval protocol, demonstrating its effectiveness in producing coherent, informative, and actionable reports.

Demerits

Limited Broader Impact

The evaluation covers three agentic scenarios, so how well the framework transfers to other domains and applications is not established.

Dependence on Dataset

The effectiveness of TraceSIR may be dependent on the quality and representativeness of the TraceBench dataset.

Expert Commentary

This article makes a substantial contribution to the field of agentic systems, addressing significant challenges in failure diagnosis and root cause analysis. The development of TraceSIR provides a novel solution that outperforms existing approaches, demonstrating the potential for improved decision-making and analysis. However, the article's focus on a specific application may limit its broader impact, and the effectiveness of TraceSIR may depend on the quality and representativeness of the TraceBench dataset. Nevertheless, this research has significant practical and policy implications, highlighting the need for more robust failure diagnosis and root cause analysis mechanisms in industries that rely on agentic systems.

Recommendations

✓ Future research should focus on expanding the applicability of TraceSIR to other domains and applications.
✓ The development of more robust and representative datasets, such as the TraceBench dataset, is crucial for evaluating the effectiveness of agentic systems and their analysis tools.

Sources

arXiv - cs.AI

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces
