From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems
arXiv:2602.23701v1 Announce Type: new Abstract: LLM-powered Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex domains but suffer from inherent fragility and opaque failure mechanisms. Existing failure attribution methods, whether relying on direct prompting, costly replays, or supervised fine-tuning, typically treat execution logs as flat sequences. This linear perspective fails to disentangle the intricate causal links inherent to MAS, leading to weak observability and ambiguous responsibility boundaries. To address these challenges, we propose CHIEF, a novel framework that transforms chaotic trajectories into a structured hierarchical causal graph. It then employs hierarchical oracle-guided backtracking to efficiently prune the search space via synthesized virtual oracles. Finally, it implements counterfactual attribution via a progressive causal screening strategy to rigorously distinguish true root causes from propagated symptoms. Experiments on the Who&When benchmark show that CHIEF outperforms eight strong and state-of-the-art baselines on both agent- and step-level accuracy. Ablation studies further confirm the critical role of each proposed module.
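The first stage of the pipeline, lifting a flat execution log into a hierarchical causal graph, can be illustrated with a minimal sketch. This is not the authors' implementation; the `Step` record and the read/write artifact model are assumptions made for illustration. The idea is to link each step to the earlier steps whose outputs it consumed, and to aggregate those step-level edges into agent-level edges, giving the two-level (agent → step) view the abstract describes.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    agent: str                                 # agent that produced this step
    index: int                                 # position in the flat log
    reads: set = field(default_factory=set)    # artifacts this step consumed
    writes: set = field(default_factory=set)   # artifacts this step produced

def build_causal_graph(log):
    """Derive step-level and agent-level causal edges from a flat log."""
    producers = {}       # artifact -> index of the step that last wrote it
    step_edges = {}      # step index -> set of parent step indices
    agent_edges = set()  # (parent_agent, child_agent) pairs
    for step in log:
        # A step causally depends on whichever step last produced each input.
        parents = {producers[a] for a in step.reads if a in producers}
        step_edges[step.index] = parents
        for p in parents:
            if log[p].agent != step.agent:
                agent_edges.add((log[p].agent, step.agent))
        for a in step.writes:
            producers[a] = step.index
    return step_edges, agent_edges
```

For a toy log where a planner writes a plan, a coder consumes it to write code, and a tester consumes the code, the sketch yields step edges 0 → 1 → 2 and agent edges (planner, coder) and (coder, tester), rather than a flat three-entry sequence.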
Executive Summary
This article presents CHIEF, a framework for hierarchical failure attribution in Large Language Model (LLM)-based Multi-Agent Systems (MAS). CHIEF addresses the limitations of existing methods by transforming chaotic execution logs into structured hierarchical causal graphs, employing oracle-guided backtracking to prune the search space, and implementing counterfactual attribution via progressive causal screening. On the Who&When benchmark, CHIEF outperforms eight strong and state-of-the-art baselines on both agent- and step-level accuracy, and ablation studies confirm that each module is critical to its performance. CHIEF has significant implications for the development and deployment of LLM-based MAS, enabling more robust and transparent failure analysis.
Key Points
- ▸ CHIEF transforms chaotic execution logs into structured causal graphs
- ▸ Hierarchical oracle-guided backtracking efficiently prunes the search space
- ▸ Counterfactual attribution via progressive causal screening rigorously distinguishes true root causes
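The second and third key points can be sketched together as one loop. The oracle stage keeps only the steps a synthesized checker flags as faulty, pruning everything else from the search space; the screening stage then tests each surviving candidate counterfactually, attributing the failure only to a step whose correction alone repairs the run. The `oracle_ok` and `replay_with_fix` callables below are hypothetical stand-ins for the paper's virtual oracles and replay mechanism, not the authors' API.

```python
def attribute_failure(trace, oracle_ok, replay_with_fix):
    """Return the index of the root-cause step, or None if no single
    step's correction repairs the run.

    trace           -- ordered list of step records from a failed run
    oracle_ok       -- callable(step) -> bool; True if the synthesized
                       oracle accepts the step's output (prune it)
    replay_with_fix -- callable(trace, i) -> bool; True if the run
                       succeeds once step i's output is corrected
    """
    # 1. Oracle-guided backtracking: only steps the oracle rejects
    #    survive as candidate causes; accepted steps are pruned.
    candidates = [i for i, step in enumerate(trace) if not oracle_ok(step)]
    # 2. Progressive counterfactual screening: test candidates from the
    #    earliest flagged step forward. A propagated symptom fails this
    #    test, because fixing it alone does not repair the run.
    for i in candidates:
        if replay_with_fix(trace, i):
            return i       # true root cause
    return None
```

The distinction between root cause and symptom is carried entirely by the counterfactual replay: both kinds of step look faulty to the oracle, but only correcting the root cause flips the run's outcome.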
Merits
Strength in Addressing Complexity
CHIEF effectively addresses the complexity of LLM-based MAS by incorporating hierarchical causal graph analysis and counterfactual attribution, making it a significant improvement over existing methods.
Improved Accuracy
CHIEF demonstrates superior performance over state-of-the-art baselines on both agent- and step-level accuracy, indicating its practical utility in real-world applications.
Enhanced Transparency
CHIEF's hierarchical failure attribution enables more transparent analysis of LLM-based MAS, facilitating better understanding and improvement of these complex systems.
Demerits
Potential Computational Burden
The hierarchical oracle-guided backtracking and counterfactual attribution components of CHIEF may introduce nontrivial computational overhead, which could limit its scalability in large-scale deployments.
Dependence on Log and Oracle Quality
CHIEF's attributions may be sensitive to the fidelity of the execution logs and the synthesized virtual oracles it relies on, which could affect the framework's generalizability and robustness in real-world scenarios.
Expert Commentary
CHIEF represents a significant advancement in the field of LLM-based MAS, addressing critical limitations of existing failure attribution methods. Its combination of hierarchical causal modeling and counterfactual screening has the potential to change how we design, debug, and deploy these complex systems. However, the potential computational burden and the input-quality dependencies noted above deserve careful consideration, as they may limit the framework's scalability and generalizability. As CHIEF continues to evolve, it is likely to have far-reaching implications for explainable AI, causal reasoning in complex systems, and the responsible deployment of LLM-based MAS.
Recommendations
- ✓ Further research is needed to evaluate the scalability and generalizability of CHIEF in large-scale applications and diverse domains.
- ✓ The development of tools and methodologies for interpreting and visualizing the causal graphs generated by CHIEF would facilitate its practical adoption and enhance its transparency.