AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

arXiv:2603.03378v1 Announce Type: new

Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action execution under permission-governed environments, and the inability of closed systems to improve from failures. We present AOI (Autonomous Operations Intelligence), a trainable multi-agent framework formulating automated operations as a structured trajectory learning problem under security constraints. Our approach integrates three key components. First, a trainable diagnostic system applies Group Relative Policy Optimization (GRPO) to distill expert-level knowledge into locally deployed open-source models, enabling preference-based learning without exposing sensitive data. Second, a read-write separated execution architecture decomposes operational trajectories into observation, reasoning, and action phases, allowing safe learning while preventing unauthorized state mutation. Third, a Failure Trajectory Closed-Loop Evolver mines unsuccessful trajectories and converts them into corrective supervision signals, enabling continual data augmentation. Evaluated on the AIOpsLab benchmark, our contributions yield cumulative gains. (1) The AOI runtime alone achieves 66.3% best@5 success on all 86 tasks, outperforming the prior state-of-the-art (41.9%) by 24.4 points. (2) Adding Observer GRPO training, a locally deployed 14B model reaches 42.9% avg@1 on 63 held-out tasks with unseen fault types, surpassing Claude Sonnet 4.5. (3) The Evolver converts 37 failed trajectories into diagnostic guidance, improving end-to-end avg@5 by 4.8 points while reducing variance by 35%.

Executive Summary

The article introduces AOI, a trainable multi-agent framework that addresses critical barriers to LLM agent deployment in SRE by transforming failed diagnostic trajectories into actionable training signals. AOI integrates GRPO for knowledge distillation without exposing proprietary data, a read-write separated execution architecture that prevents unauthorized state mutation, and an Evolver that converts failed trajectories into corrective supervision. Empirical results on the AIOpsLab benchmark demonstrate cumulative gains: a 66.3% best@5 success rate (versus the prior state-of-the-art of 41.9%), improved performance on unseen fault types via Observer GRPO training of a locally deployed 14B model, and a 4.8-point avg@5 improvement with 35% lower variance via the Evolver. These results indicate that failure-driven data augmentation can meaningfully advance autonomous SRE.
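
The GRPO component summarized above can be illustrated with a minimal sketch. GRPO samples a group of trajectories for the same task and normalizes each trajectory's reward against the group's own mean and standard deviation, which removes the need for a separate learned value critic; the reward values below are hypothetical and the function is a simplification of the full clipped policy-gradient objective.

```python
# Sketch of the group-relative advantage computation at the core of
# GRPO (Group Relative Policy Optimization). Each trajectory's reward
# is normalized against its own group's statistics.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group mean and std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four diagnostic rollouts for the same incident, scored
# 0/1 by a task-success reward (hypothetical values).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful rollouts receive positive advantages and failed ones negative, so the policy is pushed toward the behaviors that distinguish successes within each group.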

Key Points

  • AOI introduces GRPO for secure, preference-based learning without data exposure
  • The read-write separation architecture enables safe learning under permission constraints
  • The Failure Trajectory Evolver converts failed trajectories into corrective supervision, augmenting training data
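
The read-write separation in the second point can be pictured as a phase-aware execution gate. The command lists, phase names, and default-deny policy below are illustrative assumptions, not AOI's actual implementation:

```python
# Minimal sketch of a read-write separated execution gate: only the
# action phase may run state-mutating commands, and unclassified
# commands are rejected everywhere (default deny).

READ_ONLY = ("kubectl get", "kubectl describe", "kubectl logs", "kubectl top")
MUTATING = ("kubectl delete", "kubectl apply", "kubectl scale", "kubectl patch")

def classify(cmd: str) -> str:
    """Classify a shell command as 'read', 'write', or 'unknown'."""
    if cmd.startswith(READ_ONLY):
        return "read"
    if cmd.startswith(MUTATING):
        return "write"
    return "unknown"

def execute(cmd: str, phase: str) -> str:
    """Gate execution by phase: observation and reasoning phases may
    only observe; mutation is confined to the action phase."""
    kind = classify(cmd)
    if kind == "unknown":
        raise PermissionError(f"unclassified command rejected: {cmd}")
    if kind == "write" and phase != "action":
        raise PermissionError(f"state mutation blocked in {phase} phase: {cmd}")
    return f"ran [{kind}] {cmd}"
```

Under this split, a diagnostic agent can explore freely during observation without risking the unauthorized state mutations the abstract warns about.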

Merits

Innovative Framework

AOI uniquely combines secure learning, architectural separation, and failure-to-signal conversion—a novel triad addressing key enterprise constraints simultaneously.

Empirical Validation

Quantifiable gains across multiple metrics (success rate, avg@1 performance, variance reduction) validate the practical impact of the proposed components.

Demerits

Constraint Dependency

Effectiveness is contingent upon the availability of sufficient failed trajectory data; in sparse failure environments, the Evolver’s impact may be diminished.

Scalability Concern

Integrating GRPO with local 14B models may introduce latency or cost barriers for large-scale, multi-region deployments.

Expert Commentary

AOI represents a careful synthesis of algorithmic innovation and operational pragmatism. The use of GRPO as a secure distillation mechanism is particularly noteworthy: it circumvents the proprietary-data access bottleneck without sacrificing model fidelity. The read-write separated architecture, while conceptually familiar, is aligned closely with enterprise permission models and avoids the common pitfall of unintended state mutation. The Evolver's conversion of failure into supervision is the most distinctive contribution: it turns a traditionally discarded resource, failed runs, into a systematic source of corrective training data. The results on AIOpsLab are strong, but the underlying methodology, particularly the closed-loop feedback from failure, has broader applicability in domains where autonomous systems operate under uncertainty and restricted access. This work sets a useful reference point for responsible, adaptive autonomy in SRE.
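
The closed-loop conversion of failures into supervision can be sketched as follows. The trajectory schema, field names, and guidance format are hypothetical, intended only to illustrate mining a failed run into a corrective (context, target) training pair:

```python
# Hypothetical sketch of failure-trajectory mining: a failed diagnostic
# run is paired with guidance derived from the ground-truth fault label,
# producing a corrective supervision example; successful runs yield none.

from dataclasses import dataclass, field

@dataclass
class Step:
    observation: str
    hypothesis: str

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)
    final_diagnosis: str = ""
    ground_truth: str = ""

    @property
    def failed(self) -> bool:
        return self.final_diagnosis != self.ground_truth

def to_corrective_example(traj: Trajectory):
    """Mine a failed trajectory into an (input, target) pair."""
    if not traj.failed:
        return None  # successes need no correction
    context = "\n".join(s.observation for s in traj.steps)
    guidance = (f"Previous hypothesis '{traj.final_diagnosis}' was wrong; "
                f"the correct diagnosis is '{traj.ground_truth}'.")
    return {"input": context, "target": guidance}
```

The appeal of this loop is that every failed run, normally discarded, becomes a labeled example pointing directly at the agent's weakest decisions.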

Recommendations

  • Extend AOI’s architecture to support federated learning across multi-tenant cloud environments to enhance generalization.
  • Explore hybrid GRPO-based RL variants to further accelerate convergence in high-noise operational settings.
