CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

Swapnil Parekh · March 4, 2026

#cs.CL #cs.AI #cs.LG

arXiv:2603.00523v1

Abstract: Mechanistic circuit discovery is notoriously sensitive to arbitrary analyst choices, especially pruning thresholds and feature dictionaries, often yielding brittle "one-shot" explanations with no principled notion of uncertainty. We reframe circuit discovery as an uncertainty-quantification problem over these analytic degrees of freedom. Our method, CIRCUS, constructs an ensemble of attribution graphs by pruning a single raw attribution run under multiple configurations, assigns each edge a stability score (the fraction of configurations that retain it), and extracts a strict-consensus circuit consisting only of edges that appear in all views. This produces a threshold-robust "core" circuit while explicitly surfacing contingent alternatives and enabling rejection of low-agreement structure. CIRCUS requires no retraining and adds negligible overhead, since it aggregates structure across already-computed pruned graphs. On Gemma-2-2B and Llama-3.2-1B, strict consensus circuits are ~40x smaller than the union of all configurations while retaining comparable influence-flow explanatory power, and they outperform a same-edge-budget baseline (union pruned to match the consensus size). We further validate causal relevance with activation patching, where consensus-identified nodes consistently beat matched non-consensus controls (p=0.0004). Overall, CIRCUS provides a practical, uncertainty-aware framework for reporting trustworthy, auditable mechanistic circuits with an explicit core/contingent/noise decomposition.
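
The aggregation step the abstract describes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each pruned graph is available as a plain set of (source, target) edge tuples, and the function names and the toy ensemble are hypothetical.

```python
from collections import Counter

def stability_scores(pruned_graphs):
    """Map each edge to the fraction of configurations that retain it."""
    if not pruned_graphs:
        raise ValueError("need at least one pruned graph")
    # Count, per edge, how many configurations contain it.
    counts = Counter(edge for graph in pruned_graphs for edge in set(graph))
    n_configs = len(pruned_graphs)
    return {edge: count / n_configs for edge, count in counts.items()}

def strict_consensus(pruned_graphs):
    """Return only the edges that appear in every configuration."""
    return set.intersection(*(set(graph) for graph in pruned_graphs))

# Toy ensemble (assumption): three pruning configurations of one raw run.
views = [
    {("a", "b"), ("b", "c"), ("a", "c")},
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
]
scores = stability_scores(views)
core = strict_consensus(views)
union = set().union(*views)
print(sorted(core), len(union))
```

In this toy case only the edge ("a", "b") survives all three configurations, so the consensus circuit has one edge while the union has four. Because the inputs are graphs that have already been pruned, the extra work is limited to set operations over edges.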

Executive Summary

The paper introduces CIRCUS, a circuit-discovery method that treats analyst choices such as pruning thresholds and feature dictionaries as a source of uncertainty. CIRCUS prunes a single raw attribution run under multiple configurations, scores each edge by the fraction of configurations that retain it, and keeps only the edges present in every configuration as a strict-consensus circuit. The result is a threshold-robust 'core' circuit, with contingent alternatives surfaced explicitly and low-agreement structure open to rejection. On Gemma-2-2B and Llama-3.2-1B, the strict-consensus circuits are roughly 40x smaller than the union of all configurations while retaining comparable influence-flow explanatory power.
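
The abstract names a core/contingent/noise decomposition but does not say where the line between contingent and noise falls. The sketch below is therefore only one possible reading: it assumes stability scores are already available as a dictionary, treats a score of 1.0 as core, and uses a hypothetical `noise_below` cutoff of 0.5 that is not taken from the paper.

```python
def decompose(scores, noise_below=0.5):
    """Split edges into core, contingent, and noise by stability score.

    noise_below is an illustrative cutoff (assumption), not a value from the paper.
    """
    core, contingent, noise = set(), set(), set()
    for edge, score in scores.items():
        if score >= 1.0:
            core.add(edge)          # retained by every configuration
        elif score >= noise_below:
            contingent.add(edge)    # retained by some configurations
        else:
            noise.add(edge)         # low agreement: candidate for rejection
    return core, contingent, noise

# Hypothetical stability scores for three edges.
example_scores = {"e1": 1.0, "e2": 0.75, "e3": 0.25}
core_edges, contingent_edges, noise_edges = decompose(example_scores)
```

Reporting all three sets, rather than the core alone, keeps the threshold-dependent structure visible to the reader instead of discarding it silently.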

Key Points

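▸ CIRCUS reframes circuit discovery as uncertainty quantification over analyst choices (pruning thresholds, feature dictionaries)
▸ It builds an ensemble of attribution graphs by pruning a single raw attribution run under multiple configurations
▸ Each edge receives a stability score: the fraction of configurations that retain it
▸ The strict-consensus circuit keeps only edges present in every configuration, giving a threshold-robust 'core' circuit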
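
Stated compactly, with K pruning configurations yielding edge sets E_1, ..., E_K from the same raw attribution run (this notation is introduced here for illustration and is not taken from the paper):

```latex
s(e) = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}\left[ e \in E_k \right],
\qquad
C_{\text{strict}} = \{\, e : s(e) = 1 \,\} = \bigcap_{k=1}^{K} E_k
\;\subseteq\; \bigcup_{k=1}^{K} E_k .
```

The strict-consensus circuit is the intersection of the pruned graphs, which is why it can never be larger than any single configuration, let alone their union.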

Merits

Robustness to Uncertainty

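CIRCUS reports only edges that survive every pruning configuration, so the core circuit does not depend on a single arbitrary threshold. Edges that appear in only some configurations are surfaced as contingent rather than silently kept or dropped.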

Efficient Computation

The method requires no retraining and adds negligible overhead, since it only aggregates structure across pruned graphs that have already been computed.

Demerits

Limited Applicability

The reported results cover two models, Gemma-2-2B and Llama-3.2-1B; whether the method applies equally well to other models or datasets requires further validation.

Interpretability Challenges

A strict-consensus circuit is smaller, but it is not necessarily easy to interpret; understanding what its edges compute may still require additional analysis.

Expert Commentary

CIRCUS addresses a real weakness of mechanistic circuit discovery: explanations that change with arbitrary analyst choices. By reporting only structure that is stable across pruning configurations, and by separating core from contingent edges, it has the potential to make circuit-level explanations of AI models more transparent and trustworthy. Further research is needed to establish how widely the method applies and how interpretable its consensus circuits are. Its use in other domains, and its potential bearing on regulatory standards, remain open areas of study.

Recommendations

✓ Validate CIRCUS further on diverse models and datasets
✓ Investigate whether the approach extends to other explainability techniques, such as feature-importance methods

Sources

arXiv - cs.CL

CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
