CODESTRUCT: Code Agents over Structured Action Spaces

arXiv:2604.05407v1 Announce Type: new Abstract: LLM-based code agents treat repositories as unstructured text, applying edits through brittle string matching that frequently fails due to formatting drift or ambiguous patterns. We propose reframing the codebase as a structured action space where agents operate on named AST entities rather than text spans. Our framework, CODESTRUCT, provides readCode for retrieving complete syntactic units and editCode for applying syntax-validated transformations to semantic program elements. Evaluated on SWE-Bench Verified across six LLMs, CODESTRUCT improves Pass@1 accuracy by 1.2-5.0% while reducing token consumption by 12-38% for most models. Models that frequently fail to produce valid patches under text-based interfaces benefit most: GPT-5-nano improves by 20.8% as empty-patch failures drop from 46.6% to 7.2%. On CodeAssistBench, we observe consistent accuracy gains (+0.8-4.4%) with cost reductions up to 33%. Our results show that structure-aware interfaces offer a more reliable foundation for code agents.

Executive Summary

CODESTRUCT introduces a paradigm shift in LLM-based code agents by transitioning from unstructured text manipulation to structured action spaces grounded in Abstract Syntax Trees (ASTs). The framework replaces brittle string-matching edits with syntax-validated transformations on named AST entities, enabling more reliable repository interactions. Empirical evaluation on SWE-Bench Verified and CodeAssistBench demonstrates consistent Pass@1 accuracy gains of 1.2–5.0% across six LLMs (up to 20.8% for GPT-5-nano) alongside token reductions of 12–38% for most models. These results indicate that structure-aware interfaces mitigate costly failure modes such as empty patches and formatting drift, offering a robust foundation for code agent architectures.
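
The paper does not publish its implementation, but the readCode/editCode interface it describes can be sketched with Python's `ast` module. All function signatures and behaviors below are our assumptions for illustration, not CODESTRUCT's actual API:

```python
import ast

def read_code(source: str, entity: str) -> str:
    """Return the full source of a named function or class: a complete syntactic unit."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                and node.name == entity:
            return ast.get_source_segment(source, node)
    raise KeyError(f"no entity named {entity!r}")

def edit_code(source: str, entity: str, new_code: str) -> str:
    """Replace a named entity with new code, validating syntax before applying."""
    ast.parse(new_code)  # reject syntactically invalid replacements up front
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                and node.name == entity:
            start, end = node.lineno - 1, node.end_lineno
            lines = source.splitlines()
            return "\n".join(lines[:start] + new_code.splitlines() + lines[end:])
    raise KeyError(f"no entity named {entity!r}")
```

The key property this sketch captures: edits are addressed by entity name rather than by matching a text span, and every replacement is parsed before it is applied, so a malformed edit is rejected instead of producing a broken file.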

Key Points

  • Reframing the codebase as a structured action space using AST entities to avoid text-based brittleness
  • Introduction of readCode and editCode operations for syntax-validated, semantic program transformations
  • Empirical validation showing consistent accuracy improvements and token efficiency across multiple LLMs and benchmarks
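
To illustrate the brittleness the first point targets: a literal string match breaks when formatting drifts even slightly, while addressing the same code as a named AST entity is insensitive to whitespace. This is a toy contrast of our own construction, not the paper's code:

```python
import ast

# Two formattings of the same function; a literal string match is fragile.
v1 = "def area(r):\n    return 3.14 * r * r\n"
v2 = "def area(r):\n    return 3.14*r*r\n"  # operator spacing drifted

needle = "return 3.14 * r * r"
print(needle in v1)  # True: the exact text is found
print(needle in v2)  # False: a text-based edit would silently fail here

# Addressing the code by AST entity name succeeds for both versions.
for src in (v1, v2):
    fn = next(n for n in ast.walk(ast.parse(src))
              if isinstance(n, ast.FunctionDef))
    print(fn.name)  # 'area' in both cases
```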

Merits

Novelty

CODESTRUCT pioneers structured action spaces for code agents, addressing core limitations of text-based interfaces such as formatting drift and ambiguous pattern matching.

Empirical Rigor

Comprehensive evaluation across SWE-Bench Verified and CodeAssistBench with multiple LLMs demonstrates consistent and measurable improvements in performance and efficiency.

Theoretical Soundness

Leveraging ASTs ensures syntax validity and semantic precision, aligning with established software engineering principles.

Scalability

Token efficiency gains (12–38%) suggest potential for broader deployment in resource-constrained environments.

Demerits

Implementation Complexity

Integrating AST-based frameworks into existing LLM pipelines may require significant architectural changes and tooling investments.

Scope Limitations

Performance gains are benchmark-specific; applicability to non-code or mixed-content repositories remains untested.

Dependency Risks

Accuracy and efficiency gains depend on robust AST parsing, which may falter in languages with non-standard or obfuscated syntax.
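
A defensive pattern for this risk is to gate structured edits on a successful parse and degrade to a text-based path otherwise. The sketch below assumes Python sources; the fallback behavior is our assumption and is not described by CODESTRUCT:

```python
import ast

def safe_structured_edit(source: str, apply_ast_edit, apply_text_edit) -> str:
    """Try an AST-level edit; fall back to a text edit if the file does not parse."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Non-standard or partial syntax: degrade to the text-based path.
        return apply_text_edit(source)
    edited = apply_ast_edit(source, tree)
    ast.parse(edited)  # validate the result before accepting it
    return edited
```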

Expert Commentary

CODESTRUCT represents a significant advancement in the design of LLM-based code agents by addressing a critical pain point: the fragility of text-based edit mechanisms. The shift to AST-driven action spaces reflects a broader trend in AI systems toward greater structure and semantic grounding, paralleling developments in symbolic AI and program synthesis. The empirical results are compelling, particularly the dramatic reduction in empty-patch failures for GPT-5-nano, which suggests that structure-aware interfaces can mitigate some of the most costly failure modes in autonomous code modification. However, the framework’s dependence on robust AST parsing introduces a non-trivial integration challenge, particularly for languages with idiosyncratic syntax or minimal tooling support. Future work should explore hybrid approaches that combine the precision of AST-based edits with the flexibility of text-based interfaces, ensuring broader applicability without sacrificing reliability. This work also raises important questions about the scalability of such systems in large, heterogeneous codebases where dependency analysis and cross-file edits become increasingly complex.

Recommendations

  • Develop standardized AST interfaces and tooling kits to facilitate broader adoption across programming languages and development environments
  • Explore hybrid models that integrate text-based fallback mechanisms for edge cases where AST parsing is unreliable or incomplete
  • Conduct longitudinal studies to assess long-term impact on developer workflows, including potential shifts in debugging practices and collaboration dynamics
  • Engage with language standards bodies to incorporate AST-aware features into language specifications, enhancing interoperability with autonomous agents

Sources

Original: arXiv - cs.AI