
A Multi-Agent Framework for Code-Guided, Modular, and Verifiable Automated Machine Learning

Dat Le, Duc-Cuong Le, Anh-Son Nguyen, Tuan-Dung Bui, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo · February 18, 2026

#cs.LG #cs.SE

arXiv:2602.13937v1. Abstract: Automated Machine Learning (AutoML) has revolutionized the development of data-driven solutions; however, traditional frameworks often function as "black boxes", lacking the flexibility and transparency required for complex, real-world engineering tasks. Recent Large Language Model (LLM)-based agents have shifted toward code-driven approaches. However, they frequently suffer from hallucinated logic and logic entanglement, where monolithic code generation leads to unrecoverable runtime failures. In this paper, we present iML, a novel multi-agent framework designed to shift AutoML from black-box prompting to a code-guided, modular, and verifiable architectural paradigm. iML introduces three main ideas: (1) Code-Guided Planning, which synthesizes a strategic blueprint grounded in autonomous empirical profiling to eliminate hallucination; (2) Code-Modular Implementation, which decouples preprocessing and modeling into specialized components governed by strict interface contracts; and (3) Code-Verifiable Integration, which enforces physical feasibility through dynamic contract verification and iterative self-correction. We evaluate iML across MLE-BENCH and the newly introduced iML-BENCH, comprising a diverse range of real-world Kaggle competitions. The experimental results show iML's superiority over state-of-the-art agents, achieving a valid submission rate of 85% and a competitive medal rate of 45% on MLE-BENCH, with an average standardized performance score (APS) of 0.77. On iML-BENCH, iML significantly outperforms the other approaches by 38%-163% in APS. Furthermore, iML maintains a robust 70% success rate even under stripped task descriptions, effectively filling information gaps through empirical profiling. These results highlight iML's potential to bridge the gap between stochastic generation and reliable engineering, marking a meaningful step toward truly automated machine learning.
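The abstract does not say how iML performs its "autonomous empirical profiling". As a rough illustration of the general idea only (measure facts about the data with code before planning, rather than letting a model guess them), here is a minimal sketch. The function names `profile_column` and `profile_table`, the toy rows, and the "at most 10 distinct target values means classification" heuristic are all assumptions made for this example, not part of iML.

```python
def profile_column(values):
    """Summarize one column from observed values (None or "" counts as missing)."""
    present = [v for v in values if v is not None and v != ""]
    missing_rate = 1 - len(present) / len(values) if values else 0.0

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    # A column is numeric only if every non-missing value parses as a float.
    kind = "numeric" if present and all(is_number(v) for v in present) else "categorical"
    return {
        "kind": kind,
        "missing_rate": round(missing_rate, 3),
        "n_unique": len(set(present)),
    }


def profile_table(rows, target):
    """Profile every column and guess the task type from the target column."""
    columns = rows[0].keys()
    profile = {c: profile_column([r.get(c) for r in rows]) for c in columns}
    t = profile[target]
    # Hypothetical heuristic: few distinct target values suggests classification.
    is_classification = t["kind"] == "categorical" or t["n_unique"] <= 10
    return {
        "columns": profile,
        "task": "classification" if is_classification else "regression",
    }


# Toy data standing in for a competition's training file.
rows = [
    {"age": "34", "city": "Hanoi", "label": "1"},
    {"age": "", "city": "Hue", "label": "0"},
    {"age": "51", "city": "Hanoi", "label": "1"},
    {"age": "27", "city": "", "label": "0"},
]
report = profile_table(rows, target="label")
```

A planner that receives `report` can base its blueprint on measured facts (column kinds, missing rates, a task guess) instead of assumptions drawn from the task description alone.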

Executive Summary

The article introduces iML, a multi-agent framework designed to move Automated Machine Learning (AutoML) from black-box prompting to a code-guided, modular, and verifiable paradigm. It targets two failure modes of recent LLM-based AutoML agents, hallucinated logic and logic entanglement, through three components: Code-Guided Planning, Code-Modular Implementation, and Code-Verifiable Integration. On MLE-BENCH, iML achieves a valid submission rate of 85%, a medal rate of 45%, and an average standardized performance score (APS) of 0.77; on the newly introduced iML-BENCH, it outperforms the other approaches by 38%-163% in APS. It also maintains a 70% success rate when task descriptions are stripped, which the authors attribute to empirical profiling filling the information gaps.

Key Points

▸ Introduction of iML, a multi-agent framework for AutoML.
▸ Three main components: Code-Guided Planning, Code-Modular Implementation, and Code-Verifiable Integration.
▸ Evaluation on MLE-BENCH (85% valid submission rate, 45% medal rate, APS of 0.77) and on iML-BENCH (38%-163% higher APS than the other approaches).
▸ A 70% success rate under stripped task descriptions, attributed to empirical profiling filling information gaps.
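The second and third components, interface contracts between modules and contract verification with iterative self-correction, can be sketched as follows. This is an illustration under stated assumptions, not iML's implementation: the `Contract` fields, the `verify` checks, and the idea of walking through a fixed list of candidate preprocessors are all hypothetical. In an LLM-based agent, the next candidate would presumably be regenerated from the violation messages rather than taken from a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Hypothetical interface contract between preprocessing and modeling."""
    n_features: int
    allow_missing: bool = False


def verify(contract, matrix):
    """Return a list of violations; an empty list means the output conforms."""
    problems = []
    for i, row in enumerate(matrix):
        if len(row) != contract.n_features:
            problems.append(
                f"row {i}: expected {contract.n_features} features, got {len(row)}"
            )
        if not contract.allow_missing and any(v is None for v in row):
            problems.append(f"row {i}: contains missing values")
    return problems


def run_with_self_correction(candidates, raw, contract, max_attempts=3):
    """Try candidate preprocessors in order until one satisfies the contract."""
    log = []
    for attempt, preprocess in enumerate(candidates[:max_attempts], start=1):
        try:
            out = preprocess(raw)
            problems = verify(contract, out)
        except Exception as exc:  # a crash is recorded as a violation too
            out = None
            problems = [f"runtime error: {exc!r}"]
        log.append((attempt, problems))
        if not problems:
            return out, log
    raise RuntimeError(f"no candidate satisfied the contract: {log}")


def pass_through(rows):
    """First attempt: forwards the data unchanged, missing values included."""
    return [list(r) for r in rows]


def impute_zero(rows):
    """Corrected attempt: replaces missing values with 0.0."""
    return [[0.0 if v is None else v for v in r] for r in rows]


raw = [[1.0, None], [2.0, 3.0]]
features, log = run_with_self_correction(
    [pass_through, impute_zero], raw, Contract(n_features=2)
)
```

Here the first candidate fails verification because it forwards a missing value, and the second passes. Keeping the check at the module boundary means a faulty preprocessing step is caught and replaced before the modeling code ever runs on its output.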

Merits

Innovative Framework

iML shifts AutoML from black-box prompting to a code-guided, modular, and verifiable paradigm, targeting the hallucinated logic and logic entanglement that monolithic code generation causes in LLM-based agents.

Comprehensive Evaluation

The framework is evaluated on two benchmarks built from real-world Kaggle competitions, MLE-BENCH and the newly introduced iML-BENCH, and is compared against state-of-the-art agents on valid submission rate, medal rate, and APS.

Practical Applicability

iML's ability to handle real-world Kaggle competitions and maintain performance under stripped task descriptions highlights its practical applicability and reliability.

Demerits

Complexity

The multi-agent framework and its components introduce complexity that may require significant computational resources and expertise to implement effectively.

Benchmark Limitations

Both MLE-BENCH and iML-BENCH are drawn from Kaggle competitions, so the results may not fully generalize to real-world scenarios that differ from competition settings, potentially limiting the framework's broader applicability.

Empirical Profiling Dependence

The framework's reliance on empirical profiling for filling information gaps may not always be feasible or accurate, depending on the nature and quality of the data available.
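A small example shows how a profiling step can draw the wrong conclusion from limited data. The cardinality heuristic below is an assumption made for this illustration, not iML's profiling logic, and the numbers are invented toy values.

```python
def guess_task(target_values, max_classes=10):
    """Hypothetical heuristic: few distinct target values suggests classification."""
    distinct = set(target_values)
    return "classification" if len(distinct) <= max_classes else "regression"


# A continuous target (e.g. prices) observed in only five rows:
# the sample happens to contain just three distinct values.
small_sample = [12.5, 13.1, 12.5, 14.0, 13.1]
small_guess = guess_task(small_sample)

# The same kind of target observed over fifty rows with distinct values.
larger_sample = [10 + 0.1 * i for i in range(50)]
larger_guess = guess_task(larger_sample)
```

With five rows, the heuristic labels a continuous target as classification; with fifty, it labels it as regression. Any plan built on the first guess would inherit that error, which is the kind of data-dependent risk this demerit describes.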

Expert Commentary

The article presents iML, a multi-agent framework that addresses the hallucinated logic and logic entanglement seen in recent LLM-based AutoML agents. Its emphasis on code-guided planning, modular implementation, and verifiable integration represents a shift toward more reliable and transparent AutoML solutions. The evaluation on MLE-BENCH and the newly introduced iML-BENCH reports stronger results than state-of-the-art agents, and the 70% success rate under stripped task descriptions suggests robustness to incomplete task information. However, the complexity of the framework and its reliance on empirical profiling are potential limitations that need to be addressed, and further research is needed to establish how well the results hold outside Kaggle-style tasks. Overall, iML marks a meaningful step toward truly automated and reliable machine learning solutions.

Recommendations

✓ Further research to address the complexity and computational requirements of the iML framework.
✓ Expansion of evaluation benchmarks to include a broader range of real-world scenarios to assess the framework's generalizability.
✓ Exploration of alternative methods to empirical profiling to enhance the framework's robustness and reliability in diverse contexts.

Sources

arXiv - cs.LG


