EvoSkill: Automated Skill Discovery for Multi-Agent Systems
arXiv:2603.02766v1. Abstract: Coding agents are increasingly used as general-purpose problem solvers, but their flexibility does not by itself confer the domain expertise needed for specialized tasks. Recent work addresses this through agent skills: reusable workflows and code that augment agents with domain-specific capabilities. Most skills today are hand-crafted, and existing evolutionary approaches optimize low-level artifacts (e.g. prompts and code) that are tightly coupled to specific models and tasks. We introduce EvoSkill, a self-evolving framework that automatically discovers and refines agent skills through iterative failure analysis. EvoSkill analyzes execution failures, proposes new skills or edits to existing ones, and materializes them into structured, reusable skill folders. A Pareto frontier of agent programs governs selection, retaining only skills that improve held-out validation performance while the underlying model remains frozen. We evaluate EvoSkill on two benchmarks: OfficeQA, a grounded reasoning benchmark over U.S. Treasury data, where it improves exact-match accuracy by 7.3% (60.6% → 67.9%); and SealQA, a search-augmented QA benchmark with noisy retrieval, where it yields a 12.1% gain (26.6% → 38.7%). We also investigate the zero-shot transfer capabilities of skills evolved on one task to the other; in particular, skills evolved from SealQA transfer zero-shot to BrowseComp, improving accuracy by 5.3% without modification, demonstrating that skill-level optimization produces transferable capabilities beyond the training task.
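The abstract says a Pareto frontier of agent programs governs selection, but it does not state which objectives the frontier is computed over. The following is a minimal sketch of generic Pareto filtering, assuming each program carries a tuple of validation scores where higher is better. The program names, the two score dimensions, and the numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: one validation score per objective, higher is better.
# The paper's actual objectives are not given in the abstract.
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical agent programs scored on two made-up validation slices.
programs = {
    "base": (0.60, 0.50),
    "base+skill_a": (0.66, 0.50),
    "base+skill_b": (0.58, 0.62),
    "base+skill_c": (0.57, 0.49),
}
frontier = pareto_frontier(programs)
print(frontier)
```

In this toy data, "base" is dominated by "base+skill_a" and "base+skill_c" is dominated by "base", so only the two programs that each win on one slice survive. Keeping a frontier rather than a single best program preserves candidates with complementary strengths for later rounds.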
Executive Summary
This article presents EvoSkill, a self-evolving framework that automatically discovers and refines agent skills while the underlying model stays frozen. EvoSkill analyzes execution failures, proposes new skills or edits to existing ones, and materializes them into structured, reusable skill folders; a Pareto frontier of agent programs retains only the skills that improve held-out validation performance. On OfficeQA, exact-match accuracy rises from 60.6% to 67.9%, and on SealQA accuracy rises from 26.6% to 38.7%. Skills evolved on SealQA also transfer zero-shot to BrowseComp, improving accuracy there by 5.3% without modification. By optimizing at the skill level rather than tuning prompts or code tied to a specific model and task, the approach offers a less manual route to giving agents domain expertise.
Key Points
- ▸ EvoSkill is a self-evolving framework that automatically discovers and refines agent skills
- ▸ EvoSkill analyzes execution failures to propose new skills or edits to existing ones
- ▸ EvoSkill raises exact-match accuracy on OfficeQA from 60.6% to 67.9% and accuracy on SealQA from 26.6% to 38.7%
- ▸ Skills evolved on SealQA transfer zero-shot to BrowseComp, improving accuracy by 5.3% without modification
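The loop summarized in the points above (collect failures, propose a skill, keep it only if held-out validation improves) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `Agent`, `evolve`, `toy_model`, and `toy_propose` are hypothetical names, the "model" is a stub standing in for a frozen LLM, and a skill is reduced to a name plus an instruction string rather than a full skill folder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (question, expected answer)
Proposal = Optional[Tuple[str, str]]  # (skill name, skill instructions) or None


@dataclass
class Agent:
    """A frozen answer function plus a set of skills (hypothetical structure)."""
    answer_fn: Callable[[str, Dict[str, str]], str]
    skills: Dict[str, str] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        return self.answer_fn(question, self.skills)


def accuracy(agent: Agent, examples: List[Example]) -> float:
    """Exact-match accuracy of the agent on a list of examples."""
    return sum(agent.answer(q) == a for q, a in examples) / len(examples)


def evolve(
    agent: Agent,
    train: List[Example],
    validation: List[Example],
    propose: Callable[[List[Example], Dict[str, str]], Proposal],
    rounds: int = 3,
) -> Agent:
    """Add proposed skills only when held-out validation accuracy improves."""
    best = accuracy(agent, validation)
    for _ in range(rounds):
        failures = [(q, a) for q, a in train if agent.answer(q) != a]
        if not failures:
            break  # nothing left to learn from
        proposal = propose(failures, agent.skills)
        if proposal is None:
            break  # the proposer has no further ideas
        name, body = proposal
        candidate = Agent(agent.answer_fn, {**agent.skills, name: body})
        score = accuracy(candidate, validation)
        if score > best:  # reject skills that do not help on held-out data
            agent, best = candidate, score
    return agent


def toy_model(question: str, skills: Dict[str, str]) -> str:
    # Stub for a frozen model: it formats answers correctly only with the skill.
    return question.upper() if "uppercase" in skills else question


def toy_propose(failures: List[Example], skills: Dict[str, str]) -> Proposal:
    # Stub for failure analysis: always suggests the same hypothetical skill once.
    if "uppercase" in skills:
        return None
    return ("uppercase", "Return answers in upper case.")


train = [("abc", "ABC"), ("de", "DE")]
validation = [("xyz", "XYZ"), ("q", "Q")]
evolved = evolve(Agent(toy_model), train, validation, toy_propose)
print(sorted(evolved.skills), accuracy(evolved, validation))
```

The model function is never modified inside `evolve`; only the skill dictionary changes, which mirrors the abstract's statement that the underlying model remains frozen.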
Merits
Strength in Addressing Domain Expertise
Most skills today are hand-crafted, and existing evolutionary approaches optimize low-level artifacts such as prompts and code that are tightly coupled to specific models and tasks. EvoSkill instead automates discovery at the skill level, which reduces the manual effort needed to give agents domain expertise.
Significant Improvements in Accuracy
EvoSkill improves exact-match accuracy on OfficeQA from 60.6% to 67.9% (a 7.3-point gain) and accuracy on SealQA from 26.6% to 38.7% (a 12.1-point gain), with the underlying model held fixed.
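The abstract reports the gains as 7.3% (60.6% to 67.9%) and 12.1% (26.6% to 38.7%), which are absolute differences in percentage points. As a small worked example, the snippet below also computes the relative improvement over each baseline; the helper name `gains` is illustrative, and the relative figures are derived here rather than quoted from the paper.

```python
from typing import Tuple


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return round(absolute, 1), round(relative, 1)


# Before/after accuracies as reported in the abstract.
reported = {"OfficeQA": (60.6, 67.9), "SealQA": (26.6, 38.7)}

for name, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points absolute, +{relative}% relative")
```

The distinction matters most for SealQA, where a 12.1-point gain on a 26.6% baseline corresponds to a relative improvement of about 45%.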
Zero-Shot Transfer Capabilities
Skills evolved on SealQA transfer zero-shot to BrowseComp and improve accuracy there by 5.3% without modification, which suggests that skill-level optimization yields capabilities that generalize beyond the training task.
Demerits
Limited Evaluation on Real-World Tasks
The evaluation covers two benchmarks, OfficeQA and SealQA, plus a zero-shot transfer test on BrowseComp; it is unclear how the framework would perform on real-world tasks outside these benchmarks.
Dependence on Model Quality
Because the underlying model remains frozen, EvoSkill's performance depends on the quality of that model, which may limit its effectiveness when the model is poorly suited to the task.
Expert Commentary
EvoSkill contributes a self-evolving framework that discovers and refines agent skills automatically. Its loop of analyzing execution failures, proposing new skills or edits to existing ones, and materializing them into reusable skill folders replaces much of the manual work that hand-crafted skills require. The reported gains on OfficeQA (60.6% to 67.9%) and SealQA (26.6% to 38.7%) are substantial, and the zero-shot transfer from SealQA to BrowseComp (a 5.3% improvement) is the most notable result because it indicates that the evolved skills are not tied to the training task. The limitations noted above, namely dependence on model quality and evaluation confined to benchmarks, should be addressed in future research.
Recommendations
- ✓ Evaluate EvoSkill on real-world tasks beyond the current benchmarks to assess its practical effectiveness.
- ✓ Investigate how EvoSkill's gains vary with the quality of the underlying model to understand its limitations and potential applications.