Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
arXiv:2604.05333v1 Announce Type: new Abstract: Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world settings, where agents must monitor and interact with numerous personal applications, web browsers, and other environment interfaces, skill libraries can scale to thousands of reusable skills. Scaling to larger skill sets introduces two key challenges. First, loading the full skill set saturates the context window, driving up token costs, hallucination, and latency. In this paper, we present Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries. GoS constructs an executable skill graph offline from skill packages, then at inference time retrieves a bounded, dependency-aware skill bundle through hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, GoS improves average reward by 43.6% over the vanilla full skill-loading baseline while reducing input tokens by 37.8%, and generalizes across three model families: Claude Sonnet, GPT-5.2 Codex, and MiniMax. Additional ablation studies across skill libraries ranging from 200 to 2,000 skills further demonstrate that GoS consistently outperforms both vanilla skills loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
Executive Summary
The article introduces Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries in agent systems. GoS builds an executable skill graph offline, then at inference time retrieves a bounded, dependency-aware skill bundle in three stages: hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, the authors report a 43.6% improvement in average reward and a 37.8% reduction in input tokens relative to the vanilla full skill-loading baseline, with results that generalize across three model families (Claude Sonnet, GPT-5.2 Codex, MiniMax). Ablation studies on skill libraries ranging from 200 to 2,000 skills show GoS consistently outperforming both full skill loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
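The seeding stage can be pictured with a small sketch. The abstract does not give the scoring formula, so everything here is an assumption for illustration: the function names, the toy three-dimensional embeddings, the Jaccard token overlap standing in for a lexical scorer, and the mixing weight `alpha` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def lexical_overlap(query, text):
    """Jaccard overlap of whitespace tokens (a stand-in for a lexical scorer)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def hybrid_seeds(query, query_vec, skills, alpha=0.5, k=2):
    """Return the top-k (skill, score) pairs under a blended score.

    alpha weights the semantic part; (1 - alpha) weights the lexical part.
    This blend is an assumed form, not the paper's stated formula.
    """
    scored = []
    for name, (description, vec) in skills.items():
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * lexical_overlap(query, description)
        scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]

# Hypothetical skill library: description plus a toy embedding per skill.
SKILLS = {
    "send_email": ("send an email message", [0.9, 0.1, 0.0]),
    "open_browser": ("open a web browser tab", [0.1, 0.9, 0.0]),
    "read_inbox": ("read email inbox", [0.8, 0.2, 0.1]),
}

seeds = hybrid_seeds("send email", [1.0, 0.0, 0.0], SKILLS)
print(seeds)
```

In a real system the embeddings would come from an encoder model and the lexical term from an index such as BM25; the sketch only shows how two signals could be blended into seed scores.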
Key Points
- ▸ GoS addresses scalability challenges in agent systems by optimizing large skill libraries through structural retrieval rather than brute-force loading.
- ▸ The retrieval pipeline (hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration) returns a bounded, dependency-aware skill bundle, which reduces input tokens and improves task reward relative to full skill loading.
- ▸ Empirical validation demonstrates GoS’s superiority in balancing reward, token efficiency, and runtime across diverse skill libraries and model families.
- ▸ The skill graph is constructed offline from skill packages, so the cost of graph building is paid ahead of time and inference only performs retrieval over the prebuilt graph.
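To make the graph-propagation step more concrete, here is a minimal Personalized PageRank sketch over a toy dependency graph. The abstract does not define "reverse-weighted", so this code assumes one possible reading: score flows along each dependency edge at full weight and back along the reversed edge at a reduced weight (`reverse_weight`). The edge format, parameter names, and example skills are hypothetical.

```python
def personalized_pagerank(edges, seeds, damping=0.85, reverse_weight=0.5, iters=50):
    """Power-iteration Personalized PageRank.

    edges: list of (skill, dependency, weight) with positive weights.
    seeds: {skill: positive seed score}; the walk restarts at the seeds.
    reverse_weight: assumed scaling for walking an edge backwards.
    """
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seeds must carry positive total score")

    out = {}
    nodes = set(seeds)
    for u, v, w in edges:
        nodes.update((u, v))
        out.setdefault(u, {})
        out[u][v] = out[u].get(v, 0.0) + w
        if reverse_weight > 0:
            out.setdefault(v, {})
            out[v][u] = out[v].get(u, 0.0) + w * reverse_weight

    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for u in nodes:
            nbrs = out.get(u, {})
            z = sum(nbrs.values())
            if z == 0:
                # Dangling node: send its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += damping * rank[u] * restart[n]
                continue
            for v, w in nbrs.items():
                nxt[v] += damping * rank[u] * w / z
        rank = nxt
    return rank

# Hypothetical graph: book_flight needs login and fill_form; play_music needs login.
EDGES = [
    ("book_flight", "login", 1.0),
    ("book_flight", "fill_form", 1.0),
    ("play_music", "login", 1.0),
]
rank = personalized_pagerank(EDGES, {"book_flight": 1.0})
for name, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

With the seed on `book_flight`, its direct dependencies end up ranked above the unrelated `play_music`, which only receives mass indirectly through the shared `login` node.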
Merits
Innovative Structural Retrieval Framework
GoS introduces a dependency-aware, graph-based retrieval mechanism that bundles skills by both their relevance to the task and their dependencies on one another. In the reported experiments it outperforms simple vector retrieval and full skill loading in balancing reward, token efficiency, and runtime.
Empirical Robustness and Generalizability
The framework demonstrates consistent improvements in reward and token efficiency across multiple benchmarks (SkillsBench, ALFWorld) and model families (Claude Sonnet, GPT-5.2 Codex, MiniMax), suggesting broad applicability in diverse agent systems.
Mitigation of Context Window Saturation
By reducing input token usage by 37.8% while improving average reward by 43.6%, GoS directly addresses context window saturation, which the authors link to higher token costs, hallucination, and latency in large-scale agent systems.
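One way to picture context-budgeted hydration is a greedy pass that loads skills in descending score order until a token budget is exhausted. This is a sketch under assumptions: the abstract does not describe the actual selection rule, and the scores, token costs, and budget below are toy values that do not come from the paper.

```python
def hydrate(scores, token_costs, budget):
    """Greedily pick skills by descending score while staying within budget.

    scores: {skill: relevance score}; token_costs: {skill: tokens to load it}.
    Returns (selected skill names in pick order, tokens used).
    """
    bundle, used = [], 0
    for name, _score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        cost = token_costs[name]
        if used + cost <= budget:
            bundle.append(name)
            used += cost
    return bundle, used

# Toy inputs (hypothetical, not measurements from the paper).
SCORES = {"book_flight": 0.45, "login": 0.30, "fill_form": 0.19, "play_music": 0.06}
COSTS = {"book_flight": 400, "login": 150, "fill_form": 500, "play_music": 200}

bundle, used = hydrate(SCORES, COSTS, budget=700)
full_load = sum(COSTS.values())
print(bundle, used, f"saving vs. full load: {1 - used / full_load:.0%}")
```

A greedy rule like this can skip a large, moderately relevant skill in favor of staying under budget; a production system might instead enforce that a selected skill's dependencies are always loaded with it.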
Demerits
Offline Dependency on Graph Construction
Because the skill graph is constructed offline, GoS may adapt slowly in dynamic environments where skills are added or changed frequently, which could require periodic graph rebuilds or incremental updates.
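The rebuild-versus-update trade-off can be illustrated with a small sketch. The package format (a `requires` list per skill) and both function names are hypothetical; the abstract does not describe how skill packages encode dependencies.

```python
def build_graph(packages):
    """Offline build: map each skill to the set of skills it depends on.

    packages: {skill: {"requires": [dependency, ...]}} (assumed format).
    """
    graph = {name: set() for name in packages}
    for name, meta in packages.items():
        for dep in meta.get("requires", []):
            graph.setdefault(dep, set())  # register dependencies with no package entry
            graph[name].add(dep)
    return graph

def upsert_skill(graph, name, requires):
    """Incremental update: add or replace one skill's edges in place."""
    graph[name] = set(requires)
    for dep in requires:
        graph.setdefault(dep, set())
    return graph

graph = build_graph({
    "book_flight": {"requires": ["login"]},
    "login": {},
})
upsert_skill(graph, "pay", ["login", "wallet"])
print(graph)
```

An update of this kind touches only one node's outgoing edges, but any precomputed artifacts that depend on the whole graph (for example cached rankings) would still need to be refreshed.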
Complexity of Hybrid Retrieval Methods
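The three-stage pipeline of semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration adds implementation and tuning complexity compared with plain vector retrieval. The abstract reports favorable runtime in its ablations, but it does not describe the overhead in resource-constrained or hard real-time deployments.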
Dependency on Skill Library Quality
The performance of GoS is inherently tied to the quality and structure of the underlying skill library; poorly curated or sparsely connected skill graphs may undermine the framework’s effectiveness.
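A simple diagnostic for sparse connectivity is to count skills with no dependency edges at all, since graph propagation cannot reach them from any seed. The adjacency format below (skill mapped to a set of dependencies) is an assumption for illustration, not the paper's data structure.

```python
def isolated_skills(graph):
    """Return skills with neither outgoing nor incoming dependency edges.

    graph: {skill: set of skills it depends on} (assumed format).
    """
    has_incoming = {dep for deps in graph.values() for dep in deps}
    return sorted(name for name, deps in graph.items()
                  if not deps and name not in has_incoming)

# Hypothetical library: two connected skills and two orphans.
GRAPH = {
    "book_flight": {"login"},
    "login": set(),
    "tell_joke": set(),
    "roll_dice": set(),
}

orphans = isolated_skills(GRAPH)
print(orphans, f"{len(orphans) / len(GRAPH):.0%} of skills are isolated")
```

A high isolated fraction would suggest that retrieval quality rests mostly on the seeding stage, because the graph contributes little structure for those skills.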
Expert Commentary
The Graph of Skills framework targets a practical bottleneck in deploying agents with large skill libraries: choosing which skills to load into a limited context window. By combining a dependency-aware graph with hybrid retrieval, GoS reduces input tokens and improves task performance, as shown by the reported gains in average reward and token reduction. The work is timely given the growing size of skill libraries and the demand for context-aware decision-making at inference time. However, the reliance on offline graph construction and the multi-stage retrieval pipeline may complicate deployment in highly dynamic environments. Future research should explore adaptive graph updating mechanisms and streamlined retrieval algorithms to improve real-time applicability. The ethical and governance implications of scalable agent systems also warrant careful consideration as such systems become more widely deployed. Overall, GoS is a strong reference point for retrieval over large skill libraries, with relevance to both academia and industry.
Recommendations
- ✓ Develop adaptive mechanisms for real-time graph updates to enhance GoS’s applicability in dynamic environments where skill requirements evolve rapidly.
- ✓ Conduct further research into simplifying the hybrid retrieval pipeline to reduce computational overhead, enabling broader adoption in resource-constrained systems.
- ✓ Establish standardized benchmarks and evaluation protocols for agent systems with large skill libraries to facilitate comparative analysis and drive continuous improvement in the field.
- ✓ Engage policymakers and ethicists to proactively address the societal implications of scalable agent systems, ensuring that innovation is balanced with responsible governance.
Sources
Original: arXiv - cs.AI