AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
arXiv:2602.16901v1 Announce Type: new Abstract: LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. Currently, AgentLAB supports five novel attack types including intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning, spanning 28 realistic agentic environments, and 644 security test cases. Leveraging AgentLAB, we evaluate representative LLM agents and find that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate long-horizon threats. We anticipate that AgentLAB will serv
arXiv:2602.16901v1 Announce Type: new Abstract: LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. Currently, AgentLAB supports five novel attack types including intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning, spanning 28 realistic agentic environments, and 644 security test cases. Leveraging AgentLAB, we evaluate representative LLM agents and find that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate long-horizon threats. We anticipate that AgentLAB will serve as a valuable benchmark for tracking progress on securing LLM agents in practical settings. The benchmark is publicly available at https://tanqiujiang.github.io/AgentLAB_main.
Executive Summary
The article introduces AgentLAB, a benchmark for evaluating the susceptibility of Large Language Model (LLM) agents to long-horizon attacks. AgentLAB includes five novel attack types and 28 realistic environments, and is used to assess the vulnerabilities of representative LLM agents. The results show that LLM agents are highly susceptible to long-horizon attacks, and defenses designed for single-turn interactions are ineffective. The benchmark is publicly available and is expected to facilitate progress in securing LLM agents in practical settings.
Key Points
- ▸ Introduction of AgentLAB benchmark
- ▸ Evaluation of LLM agent vulnerabilities to long-horizon attacks
- ▸ Ineffectiveness of single-turn interaction defenses
Merits
Comprehensive Evaluation
AgentLAB provides a thorough assessment of LLM agent vulnerabilities to long-horizon attacks
Demerits
Limited Generalizability
The benchmark may not capture all possible long-horizon attack scenarios
Expert Commentary
The introduction of AgentLAB is a significant step forward in evaluating the security of LLM agents in complex environments. The benchmark's comprehensive evaluation of LLM agent vulnerabilities to long-horizon attacks highlights the need for more effective defenses. Furthermore, the ineffectiveness of single-turn interaction defenses underscores the importance of considering multi-turn interactions in the development of secure LLM agents. As the use of LLM agents continues to expand, the development of robust benchmarks like AgentLAB will be crucial in ensuring their security and reliability.
Recommendations
- ✓ Develop more effective defenses against long-horizon attacks
- ✓ Establish regulatory frameworks for the development and deployment of LLM agents