Automating Agent Hijacking via Structural Template Injection
arXiv:2602.16958v1 Announce Type: new Abstract: Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transfera
arXiv:2602.16958v1 Announce Type: new Abstract: Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increase structural diversity and then train a Template Autoencoder (TAE) to embed discrete templates into a continuous, searchable latent space. Subsequently, we apply Bayesian optimization to efficiently identify optimal adversarial vectors that are decoded into high-potency structured templates. Extensive experiments on Qwen, GPT, and Gemini demonstrate that our framework significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency. Moreover, we identified over 70 vulnerabilities in real-world commercial products that have been confirmed by vendors, underscoring the practical severity of structured template-based hijacking and providing an empirical foundation for securing next-generation agentic systems.
Executive Summary
This study proposes Phantom, an automated agent hijacking framework that exploits the architectural mechanisms of Large Language Models (LLMs) to manipulate execution. By injecting optimized structured templates into the retrieved context, Phantom induces role confusion and causes the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. The framework is demonstrated to significantly outperform existing baselines in both Attack Success Rate (ASR) and query efficiency. Furthermore, the study identifies over 70 vulnerabilities in real-world commercial products, underscoring the practical severity of structured template-based hijacking. The authors' framework and empirical findings provide an important contribution to the field of LLM security, highlighting the need for securing next-generation agentic systems.
Key Points
- ▸ Phantom is an automated agent hijacking framework that targets the fundamental architectural mechanisms of LLMs
- ▸ The framework exploits the use of specific chat template tokens to induce role confusion and manipulate execution
- ▸ Phantom outperforms existing baselines in both ASR and query efficiency, and identifies over 70 vulnerabilities in commercial products
Merits
Strength in Methodology
The authors' use of a structured template injection approach to automate agent hijacking is a significant methodological strength, as it allows for efficient and scalable attacks.
Practical Implications
The study's empirical findings, including the identification of over 70 vulnerabilities in commercial products, provide a clear illustration of the practical severity of structured template-based hijacking.
Demerits
Limited Transferability
The study's focus on specific chat template tokens may limit the transferability of the Phantom framework to other LLM architectures or implementations.
Potential for Overfitting
The use of Bayesian optimization to identify optimal adversarial vectors may lead to overfitting, particularly if the training dataset is limited or biased.
Expert Commentary
The study's contribution to the field of LLM security is significant, as it highlights the potential for structured template-based hijacking and provides a framework for automating these attacks. However, the study's limitations, including the potential for overfitting and limited transferability, must be carefully considered in the development and deployment of LLMs. Furthermore, the study's practical implications, including the identification of over 70 vulnerabilities in commercial products, underscore the need for more robust and secure architectures. As the field of LLM security continues to evolve, it is essential to prioritize the development of more secure and trustworthy systems.
Recommendations
- ✓ Developers and organizations should prioritize the security of LLM-based systems, including the implementation of robust security standards and regulations.
- ✓ Researchers should continue to investigate the potential for structured template-based hijacking and develop more robust and secure architectures for LLMs.