Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs

arXiv:2603.21155v1 Announce Type: new Abstract: Text-attributed graphs (TAGs) enhance graph learning by integrating rich textual semantics and topological context for each node. While boosting expressiveness, they also expose new vulnerabilities in graph learning through text-based adversarial surfaces. Recent advances leverage diverse backbones, such as graph neural networks (GNNs) and pre-trained language models (PLMs), to capture both structural and textual information in TAGs. This diversity raises a key question: How can we design universal adversarial attacks that generalize across architectures to assess the security of TAG models? The challenge arises from the stark contrast in how different backbones (GNNs and PLMs) perceive and encode graph patterns, coupled with the fact that many PLMs are only accessible via APIs, limiting attacks to black-box settings. To address this, we propose BadGraph, a novel attack framework that deeply elicits large language models' (LLMs') understanding of general graph knowledge to jointly perturb both node topology and textual semantics. Specifically, we design a target influencer retrieval module that leverages graph priors to construct cross-modally aligned attack shortcuts, thereby enabling efficient LLM-based perturbation reasoning. Experiments show that BadGraph achieves universal and effective attacks across GNN- and LLM-based reasoners, with up to a 76.3% performance drop, while theoretical and empirical analyses confirm its stealthy yet interpretable nature.

Executive Summary

This article proposes BadGraph, a novel framework for mounting universal adversarial attacks on text-attributed graphs (TAGs), which combine structural and textual information. By leveraging large language models (LLMs) and graph priors, BadGraph enables efficient perturbation reasoning, yielding effective attacks against both GNN-based and PLM-based reasoners with up to a 76.3% performance drop. Theoretical and empirical analyses confirm the framework's stealthy yet interpretable nature. This research raises crucial questions about the security of TAG models and highlights the need for robust defense mechanisms against such attacks. The approach has significant implications for building secure and trustworthy AI systems, particularly in applications where graph-based data plays a critical role, such as social network analysis, recommendation systems, and natural language processing.

Key Points

  • BadGraph is a novel attack framework for universal adversarial attacks on text-attributed graphs (TAGs).
  • The framework leverages large language models (LLMs) and graph priors to enable efficient perturbation reasoning.
  • BadGraph achieves effective attacks against both GNN-based and PLM-based reasoners, with up to a 76.3% performance drop.
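To make the joint topology-and-text perturbation concrete, here is a minimal toy sketch. It is not the authors' BadGraph implementation: the graph, tokens, "influencer" node, and the linear mean-aggregation classifier standing in for a GNN are all hypothetical, chosen only to show how rewiring one edge plus injecting a few adversarial tokens can flip a prediction on a text-attributed graph.

```python
# Toy sketch (NOT the authors' BadGraph code): joint topology + text
# perturbation on a tiny text-attributed graph. All names are hypothetical.

def features(tokens, vocab):
    """Bag-of-words vector over a fixed vocabulary."""
    return [tokens.count(w) for w in vocab]

def predict(node, graph, texts, vocab, weights):
    """1-hop mean aggregation followed by a linear score (a GNN stand-in)."""
    neigh = graph.get(node, [])
    vecs = [features(texts[n], vocab) for n in [node] + neigh]
    agg = [sum(col) / len(vecs) for col in zip(*vecs)]
    score = sum(a * w for a, w in zip(agg, weights))
    return 1 if score > 0 else 0

vocab = ["graph", "neural", "spam", "scam"]
weights = [1.0, 1.0, -1.5, -1.5]  # benign tokens pull positive, attack tokens negative

texts = {
    "target": ["graph", "neural", "graph"],
    "friend": ["graph", "neural"],
    "influencer": ["spam", "scam", "spam", "scam"],
}
graph = {"target": ["friend"]}

before = predict("target", graph, texts, vocab, weights)

# Joint perturbation: rewire the target toward an opposite-class "influencer"
# node AND inject a few adversarial tokens into the target's own text.
graph["target"] = ["influencer"]
texts["target"] = texts["target"] + ["scam", "spam"]

after = predict("target", graph, texts, vocab, weights)
print(before, after)  # the prediction flips from 1 to 0
```

The point of the sketch is that neither perturbation alone needs to be large: the edge rewiring changes what the aggregation sees, while the token injection shifts the node's own features, and together they cross the decision boundary. BadGraph's contribution, per the abstract, is using LLM reasoning over graph priors to pick such influencers and token edits efficiently in a black-box setting.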

Merits

Strength in Adversarial Attack Design

The proposed framework demonstrates a robust approach to designing universal adversarial attacks on TAG models, showcasing its effectiveness across different architectures.

Stealthy yet Interpretable Nature

BadGraph's stealthy yet interpretable nature makes it an attractive solution for assessing the security of TAG models, providing valuable insights for development and defense strategies.

Demerits

Limited Accessibility to Pre-trained Language Models

Because many pre-trained language models (PLMs) are accessible only through APIs, the framework must operate in a black-box setting; scenarios that require direct access to PLM architecture and parameters, such as gradient-based white-box analysis, fall outside its scope.

Expert Commentary

The article presents a notable contribution to the study of adversarial attacks on graph neural networks and text-attributed graphs. The proposed framework, BadGraph, takes a novel approach to designing universal adversarial attacks by combining the strengths of large language models with graph priors. While its limitations, such as the restriction to black-box access when PLMs are reachable only through APIs, still need to be addressed, the findings have significant implications for building secure and trustworthy AI systems. The exploration of BadGraph's stealthy yet interpretable nature also offers valuable insights for defense strategies. Overall, the article makes a compelling case for taking the security risks of text-attributed graphs seriously and for developing robust defense mechanisms against such attacks.

Recommendations

  • Future research should focus on developing robust defense mechanisms against universal adversarial attacks on text-attributed graphs, leveraging insights from the proposed framework.
  • Developers and users of text-attributed graph models should be aware of the potential security risks associated with these models and take measures to mitigate these risks, such as using robust architectures and implementing defense mechanisms.

Sources

Original: arXiv - cs.AI