Academic

When Agents Persuade: Propaganda Generation and Mitigation in LLMs

arXiv:2603.04636v1 Announce Type: new Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and ORPO (Odds Ratio Preference Optimization). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.

Julia Jose, Ritik Roongta, Rachel Greenstadt · March 7, 2026 · 1 min read · 2 views

#cs.AI

Executive Summary

This study examines the potential of Large Language Models (LLMs) to generate propagandistic content and explores mitigation strategies. The findings indicate that LLMs can exhibit propagandistic behaviors when prompted and employ various rhetorical techniques. Fine-tuning methods, particularly Odds Ratio Preference Optimization (ORPO), significantly reduce the generation of such content. The research highlights the importance of addressing propaganda generation in LLMs to prevent potential misuse.

Key Points

▸ LLMs can generate propagandistic content when prompted
▸ LLMs use various rhetorical techniques, including loaded language and appeals to fear
▸ Fine-tuning methods, such as ORPO, can mitigate propaganda generation

Merits

Comprehensive Analysis

The study provides a thorough examination of LLMs' potential to generate propaganda and explores effective mitigation strategies.

Methodological Rigor

The use of domain-specific models and fine-tuning methods demonstrates a rigorous approach to investigating propaganda generation and mitigation.

Demerits

Limited Generalizability

The study's findings may not be generalizable to all LLMs or contexts, as the results are based on a specific set of models and prompts.

Lack of Human Evaluation

The reliance on automated models for classification and detection may not fully capture the nuances of human evaluation and judgment.

Expert Commentary

This study contributes significantly to our understanding of the potential risks and challenges associated with LLMs. The findings underscore the importance of addressing propaganda generation and mitigation in LLMs to prevent potential misuse. The use of fine-tuning methods, such as ORPO, offers a promising approach to mitigating these risks. However, further research is needed to fully explore the generalizability and effectiveness of these methods in various contexts. Ultimately, responsible AI development and deployment require careful consideration of the potential consequences of LLMs and proactive measures to mitigate potential harms.

Recommendations

✓ Develop and deploy LLMs with built-in mitigation strategies to prevent propaganda generation
✓ Conduct further research on the generalizability and effectiveness of fine-tuning methods in various contexts and applications

Sources

arXiv - cs.AI

When Agents Persuade: Propaganda Generation and Mitigation in LLMs

AI Commentary

Executive Summary

Key Points

Merits

Comprehensive Analysis

Methodological Rigor

Demerits

Limited Generalizability

Lack of Human Evaluation

Expert Commentary

Recommendations

Sources

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincar\'e discs

JCG, PC

HSOLLC Co., Ltd.

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincar\'e discs