FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
arXiv:2602.22963v1 Announce Type: new

Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they often rely on fixed-depth inference and place excessive trust in internally generated assumptions, particularly in scenarios where critical evidence is sparse, fragmented, or requires external verification. To address these limitations, we propose FactGuard, an agentic framework for video misinformation detection that formulates verification as an iterative reasoning process built upon MLLMs. FactGuard explicitly assesses task ambiguity and selectively invokes external tools to acquire critical evidence, enabling progressive refinement of reasoning trajectories. To further strengthen this capability, we introduce a two-stage training strategy that combines domain-specific agentic supervised fine-tuning with decision-aware reinforcement learning to optimize tool usage and calibrate risk-sensitive decision making. Extensive experiments on FakeSV, FakeTT, and FakeVV demonstrate FactGuard's state-of-the-art performance and validate its excellent robustness and generalization capacity.
Executive Summary
FactGuard is an agentic framework for video misinformation detection that couples multimodal large language models (MLLMs) with reinforcement learning to overcome the limitations of existing methods. Rather than relying on fixed-depth inference, it formulates verification as an iterative reasoning process: the model assesses the ambiguity of each case and selectively invokes external tools to acquire critical evidence. Experiments on three datasets (FakeSV, FakeTT, and FakeVV) demonstrate state-of-the-art performance along with strong robustness and generalization capacity. By making tool usage decision-aware, FactGuard offers a more adaptive and accurate approach than fixed-depth methods and could help mitigate the spread of video misinformation.
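The abstract does not specify FactGuard's interfaces, but the iterative loop it describes, assessing ambiguity and selectively calling tools until evidence suffices, can be sketched roughly as follows. All names here (`assess_ambiguity`, `reverse_image_search`, the thresholds, and the placeholder decision rule) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    label: str                 # "real" or "fake"
    confidence: float          # model-reported confidence in [0, 1]
    evidence: list = field(default_factory=list)

def assess_ambiguity(claim, evidence):
    """Stand-in for the MLLM's self-assessment: high when evidence is
    sparse or fragmented, lower as more evidence accumulates."""
    return max(0.0, 1.0 - 0.4 * len(evidence))

def reverse_image_search(claim):
    """Illustrative external tool; a real system might also call web
    search, OCR, or frame-level forensics."""
    return f"web-evidence-for:{claim}"

def verify(claim, max_steps=3, ambiguity_threshold=0.5):
    """Iterative verification: invoke tools only while the case remains
    ambiguous, then commit to a decision (placeholder rule here)."""
    evidence = []
    for _ in range(max_steps):
        if assess_ambiguity(claim, evidence) < ambiguity_threshold:
            break                                   # evidence suffices; stop calling tools
        evidence.append(reverse_image_search(claim))  # selective tool invocation
    label = "fake" if len(evidence) >= 2 else "real"  # placeholder decision rule
    return Verdict(label, 1.0 - assess_ambiguity(claim, evidence), evidence)
```

The key design point mirrored here is that tool calls are gated by an explicit ambiguity estimate, so easy cases terminate early while hard cases trigger deeper, evidence-gathering trajectories.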
Key Points
- ▸ FactGuard uses reinforcement learning to optimize tool usage and calibrate risk-sensitive decision making.
- ▸ The framework formulates verification as an iterative reasoning process, assessing task ambiguity and selectively invoking external tools.
- ▸ FactGuard demonstrates state-of-the-art performance on three datasets, showcasing its robustness and generalization capacity.
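The abstract mentions decision-aware reinforcement learning that optimizes tool usage and calibrates risk-sensitive decisions, but gives no reward definition. A minimal sketch of what such a reward might look like, with all coefficients being assumptions rather than values from the paper, is:

```python
def decision_aware_reward(correct, num_tool_calls, predicted_fake, actual_fake,
                          tool_cost=0.1, false_negative_penalty=2.0):
    """Illustrative reward combining three terms: an accuracy term, a
    per-call cost that discourages over-reliance on external tools, and
    an asymmetric penalty treating missed misinformation (a false
    negative) as riskier than a false alarm. Coefficients are assumed."""
    reward = 1.0 if correct else -1.0
    reward -= tool_cost * num_tool_calls          # penalize unnecessary tool calls
    if actual_fake and not predicted_fake:
        reward -= false_negative_penalty          # risk-sensitive asymmetry
    return reward
```

A reward shaped this way pushes the policy toward calling tools only when they change the outcome, which is one plausible reading of "optimize tool usage and calibrate risk-sensitive decision making."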
Merits
Strengths in addressing limitations of existing methods
FactGuard directly targets the weaknesses of fixed-depth inference, namely excessive trust in internally generated assumptions and degraded performance when critical evidence is sparse, fragmented, or requires external verification.
Improved accuracy and adaptability
Combining MLLM reasoning with reinforcement-learned tool usage lets FactGuard adapt the depth of its verification to each case, improving accuracy over single-pass approaches.
Demerits
Potential for tool over-reliance
The selective invocation of external tools may lead to over-reliance on these tools, which could compromise the framework's ability to reason independently.
Training data requirements
The two-stage training strategy may require significant data and computational resources, which could pose challenges in real-world applications.
Expert Commentary
FactGuard represents a notable advance in video misinformation detection. By leveraging reinforcement learning and decision-aware tool usage, it adapts its verification effort to each case rather than committing to a fixed reasoning depth. However, the risk of tool over-reliance and the data and compute demands of the two-stage training pipeline must be weighed carefully in any deployment. The framework's implications for policymakers and social media platforms further underscore the need for ongoing research and evaluation in this area.
Recommendations
- ✓ Future research should focus on addressing the potential for tool over-reliance and exploring alternative approaches to decision making.
- ✓ Development of FactGuard should be accompanied by thorough evaluations in real-world settings and ongoing assessments of its effectiveness in mitigating the spread of misinformation.