OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences
arXiv:2603.09706v1 Announce Type: new Abstract: While safety alignment for Multimodal Large Language Models (MLLMs) has gained significant attention, current paradigms primarily target malicious intent or situational violations. We propose shifting the safety frontier toward consequence-driven safety, a paradigm essential for the robust deployment of autonomous and embodied agents. To formalize this shift, we introduce OOD-MMSafe, a benchmark comprising 455 curated query-image pairs designed to evaluate a model's ability to identify latent hazards within context-dependent causal chains. Our analysis reveals a pervasive causal blindness among frontier models, with failure rates as high as 67.5% in high-capacity closed-source models, and identifies a preference ceiling where static alignment yields format-centric failures rather than improved safety reasoning as model capacity grows. To address these bottlenecks, we develop the Consequence-Aware Safety Policy Optimization (CASPO) framework, which integrates the model's intrinsic reasoning as a dynamic reference for token-level self-distillation rewards. Experimental results demonstrate that CASPO significantly enhances consequence projection, reducing the failure ratio of risk identification to 7.3% for Qwen2.5-VL-7B and 5.7% for Qwen3-VL-4B while maintaining overall effectiveness.
Executive Summary
This article introduces OOD-MMSafe, a benchmark for evaluating the ability of Multimodal Large Language Models (MLLMs) to identify latent hazards within context-dependent causal chains. The authors argue that current safety paradigms target malicious intent or situational violations rather than consequence-driven safety. They propose the Consequence-Aware Safety Policy Optimization (CASPO) framework, which integrates the model's intrinsic reasoning as a dynamic reference for token-level self-distillation rewards. Experimental results demonstrate that CASPO significantly enhances consequence projection, reducing the failure ratio of risk identification. The authors' focus on consequence-driven safety is a crucial shift in the field, as it acknowledges the complexities of real-world deployment. However, further investigation is needed to validate the generalizability of CASPO and its scalability across MLLM architectures.
Key Points
- OOD-MMSafe introduces a new benchmark for evaluating consequence-driven safety in MLLMs.
- The CASPO framework integrates the model's intrinsic reasoning for improved consequence projection.
- Experimental results demonstrate significant reductions in the failure ratio of risk identification.
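The abstract describes CASPO's central mechanism only at a high level: the model's own intrinsic reasoning serves as a dynamic reference for token-level self-distillation rewards. The paper does not specify the reward formula, so the following is a minimal sketch under the common assumption that such a reward penalizes per-token divergence from a frozen snapshot of the model's reasoning distribution; the function and array names are hypothetical, not from the paper.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def self_distillation_reward(policy_logits: np.ndarray,
                             reference_logits: np.ndarray,
                             response_mask: np.ndarray) -> np.ndarray:
    """Hypothetical per-token reward: negative KL divergence between the
    current policy and a frozen reference distribution derived from the
    model's own reasoning trace.

    policy_logits, reference_logits: (seq_len, vocab) arrays of raw logits.
    response_mask: (seq_len,) array, 1.0 on response tokens, 0.0 elsewhere.
    """
    log_p = log_softmax(policy_logits)      # current policy
    log_q = log_softmax(reference_logits)   # intrinsic-reasoning reference
    # KL(q || p) per token, summed over the vocabulary; always >= 0
    kl = (np.exp(log_q) * (log_q - log_p)).sum(axis=-1)
    # Reward is maximal (zero) when the policy matches its own reasoning trace
    return -kl * response_mask
```

Under this reading, the reward is dense (one value per response token) rather than a single sequence-level score, which is what "token-level" plausibly implies; the actual CASPO formulation may differ.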
Merits
Strength in methodology
The authors' use of a curated dataset and a novel framework for consequence-driven safety evaluation provides a rigorous and systematic approach to assessing MLLM safety.
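The benchmark's headline metric is a failure ratio over the 455 curated query-image pairs. The abstract does not specify how a single response is judged, so the harness below is a hypothetical sketch: `SafetyCase` and `identifies_hazard` are illustrative names, and the pass/fail judge is left as a pluggable callable rather than any specific rubric from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SafetyCase:
    """One curated query-image pair with an annotated latent hazard
    (field names are illustrative, not from the paper)."""
    query: str
    image_path: str
    hazard: str  # ground-truth latent consequence the model should flag

def failure_ratio(cases: List[SafetyCase],
                  identifies_hazard: Callable[[SafetyCase], bool]) -> float:
    """Fraction of cases where the model fails to surface the latent hazard.

    identifies_hazard is a pluggable judge (human annotator, rubric, or
    LLM-as-judge) returning True when the hazard was correctly identified.
    """
    failures = sum(1 for case in cases if not identifies_hazard(case))
    return failures / len(cases)
```

With a judge that correctly flags, say, 421 of 455 cases, this yields a failure ratio near 7.5%, on the order of the post-CASPO numbers the abstract reports.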
Practical applications
The CASPO framework has the potential to enhance the safety and reliability of autonomous and embodied agents in real-world applications.
Demerits
Limitation in generalizability
The reported gains are demonstrated only on two Qwen-family models (Qwen2.5-VL-7B and Qwen3-VL-4B), and further investigation is needed to validate the generalizability of CASPO to other MLLM architectures.
Scalability concerns
Computing token-level self-distillation rewards against a dynamic reference likely requires additional reference forward passes during training, and this overhead may pose scalability challenges for large-scale deployment.
Expert Commentary
The article represents a significant contribution to the field of MLLM safety, as it shifts the focus from malicious intent to consequence-driven safety. The CASPO framework demonstrates promising results in enhancing consequence projection, and its potential applications in autonomous and embodied agents are substantial. However, further investigation is necessary to validate the generalizability of CASPO and address scalability concerns. The article's implications extend beyond the MLLM community, influencing policy decisions and practical applications in AI safety and reliability.
Recommendations
- Future research should prioritize the development of more scalable and generalizable frameworks for consequence-driven safety evaluation.
- The MLLM community should engage in a broader discussion on ensuring the safety and reliability of AI systems, particularly in applications involving autonomous and embodied agents.