MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
arXiv:2603.02482v1 Announce Type: cross Abstract: Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to systematically test whether alignment generalizes to audio, image, and video inputs. We present MUSE (Multimodal Unified Safety Evaluation), an open-source, run-centric platform that integrates automatic cross-modal payload generation, three multi-turn attack algorithms (Crescendo, PAIR, Violent Durian), provider-agnostic model routing, and an LLM judge with a five-level safety taxonomy into a single browser-based system. A dual-metric framework distinguishes hard Attack Success Rate (Compliance only) from soft ASR (including Partial Compliance), capturing partial information leakage that binary metrics miss. To probe whether alignment generalizes across modality boundaries, we introduce Inter-Turn Modality Switching (ITMS), which augments multi-turn attacks with per-turn modality rotation. Experiments across six multimodal LLMs from four providers show that multi-turn strategies can achieve up to 90-100% ASR against models with near-perfect single-turn refusal. ITMS does not uniformly raise final ASR on already-saturated baselines, but accelerates convergence by destabilizing early-turn defenses, and ablation reveals that the direction of modality effects is model-family-specific rather than universal, underscoring the need for provider-aware cross-modal safety testing.
Executive Summary
This article introduces MUSE, an open-source, run-centric platform for multimodal unified safety evaluation of large language models. MUSE combines automatic cross-modal payload generation, three multi-turn attack algorithms (Crescendo, PAIR, Violent Durian), provider-agnostic model routing, and an LLM judge with a five-level safety taxonomy in a single browser-based system. Its dual-metric framework separates hard Attack Success Rate (Compliance only) from soft ASR (including Partial Compliance), capturing partial information leakage that binary metrics miss. Experiments across six multimodal LLMs from four providers show that multi-turn strategies can reach 90-100% ASR against models with near-perfect single-turn refusal, and that Inter-Turn Modality Switching (ITMS) accelerates attack convergence by destabilizing early-turn defenses even when it does not raise final ASR. Because the ablation shows that modality effects are model-family-specific rather than universal, the authors argue for provider-aware cross-modal safety testing.
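The ITMS idea described above, rotating the modality used to deliver the attack payload on each turn of a multi-turn attack, can be sketched as a simple round-robin schedule. This is a minimal illustration assuming a fixed modality list; the function and names here are hypothetical, not the paper's API.

```python
# Hypothetical sketch of Inter-Turn Modality Switching (ITMS):
# assign one payload modality per attack turn by round-robin
# rotation over the available modalities.
from itertools import cycle

MODALITIES = ["text", "image", "audio", "video"]

def itms_schedule(num_turns, modalities=MODALITIES):
    """Return the per-turn modality assignment for a multi-turn attack."""
    rotation = cycle(modalities)
    return [next(rotation) for _ in range(num_turns)]

# A six-turn attack cycles back to text/image after exhausting all four.
print(itms_schedule(6))
# → ['text', 'image', 'audio', 'video', 'text', 'image']
```

In the paper's experiments, this per-turn rotation augments the existing multi-turn attack algorithms rather than replacing them; the schedule only decides how each turn's payload is rendered.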
Key Points
- ▸ MUSE is an open-source, run-centric platform for multimodal unified safety evaluation of large language models.
- ▸ The platform integrates automatic cross-modal payload generation, three multi-turn attack algorithms (Crescendo, PAIR, Violent Durian), provider-agnostic model routing, and an LLM judge with a five-level safety taxonomy.
- ▸ MUSE's dual-metric framework distinguishes hard ASR (Compliance only) from soft ASR (including Partial Compliance), capturing partial information leakage and probing whether alignment generalizes across modality boundaries.
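The dual-metric distinction can be made concrete with a short sketch: hard ASR counts only full Compliance verdicts, while soft ASR additionally counts Partial Compliance. The label strings and function below are assumptions for illustration; the abstract specifies only the two counting rules, not the exact taxonomy names.

```python
# Hypothetical computation of hard vs. soft Attack Success Rate from
# judge verdicts drawn from a five-level safety taxonomy.
def attack_success_rates(labels):
    """Return (hard_asr, soft_asr) over a list of per-attempt judge labels."""
    n = len(labels)
    hard = sum(label == "Compliance" for label in labels) / n
    soft = sum(label in ("Compliance", "Partial Compliance") for label in labels) / n
    return hard, soft

verdicts = ["Refusal", "Compliance", "Partial Compliance", "Compliance", "Refusal"]
print(attack_success_rates(verdicts))
# → (0.4, 0.6)
```

The gap between the two numbers is exactly the partial-leakage signal that a binary success/failure metric would miss.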
Merits
Comprehensive Platform
MUSE unifies payload generation, multi-turn attack orchestration, provider-agnostic model routing, and LLM-based judging in a single browser-based system, covering the full red-teaming workflow for multimodal safety evaluation rather than just one stage of it.
Robust Framework
By reporting both hard ASR (full Compliance only) and soft ASR (including Partial Compliance), the framework captures partial information leakage that a binary success/failure metric would miss, and the ITMS ablation lets it probe whether alignment generalizes across modality boundaries.
Demerits
Limited Scalability
Because the ablation finds that the direction of modality effects is model-family-specific rather than universal, results do not transfer across providers: each new model family requires its own cross-modal test campaign, which limits how well the evaluation scales as the model landscape grows.
Dependence on Multi-Turn Attack Algorithms
MUSE's effectiveness depends on its three bundled multi-turn attack algorithms (Crescendo, PAIR, Violent Durian); strategies that saturate one model family may fail against another, so the reported success rates may not generalize to all large language models or anticipate future attack techniques.
Expert Commentary
The article makes a significant contribution to the field of large language model safety by unifying cross-modal payload generation, multi-turn attacks, and LLM-based judging in one evaluation platform. Its limitations, namely its dependence on a fixed set of multi-turn attack algorithms and the provider-specific nature of its findings, illustrate how difficult multimodal safety evaluation remains. The finding that modality effects vary by model family underscores the authors' central point: safety testing must account for the specific characteristics of each provider's models rather than assume results transfer. As multimodal LLM deployment grows, evaluation frameworks of this kind will be essential for safe and responsible deployment.
Recommendations
- ✓ Future research should focus on developing more robust and scalable safety evaluation frameworks, capable of addressing the complexities of large language model safety evaluation.
- ✓ Organizations developing and deploying large language models should prioritize the use of comprehensive and robust safety evaluation frameworks, such as MUSE, to ensure the safe and responsible deployment of these models.