When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models

arXiv:2602.18739v1 Abstract: Generative world models (WMs) are increasingly used to synthesize controllable, sensor-conditioned driving videos, yet their reliance on physical priors exposes novel attack surfaces. In this paper, we present the Physical-Conditioned World Model Attack (PhysCond-WMA), the first white-box world model attack that perturbs physical-condition channels, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity. PhysCond-WMA is optimized in two stages: (1) a quality-preserving guidance stage that constrains the reverse-diffusion loss below a calibrated threshold, and (2) a momentum-guided denoising stage that accumulates target-aligned gradients along the denoising trajectory for stable, temporally coherent semantic shifts. Extensive experimental results demonstrate that our approach remains effective while increasing FID by about 9% and FVD by about 3.9% on average. Under the targeted attack setting, the attack success rate (ASR) reaches 0.55. Downstream studies further show tangible risk: using attacked videos for training decreases 3D detection performance by about 4% and worsens open-loop planning performance by about 20%. These findings reveal and quantify, for the first time, security vulnerabilities in generative world models, motivating the development of more comprehensive security checkers.
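
The abstract's two-stage optimization can be made concrete with a short sketch. The following PyTorch-style code is a hypothetical reconstruction, not the authors' implementation: `world_model.reverse_diffusion_loss`, `world_model.target_semantic_loss`, and all hyperparameters are placeholders invented for illustration, and the momentum update borrows the familiar MI-FGSM signed-gradient pattern as one plausible reading of "momentum-guided".

```python
import torch

def physcond_wma_sketch(world_model, cond, target, steps=50,
                        loss_threshold=0.1, momentum=0.9, alpha=1e-2):
    """Optimize a perturbation on the physical-condition tensor `cond` (sketch)."""
    delta = torch.zeros_like(cond, requires_grad=True)  # perturbation on condition channels
    velocity = torch.zeros_like(cond)                   # momentum buffer

    for _ in range(steps):
        noisy_cond = cond + delta

        # Stage 1 (quality-preserving guidance): penalize the perturbation only
        # when the reverse-diffusion loss rises above the calibrated threshold,
        # so generated videos stay perceptually plausible.
        q_loss = world_model.reverse_diffusion_loss(noisy_cond)  # assumed interface
        quality_penalty = torch.relu(q_loss - loss_threshold)

        # Stage 2 (momentum-guided denoising): accumulate target-aligned
        # gradients along the denoising trajectory for a stable, temporally
        # coherent semantic shift toward the attacker's target.
        a_loss = world_model.target_semantic_loss(noisy_cond, target)  # assumed interface
        grad, = torch.autograd.grad(a_loss + quality_penalty, delta)

        with torch.no_grad():
            # MI-FGSM-style update: L1-normalized gradient folded into momentum.
            velocity = momentum * velocity + grad / (grad.abs().mean() + 1e-12)
            delta -= alpha * velocity.sign()

    return (cond + delta).detach()
```

In this reading, stage 1 acts as a soft constraint that only fires when generation quality degrades past the threshold, while stage 2 supplies the target-aligned direction; the momentum term smooths gradient noise across denoising steps, which is what would make the induced semantic shift temporally coherent.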

Executive Summary

This article presents PhysCond-WMA, a novel white-box attack that perturbs the physical-condition channels of generative world models, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity. The attack is optimized in two stages and reaches an attack success rate (ASR) of 0.55 in the targeted setting. Downstream experiments show tangible risk: training on attacked videos decreases 3D detection performance by about 4% and worsens open-loop planning performance by about 20%. The study is the first to reveal and quantify these security vulnerabilities in generative world models, underscoring the need for comprehensive security checkers in applications such as autonomous driving.

Key Points

  • PhysCond-WMA is the first white-box world model attack that targets physical-condition channels.
  • The approach preserves perceptual fidelity while inducing semantic, logic, or decision-level distortion.
  • Experimental results demonstrate tangible downstream risks: training on attacked videos decreases 3D detection performance by about 4% and worsens open-loop planning performance by about 20%.

Merits

Innovative Approach

PhysCond-WMA presents a novel approach to attacking generative world models, targeting physical-condition channels and inducing distortion while preserving perceptual fidelity.

Demerits

Limited to Specific Application

The study primarily focuses on autonomous driving applications, limiting the generalizability of the findings to other domains.

Expert Commentary

The study makes a significant contribution to AI security by demonstrating that the physical-condition channels of generative world models constitute a practical attack surface, and by quantifying the downstream harm to 3D detection and open-loop planning. Its scope is also its main limitation: the evaluation is confined to driving world models, so it remains unclear whether the attack transfers to other domains or to other classes of conditioned generative models. Establishing that generality, and developing the security checkers the authors call for, are natural next steps.

Recommendations

  • Future studies should aim to generalize the findings to other domains and explore the applicability of PhysCond-WMA to other types of generative models.
  • Developers of generative world models should incorporate comprehensive security checkers into their development pipelines to mitigate potential security risks (see the illustrative sketch below).
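
To make the second recommendation concrete, below is a minimal, hypothetical sketch of where such a checker could sit: a statistical screen over condition embeddings (for example, HDMap embeddings) before they are fed to the world model. The class, threshold, and z-score heuristic are illustrative assumptions; the paper does not specify a defense.

```python
import torch

class ConditionAnomalyChecker:
    """Flag condition embeddings whose statistics deviate from a clean calibration set."""

    def __init__(self, calibration_embeddings: torch.Tensor, z_threshold: float = 4.0):
        # calibration_embeddings: (N, D) embeddings collected from trusted inputs.
        self.mean = calibration_embeddings.mean(dim=0)
        self.std = calibration_embeddings.std(dim=0) + 1e-8  # avoid division by zero
        self.z_threshold = z_threshold

    def is_suspicious(self, embedding: torch.Tensor) -> bool:
        # Large per-dimension z-scores suggest the channel may have been tampered with.
        z = (embedding - self.mean) / self.std
        return bool((z.abs() > self.z_threshold).any())

# Illustrative usage: calibrate on trusted HDMap embeddings, then screen inputs.
# checker = ConditionAnomalyChecker(clean_hdmap_embeddings)
# if checker.is_suspicious(incoming_embedding):
#     reject_or_reverify()  # hypothetical handling hook
```

A screen like this is easily evaded by an adaptive attacker who constrains perturbation statistics, so it illustrates placement rather than providing a real defense; stronger checkers might use learned density models over the condition channels.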

Sources

  • arXiv:2602.18739v1: https://arxiv.org/abs/2602.18739