😈

Naughty Dubins Car

We study diffusion world model steering in a controlled setting where the ground-truth system dynamics are known, directly assessing whether steering can explore a pretrained video world model to identify potential failures while preserving plausibility.

Task & Dynamics
The Naughty Dubins Car is a Dubins car with constant speed and an angular velocity control input. There is a circular failure region at the center of the environment. Uniquely, the car exhibits a "naughty" behavior: at each step, the sign of its angular velocity action is flipped with some probability. The flip only has an effect while turning: when traveling straight (zero angular velocity), the behavior is always as intended. As a result, executing a turn can cause the car to unexpectedly turn in the opposite direction.
Videos: Ground Truth (well-behaved) on the left; 😈 Naughty (random flipping actions) on the right.

The privileged state is $\mathbf{s}_t = [p_x,\,p_y,\,\theta]$, where the car is controlled by a continuous angular velocity $a_t \in \mathcal{A} = [-a_{\max},\,a_{\max}]$, with $a_{\max} = 1.25\,\mathrm{rad/s}$. The forward speed is fixed at $v = 1\,\mathrm{m/s}$, and the system updates every $\Delta t = 0.05\,\mathrm{s}$. To capture the system's stochasticity, the sign of the control input may be flipped at each step. A red boundary in the right video highlights exactly when the action's sign is flipped, making each occurrence of the "naughty" intervention visually clear.

$$\mathbf{s}_{t+1} = \mathbf{s}_t + \Delta t\,\bigl[\,v\cos\theta_t,\;\; v\sin\theta_t,\;\; \delta_t\,a_t\,\bigr], \qquad \delta_t \in \{-1,\,+1\}$$

Specifically, $\delta_t = -1$ with probability $0.4$ and $\delta_t = +1$ with probability $0.6$. The ground-truth criterion that defines the failure set is given by

$$r(\mathbf{s}) = p_x^2 + p_y^2 - 0.25^2$$

This yields a circular failure region of radius $0.25\,\mathrm{m}$ centered at the origin. We train a diffusion-based world model following the EDM framework using 4000 randomly sampled trajectories, and subsequently learn the criterion function atop the model's latent space. Steering uses the learned safety margin as the objective: maximizing it for optimistic steering, or minimizing it for pessimistic steering. The exponential growth of possible outcomes from the sign-flip stochasticity poses a significant challenge for long-horizon steering.
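The dynamics and criterion above can be sketched in a few lines of Python. This is a minimal illustration built only from the constants stated in the text; the function names (`step`, `margin`, `rollout`) and the clamping of the action to $[-a_{\max}, a_{\max}]$ are assumptions made for the sketch, not the authors' implementation.

```python
import math
import random

V = 1.0        # forward speed (m/s)
DT = 0.05      # update interval (s)
A_MAX = 1.25   # maximum angular velocity (rad/s)
P_FLIP = 0.4   # probability that delta_t = -1
RADIUS = 0.25  # radius of the failure region (m)

def step(state, action, delta):
    """One update of the dynamics; delta is +1 (as intended) or -1 (flipped)."""
    px, py, theta = state
    a = max(-A_MAX, min(A_MAX, action))  # assumption: actions are clamped to the valid range
    return (px + DT * V * math.cos(theta),
            py + DT * V * math.sin(theta),
            theta + DT * delta * a)

def margin(state):
    """Criterion r(s) = px^2 + py^2 - 0.25^2; non-positive inside the failure region."""
    px, py, _ = state
    return px * px + py * py - RADIUS ** 2

def rollout(state, actions, rng):
    """Sample one stochastic trajectory; return its states and minimum margin."""
    states = [state]
    for a in actions:
        delta = -1 if rng.random() < P_FLIP else 1
        state = step(state, a, delta)
        states.append(state)
    return states, min(margin(s) for s in states)
```

Calling `rollout((-1.0, 0.0, 0.0), [1.25] * 20, random.Random(0))` repeatedly with different seeds gives different trajectories for the same initial state and action sequence, which is the stochasticity that steering has to navigate.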

What is noise optimization?

Given a fixed initial state and a fixed action sequence, stochasticity in the dynamics leads to multiple different outcomes in world model generations. Steering the world model via noise optimization means selecting the initial noise so that the generated trajectory moves toward a chosen extremum of the safety criterion. Use the Pessimistic / Optimistic toggle to compare two searched noise patterns that target opposite ends of that criterion. The red and blue points on the left mark those two candidate initial noises; the gray points are random samples for context.

Click Refresh to draw a new $\mathbf{s}_0$ and new actions $(a_t)$. Click any dot in the noise plot to play the corresponding stochastic trajectory on the right; playback loops until you press pause. Turn on Possible trajectories (forward reachable set) to see where rollouts can spread under the true stochastic dynamics.

(This is an analytical demo with exact dynamics, not WM generations.)
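For the exact dynamics, the randomness reduces to the sequence of sign flips, so for a short horizon the forward reachable set can be enumerated by brute force. The sketch below does this and returns the pessimistic and optimistic extremes of the minimum margin. The names `min_margin` and `reachable_extremes` are hypothetical, and this is an illustration under the stated dynamics rather than the code behind the demo.

```python
import itertools
import math

V, DT, RADIUS = 1.0, 0.05, 0.25  # speed (m/s), time step (s), failure radius (m)

def min_margin(state, actions, deltas):
    """Minimum of r(s) along the trajectory produced by one sign pattern."""
    px, py, theta = state
    lowest = px * px + py * py - RADIUS ** 2
    for a, d in zip(actions, deltas):
        # Position uses the current heading; the heading is updated afterwards.
        px += DT * V * math.cos(theta)
        py += DT * V * math.sin(theta)
        theta += DT * d * a
        lowest = min(lowest, px * px + py * py - RADIUS ** 2)
    return lowest

def reachable_extremes(state, actions):
    """Enumerate all 2**H sign patterns; return (pessimistic, optimistic) min margins."""
    scores = [min_margin(state, actions, deltas)
              for deltas in itertools.product((1, -1), repeat=len(actions))]
    return min(scores), max(scores)

# Ten turning steps already give 1024 distinct sign patterns.
worst, best = reachable_extremes((-1.0, 0.2, 0.0), [1.25] * 10)
print(worst, best)
```

The enumeration cost doubles with every additional turning step, so this exhaustive approach is only practical for short horizons.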

Interactive demo: Noise optimization for steering (analytic rollouts with exact dynamics). Left panel: noise space, click a sample (legend: Sample, Selected, Pessimistic, Optimistic, FRS). Right panel: stochastic rollout (simulation) with the margin (criterion) readout.
Results: Steering Naughty Dubins Car World Model via Noise Optimization

Now, let’s steer the world model! We optimize the initial noise of the naughty Dubins car world model to maximize or minimize the safety margin: pessimistic steering searches for worst-case outcomes, while optimistic steering seeks best-case trajectories. Both can be difficult to discover under uniform random initialization (gray points). For comparison, we also sample N = 20 random rollouts per scenario. Click Refresh scenario to load a new trajectory, or select any point in the noise scatter to visualize its rollout.

Note: The video on the right shows open-loop imagination from the learned world model, not ground-truth trajectories. As such, generations and steering results may be imperfect due to model bias or error.

Interactive demo: WM Rollouts via Optimized Noise. Every clip is WM imagination (open loop). Left panel: noise space, click a sample (legend: Initial Noise, Selected, Pessimistic, Optimistic, FRS). Right panel: world model imagination (open loop).
How well does noise optimization steer toward failures?
We evaluate steering on 5000 randomly sampled evaluation trajectories. Each trajectory is rolled out in the world model from the initial image $\mathbf{o}_0$ under the same action sequence. Steering minimizes the learned safety-score predictor for 10 optimization steps, driving imaginations toward the failure set. The dashed GT line marks the worst-case minimum safety score achievable under the true stochastic dynamics; a plausible steering result should approach but not cross this bound.
Average minimum safety score over each imagined trajectory (5000 trajectories). Lower values are better: they indicate the world model imagined outcomes closer to the failure set. Dashed line = ground-truth worst-case bound. Hover any bar for details.

Why is it hard? This is a long-horizon autoregressive rollout. The exponential branching from stochastic dynamics makes it practically impossible for Best-of-N sampling to discover the dangerous modes of the imagined distribution.
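A rough calculation illustrates the branching argument. Under the simplifying assumption that the worst case is realized by one specific sign pattern (in practice several patterns may come close), the probability of that pattern is $0.4^{k}\,0.6^{H-k}$ for $k$ flips over $H$ turning steps, and Best-of-N succeeds with probability $1-(1-p)^N$. The function name and the horizons below are hypothetical; only the flip probability $0.4$ and $N = 20$ come from the text.

```python
import math

def best_of_n_hit_probability(num_flips, num_turn_steps, n_samples, p_flip=0.4):
    """Chance that at least one of n_samples independent rollouts reproduces one
    specific sign pattern with num_flips flips over num_turn_steps turning steps."""
    p_pattern = p_flip ** num_flips * (1.0 - p_flip) ** (num_turn_steps - num_flips)
    if p_pattern >= 1.0:  # degenerate case: no turning steps at all
        return 1.0
    # 1 - (1 - p)^N, evaluated stably when p is tiny.
    return -math.expm1(n_samples * math.log1p(-p_pattern))

for horizon in (10, 20, 40):  # hypothetical numbers of turning steps
    print(horizon, best_of_n_hit_probability(horizon // 2, horizon, n_samples=20))
```

The hit probability shrinks geometrically with the number of turning steps, while increasing N only helps linearly once the pattern probability is small.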
Min. safety score over trajectory (5000 trajectories). Dashed = GT bound.
Noise optim with reg (orange) steers toward the GT bound while staying within physically plausible limits. No reg (teal) overshoots the GT bound: the optimized noise leaves the in-distribution manifold, yielding physically impossible safety scores. Classifier guidance (red) similarly drives the score below the GT bound, indicating out-of-distribution imaginations. Nominal and Best-of-N are nearly indistinguishable, confirming naive sampling cannot steer long-horizon rollouts.
Typical Set Constraints
We tested the regularizer on the Naughty Dubins Car world model with 5000 randomly sampled evaluation trajectories. Spectral density, isotropy, and norm are the regularizer terms used during steering. The out-of-distribution score (measured by density estimation via flow matching) is not used for optimization; it is an evaluation-only check of whether the regularizer keeps generations in-distribution.
Spectral density
Spectral density of optimized noise in latent space; lower concentration in a few directions indicates healthier manifold coverage.
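To give a feel for what a typical-set penalty on the initial noise can look like, the sketch below scores how far a noise vector is from what a standard Gaussian sample would look like. It is an illustration only: the exact form of the regularizer is not given here, so the two terms (a norm term and a per-coordinate standard-deviation term), their equal weighting, and the name `typical_set_penalty` are all assumptions, and the spectral term is omitted entirely.

```python
import math
import random

def typical_set_penalty(z):
    """Illustrative penalty: near zero when z resembles a standard Gaussian sample."""
    d = len(z)
    mean = sum(z) / d
    var = sum((x - mean) ** 2 for x in z) / d
    # For z ~ N(0, I_d), the squared norm divided by d concentrates around 1.
    norm_term = (sum(x * x for x in z) / d - 1.0) ** 2
    # The empirical standard deviation of the coordinates should also be near 1.
    std_term = (math.sqrt(var) - 1.0) ** 2
    return norm_term + std_term

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
print(typical_set_penalty(noise))                     # small for genuine Gaussian noise
print(typical_set_penalty([3.0 * x for x in noise]))  # large once the noise is rescaled
```

Adding a penalty of this kind to the steering objective discourages the optimizer from lowering the safety score by simply pushing the noise far outside the region the model saw during training.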
How robustly can we detect failure events?
We evaluate failure detection on 5000 randomly sampled evaluation trajectories. Each trajectory is labeled positive if it has a non-zero probability of entering the failure set $\mathcal{F}$ under the true stochastic dynamics (ground-truth minimum safety score $\le 0$), and negative otherwise.

We steer the world model's imagination using only the initial observation and action sequence, then classify the trajectory as a predicted failure if the minimum predicted safety score along the rollout is below zero. Noise optimization with the typical-set regularizer achieves high TPR while keeping FPR near the nominal level. Both noise optimization without the regularizer and classifier guidance trivially drive the safety score downward, but with a high false-positive rate, indicating physically implausible imaginations.
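The labeling and decision rules above translate directly into a small evaluation routine. The sketch below assumes two hypothetical per-trajectory inputs: the minimum predicted safety score of the steered imagination and the ground-truth worst-case minimum safety score.

```python
def detection_rates(pred_min_scores, gt_min_scores):
    """Compute TPR, TNR and FPR from per-trajectory minimum safety scores."""
    tp = fp = tn = fn = 0
    for pred, gt in zip(pred_min_scores, gt_min_scores):
        is_positive = gt <= 0.0  # failure is reachable under the true dynamics
        is_flagged = pred < 0.0  # steered imagination enters the failure set
        if is_positive and is_flagged:
            tp += 1
        elif is_positive:
            fn += 1
        elif is_flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    tnr = tn / (tn + fp) if tn + fp else float("nan")
    return {"TPR": tpr, "TNR": tnr, "FPR": 1.0 - tnr}

# Toy example with four trajectories (values are made up for illustration).
print(detection_rates([-0.1, 0.2, -0.3, 0.5], [-0.05, -0.01, 0.1, 0.2]))
```

A false positive here means the steered world model imagined a failure that the true dynamics cannot produce, which is why FPR serves as a plausibility check.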
5000 random evaluation trajectories: 2147 positives (possible failure) and 2853 negatives (should never fail).
TPR vs TNR
True positive rate (failure detection, x-axis) vs. true negative rate (plausibility, y-axis). Upper-right corner is ideal. Noise optim with reg (orange) achieves the best balance: TPR 88.8% with TNR 92.8%, so imagined failures are trustworthy. Methods without the typical-set objective (teal, red) trade dramatically higher false-positive rates for marginal TPR gains, revealing implausible imaginations. Error bars are 95% Wilson confidence intervals.
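For reference, a Wilson score interval for a binomial proportion can be computed as in the sketch below. The function name is hypothetical, and the success count in the usage line is an approximation reconstructed from the reported rate (88.8% of 2147 positives), since only rates are stated above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Approximate count: 88.8% of the 2147 positive trajectories.
low, high = wilson_interval(round(0.888 * 2147), 2147)
print(low, high)
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for rates close to 0 or 1.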
Ablations: which regularizer term matters?
OOD score (measured via FAIL-Detect flow-matching density estimation) of the generated latents. Each ablation removes one term from the typical-set constraint; the rest are kept.
OOD Score (via Flow-based Density Estimation): full ablation sweep
Dropping any single regularizer term (no norm, no spectral, no std) lets the OOD score drift above the with reg baseline (≈172). Removing regularization entirely (no reg) sends the score to ≈697, roughly a 4× inflation, confirming that each term contributes to keeping steered latents in-distribution.