*University of Southern California
†Olin College of Engineering
‡Stanford University
Designing controllers that are both safe and performant is inherently challenging. This co-optimization can be formulated as a constrained optimal control problem, where the cost function represents the performance criterion and safety is specified as a constraint. While sampling-based methods, such as Model Predictive Path Integral (MPPI) control, have shown great promise in tackling complex optimal control problems, they often struggle to enforce safety constraints. To address this limitation, we propose DualGuard-MPPI, a novel framework for solving safety-constrained optimal control problems. Our approach integrates Hamilton-Jacobi reachability analysis within the MPPI sampling process to ensure that all generated samples are provably safe for the system. This integration not only allows DualGuard-MPPI to enforce strict safety constraints, but also facilitates a more effective exploration of the environment with the same number of samples, reducing the effective sampling variance and leading to better performance optimization. Through several simulations and hardware experiments, we demonstrate that the proposed approach achieves significantly higher performance than existing MPPI methods, without compromising safety.
Consider an autonomous system with state \( x \in X \subseteq \mathbb{R}^n \) evolving under continuous dynamics \( \dot{x} = f(x, u, d) \), where \( u \in U \) is the control input and \( d \in D \) is a disturbance input, which may model uncertainty or an adversarial influence. The function \( f \) is assumed to be uniformly continuous in \( u \) and \( d \), bounded, and Lipschitz continuous in \( x \) for fixed \( u \), \( d \).
Let \( \xi_{x,t}^{u,d}(\tau) \) denote the system trajectory at time \( \tau \), initialized at state \( x \) and time \( t \), under control signal \( u(\cdot) \) and disturbance signal \( d(\cdot) \). These signals are measurable functions from time to their respective admissible sets, and are assumed to be piecewise continuous, ensuring the trajectory exists, is unique, and remains continuous for all initial states.
We are given a failure set \( \mathcal{T} \subset X \), representing unsafe states (e.g., obstacles). This set is encoded by a Lipschitz continuous function \( l(x) \), such that \( \mathcal{T} = \{ x : l(x) \leq 0 \} \). The objective is to design a control policy that minimizes a performance cost while ensuring the state never enters \( \mathcal{T} \), even under worst-case disturbances.
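As a concrete example, a circular obstacle can be encoded by a signed-distance function. The sketch below is illustrative (the helper name, center, and radius are ours, not from the paper); it satisfies the convention \( \mathcal{T} = \{ x : l(x) \leq 0 \} \):

```python
import numpy as np

def l_circle(x, center=(2.0, 1.0), radius=0.5):
    """Signed distance to a circular obstacle in the (x, y) plane.

    l(x) <= 0 inside the failure set T, l(x) > 0 outside it.
    """
    return np.linalg.norm(np.asarray(x[:2]) - np.asarray(center)) - radius
```

Any Lipschitz function with this sign convention works; signed distances are a common choice because their gradients are well-behaved for the reachability computation.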
The cost functional to minimize over trajectories is defined as:
\( S(\xi) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} \mathcal{L}(x(t), u(t), t)\, dt \)
Here, \( \phi \) and \( \mathcal{L} \) denote terminal and running costs, respectively, and \( t_f \) is the task horizon. We aim to find a control policy \( u^*(\cdot) \) that minimizes this cost while enforcing safety:
\( \begin{aligned} u^{*}(\cdot) &= \arg\min_{u(\cdot)} S(\xi_{x,t}^{u,d}) \\ \text{s.t.} \quad \dot{x}(\tau) &= f(x(\tau), u(\tau), d(\tau)), \\ l(x(\tau)) &> 0,\quad u(\tau) \in U,\quad d(\tau) \in D,\quad \forall \tau \in [t, t_f] \end{aligned} \)
This constrained optimal control problem is generally non-convex and difficult to solve. In this work, we propose a sampling-based Model Predictive Path Integral (MPPI) method that rigorously enforces safety while optimizing performance.
Model Predictive Path Integral (MPPI) is a sampling-based control method that optimizes trajectories by perturbing a nominal control sequence with noise and evaluating the resulting system rollouts. Each perturbed trajectory is assigned a cost, and the nominal control is updated as a weighted average of the perturbations, where lower-cost trajectories contribute more. The update is given by:
\[ u_j^{\ast} \approx u_j + \frac{\sum_{k=1}^{K} \exp\left[-\frac{1}{\lambda}S(\xi_j^k)\right] \delta_j^k}{\sum_{k=1}^{K} \exp\left[-\frac{1}{\lambda}S(\xi_j^k)\right]} \]
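The update above is a softmax-weighted average of the sampled perturbations. A minimal NumPy sketch (the baseline subtraction is a standard numerical-stability trick, not part of the formula; the helper name is ours):

```python
import numpy as np

def mppi_update(u_nom, deltas, costs, lam=1.0):
    """One MPPI update: softmax-weighted average of control perturbations.

    u_nom:  (T, m) nominal control sequence
    deltas: (K, T, m) sampled perturbations delta^k
    costs:  (K,) rollout costs S(xi^k)
    lam:    temperature lambda
    """
    costs = costs - costs.min()      # shift costs so the best rollout has weight 1
    w = np.exp(-costs / lam)
    w = w / w.sum()                  # normalized importance weights
    # Weighted average of perturbations, added to the nominal sequence
    return u_nom + np.einsum('k,ktm->tm', w, deltas)
```

Small \( \lambda \) concentrates weight on the single best rollout; large \( \lambda \) averages more uniformly over all samples.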
This procedure is repeated at every time step. The figure below visualizes the forward-simulated rollouts from sampled noisy control sequences. High-cost (unsafe) rollouts are shown in red, while lower-cost (safer) ones are shown in blue.
By weighting noisy control perturbations based on their associated trajectory cost, MPPI approximates a solution to the constrained optimal control problem introduced above. While it does not enforce hard constraints, it optimizes performance by sampling and refining control sequences that are more likely to yield lower-cost, goal-directed behavior.
To ensure provable safety, we use Hamilton-Jacobi (HJ) reachability analysis to compute the safe set—the set of all states from which the system can avoid entering unsafe regions (e.g., obstacles), despite worst-case disturbances. This is done by solving a differential game and computing a value function \( V(x) \), whose zero superlevel set \( \{ x : V(x) > 0 \} \) is exactly the safe set.
From this value function, we can also derive a safe control policy \( u^*_{\text{safe}}(x) \) that guarantees the system remains within the safe set:
\[ u^*_{\text{safe}}(x) = \arg\max_{u \in U} \min_{d \in D} \nabla V(x) \cdot f(x, u, d) \]
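On discretized control and disturbance sets, this max-min policy can be approximated by exhaustive search. The sketch below is illustrative only (the grids, dynamics handle, and helper name are placeholders, not the paper's solver):

```python
import numpy as np

def safe_control(x, grad_V, f, U_grid, D_grid):
    """Approximate argmax_u min_d  grad_V(x) . f(x, u, d) over finite grids.

    grad_V: gradient of the HJ value function at x, shape (n,)
    f:      dynamics function f(x, u, d) -> (n,) state derivative
    """
    best_u, best_val = None, -np.inf
    for u in U_grid:
        # Worst-case disturbance for this candidate control
        worst = min(np.dot(grad_V, f(x, u, d)) for d in D_grid)
        if worst > best_val:
            best_u, best_val = u, worst
    return best_u
```

Intuitively, the chosen control pushes the state up the value function as hard as possible, assuming the disturbance pushes it down.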
We enforce this safety policy using a Least Restrictive Filter (LRF): the system follows its nominal control unless it reaches the boundary of the safe set, at which point it switches to the safe control above. This guarantees constraint satisfaction without overly restricting performance. The following figure illustrates the filtering process: the red trajectory follows a nominal, unsafe control, while the blue trajectory follows the least restrictive filtered approach, where unsafe actions are overridden in favour of safe controls as the system approaches the boundary of the unsafe region.
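The switching logic itself is simple. A minimal sketch, assuming a precomputed value function `V` and safe controller `safe_ctrl`; the small margin `eps` (our assumption, not from the paper) triggers the switch slightly before the boundary:

```python
def least_restrictive_filter(x, u_nominal, V, safe_ctrl, eps=0.05):
    """Least Restrictive Filter: keep the nominal control while safely
    inside the safe set; switch to the HJ safe control near its boundary."""
    if V(x) > eps:
        return u_nominal       # safely inside: no intervention
    return safe_ctrl(x)        # near/at the boundary: override
```

Because the filter only intervenes on the boundary, performance inside the safe set is untouched, which is what makes it "least restrictive."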
While MPPI optimizes performance by sampling and refining control sequences, it does not guarantee safety: unsafe rollouts are penalized, but they are still generated and then discarded, wasting computation. Post-processing with safety filters can enforce constraints, but it often sacrifices performance due to reactive corrections.
DualGuard MPPI addresses this by embedding safety enforcement into the MPPI loop through two key steps:
The full DualGuard MPPI algorithm combines both filtering layers into a provably safe and performant control strategy:
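At a high level, one control update can be sketched as follows. All helper names (`rollout_safe`, `lrf`, `sample_noise`, `cost_fn`) are hypothetical placeholders for the paper's safety-filtered rollout and final filtering stages, not its actual implementation:

```python
import numpy as np

def dualguard_step(x0, u_nom, sample_noise, rollout_safe, cost_fn, lrf,
                   lam=1.0, K=1000):
    """One DualGuard MPPI control update (high-level sketch).

    Step 1: every sampled rollout is simulated through the HJ safety
            filter, so all K samples are provably safe.
    Step 2: the softmax-weighted control is passed through the least
            restrictive filter once more before execution.
    """
    costs, safe_deltas = [], []
    for _ in range(K):
        delta = sample_noise(u_nom)
        # Safety-filtered rollout: returns the trajectory and the
        # perturbation actually applied after filtering.
        traj, delta_filtered = rollout_safe(x0, u_nom + delta)
        costs.append(cost_fn(traj))
        safe_deltas.append(delta_filtered)
    costs = np.array(costs)
    costs -= costs.min()                         # numerical stability
    w = np.exp(-costs / lam)
    w /= w.sum()
    u_star = u_nom + sum(wk * dk for wk, dk in zip(w, safe_deltas))
    return lrf(x0, u_star)                       # final safety filter
```

Because every sample is already safe, none of the K rollouts is wasted on trajectories that would later be discarded, which is the source of the improved sample efficiency.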
We evaluate DualGuard MPPI on a miniature RC car executing aggressive laps around a physical racetrack. The car’s state \((x, y, \theta)\) evolves with the control input \(u = [V, \delta]\), where \(V \in [0.7, 1.4]\) m/s is the speed and \(\delta \in [-25^\circ, 25^\circ]\) is the steering angle. Disturbances \(d_x, d_y \in [-0.1, 0.1]\) capture model mismatch and sensor error:
\(\dot{x} = V \cos(\theta) + d_x,\quad \dot{y} = V \sin(\theta) + d_y,\quad \dot{\theta} = \frac{V \tan(\delta)}{L}\)
To reflect real-time performance, we report: Computation Time (ms per control update), Speed (average m/s over three laps), and RelCost (normalized performance cost). Safety-violating episodes are marked as failures. The cost function penalizes deviation from track center, lower speeds, and safety violations:
\(S = (V_{\max}-V)^2 + K_c\,(l_{\text{center}} - \phi(x)) + P(x)\)
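One plausible reading of this cost as a per-step penalty is sketched below; the interpretation of the centerline term as a deviation distance, and all constants (`K_c`, the violation penalty), are our assumptions, not the paper's exact implementation:

```python
def stage_cost(V, dist_to_center, in_failure_set,
               V_max=1.4, K_c=10.0, penalty=1e4):
    """Illustrative per-step cost: penalize low speed, deviation from the
    track centerline, and (for rollout ranking) safety violations."""
    speed_term = (V_max - V) ** 2
    center_term = K_c * dist_to_center
    violation_term = penalty if in_failure_set else 0.0
    return speed_term + center_term + violation_term
```

Note that in DualGuard MPPI the violation term matters only for ranking rollouts; hard safety is enforced by the filters, not by this penalty.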
All methods were run at 50 Hz with 1000 parallel rollouts per step on a laptop GPU (NVIDIA RTX 4060) using JAX. The value function and reachable set were precomputed using the Level Set Toolbox [1].
Video: Comparison of methods, x4 playback speed.
DualGuard MPPI completes the track safely and aggressively, outperforming other safety-aware methods in speed and cost. Unfiltered methods fail on tight turns despite high penalty weights, highlighting the importance of hard safety constraints. Our safe rollout stage improves sample quality, while the final filtering ensures guaranteed safety—even in multimodal scenarios. All methods remain under the 20ms compute budget.
While the benefits of guaranteed safety are clearly visible in the hardware results, the improvements in sample efficiency introduced by our safe rollout mechanism are harder to observe in this single-task setting. These advantages along with broader comparisons and formulation details are thoroughly explored in the full paper linked at the top of this page.
[1] I. Mitchell, "A toolbox of level set methods," Tech. Rep. TR-2004-09, 2004. http://www.cs.ubc.ca/mitchell/ToolboxLS/toolboxLS.pdf
Acknowledgements: This research is supported in part by the DARPA ANSR program, the NSF CAREER program (award number 2240163), and BECAS Chile. This webpage template was borrowed from some colorful folks.