Hamilton-Jacobi Reachability Analysis for Hybrid Systems with Controlled and Forced Transitions

 Javier Borquez* Shuang Peng* Yiyu Chen* Quan Nguyen* Somil Bansal*

 *University of Southern California

Submitted to ICRA 2024

# Abstract

Hybrid dynamical systems with non-linear dynamics are one of the most general modeling tools for representing robotic systems, especially contact-rich systems. However, providing guarantees regarding the safety or performance of such hybrid systems remains challenging because it requires simultaneous reasoning about continuous state evolution and discrete mode switching. In this work, we address this problem by extending classical Hamilton-Jacobi (HJ) reachability analysis, a formal verification method for continuous non-linear dynamics in the presence of bounded inputs and disturbances, to hybrid dynamical systems. Our framework can compute reachable sets for hybrid systems consisting of multiple discrete modes, each with its own non-linear continuous dynamics, and discrete transitions that are either directly commanded by a discrete control input or forced when the state leaves a mode's domain, while still accounting for control bounds and adversarial disturbances in the state evolution. Along with the reachable set, the proposed framework also provides an optimal continuous and discrete controller to ensure system safety. We demonstrate our framework in simulation on an aircraft collision avoidance problem, as well as on a real-world testbed to solve the optimal mode planning problem for a quadruped with multiple gaits.

# Problem Formulation

Consider a hybrid dynamical system with a finite set of discrete modes $Q = \{q_1, q_2, ..., q_N\}$, connected through controlled or forced switches. We also refer to $q_i$ as the discrete state of the system.

Let $x \in X \subset \mathbb{R}^{n_x}$ be the continuous state of the system, $u \in U \subset \mathbb{R}^{n_u}$ be the continuous control input, and $d \in D \subset \mathbb{R}^{n_d}$ be the continuous disturbance in the system. In each discrete mode $q_i \in Q$, the continuous state evolves according to the following dynamics:

$\dot{x} = f_i(x, u, d)$

as long as $x \in S_i \subset X$, where $S_i$ is the valid operation domain for mode $q_i$.

$q_i$ also has discrete control switches $\sigma_{ij}$ that allow a controlled transition into another discrete mode $q_{j}$, where $j \in \{j_1,j_2, ...,j_n\} \subset \{1, 2, ..., N\}$. Note that the number of discrete modes the system can transition into (that is, $n$) may vary across different $q_i$. As $x$ approaches ${S_i}^C$ (denoted $x \to {S_i}^C$), the system must take one of the forced control switches, $\varsigma_{ik}$. This leads to a forced transition into another discrete mode $q_{k}$, where $k \in \{k_1,k_2,...,k_m\} \subset \{1, 2, ..., N\}$. Each discrete transition from $q_i$ to $q_j$, whether controlled or forced, might lead to a state reset $x^+ = R_{ij}(x)$, where $x$ is the state before the transition and $x^+$ is the state after the transition.
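To make the formulation concrete, the ingredients above (modes, dynamics $f_i$, domains $S_i$, controlled switches $\sigma_{ij}$, forced switches $\varsigma_{ik}$, and resets $R_{ij}$) can be captured in a minimal data structure. This is an illustrative sketch, assuming a 1D continuous state; all class and field names are our own choices, not part of the paper's implementation:

```python
# Illustrative sketch of the hybrid-system ingredients (names are ours).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = float  # 1D continuous state for simplicity; R^{n_x} in general

@dataclass
class Transition:
    target: str                                     # destination mode q_j / q_k
    reset: Callable[[State], State] = lambda x: x   # state reset R_ij (identity by default)

@dataclass
class Mode:
    name: str
    f: Callable[[State, float, float], float]       # continuous dynamics f_i(x, u, d)
    in_domain: Callable[[State], bool]              # membership test for the domain S_i
    controlled: List[Transition] = field(default_factory=list)  # sigma_ij switches
    forced: List[Transition] = field(default_factory=list)      # varsigma_ik switches

# Toy two-mode system: mode "a" is valid for x < 1 and is forced into "b"
# (with a reset shifting x by -1) once x leaves that domain.
modes: Dict[str, Mode] = {
    "a": Mode("a", f=lambda x, u, d: u + d, in_domain=lambda x: x < 1.0,
              controlled=[Transition("b")],
              forced=[Transition("b", reset=lambda x: x - 1.0)]),
    "b": Mode("b", f=lambda x, u, d: 2.0 * u, in_domain=lambda x: True),
}
```

The BRT computation that follows operates over exactly these objects: continuous flow inside `in_domain`, a minimum over `controlled` transitions everywhere, and a minimum over `forced` transitions outside the domain.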

Our key objective in this work is to compute the Backward Reachable Tube (BRT) of this hybrid dynamical system, defined as the set of initial discrete modes and continuous states $(x,q)$ of the system for which the agent acting optimally and under worst-case disturbances will eventually reach a target set $\mathcal{L}$ within the time horizon $[t, T]$. Mathematically, the BRT is defined as:

$\mathcal{V}(t)=\{(x_0, q_0): \forall d(\cdot), \exists u(\cdot), \sigma(\cdot), \varsigma(\cdot), \exists s \in [t, T], x(s) \in \mathcal{L}\}$

where $x(s)$ denotes the (continuous) state of the system at time $s$, starting from state $x_0$ and discrete mode $q_0$ under continuous control profile $u(\cdot)$, discrete control switch profile $\sigma(\cdot)$, forced control switch profile $\varsigma(\cdot)$, and disturbance $d(\cdot)$. Along with the BRT, we are interested in finding the optimal discrete and continuous control inputs that steer the system to the target set. Conversely, if $\mathcal{L}$ represents the set of undesirable states for the system (such as obstacles for a navigation robot), we are interested in computing the BRT with the role of control and disturbance switched, i.e., finding the set of initial states from which getting into $\mathcal{L}$ is unavoidable, despite best control effort.

# Theorem I

The core contribution of this work is Theorem I, an extension of the classical HJ reachability framework to hybrid dynamical systems with controlled and forced transitions, as well as state resets.

For a hybrid dynamical system, the value function $V(x, q_i, t)$ that captures the BRT for an implicit target function $l(x)$ is given by solving the following constrained VI:

If $x \in S_i$:

\begin{multline} \min\{l(x)-V(x, q_i, t), \min_{\sigma_{ij}}V(R_{ij}(x), q_{j}, t) - V(x, q_i, t), D_{t}V(x, q_i, t)+ \max_{d \in D} \min_{u \in U} \nabla V(x, q_i, t)\cdot f_i(x,u,d)\} = 0, \end{multline}

and if $x \in S_i^C$:

\begin{equation} \label{eqn:froced_transition_valfunc} V(x, q_i, t) =\min_{\varsigma_{ik}}V(R_{ik}(x), q_{k}, t) \end{equation}

With terminal time condition:

\begin{equation} V(x, q_i, T) = l(x) \end{equation}
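In practice this constrained VI is solved with level-set PDE solvers. As a loose, discrete-time illustration of its structure (not the paper's solver), consider a 1D toy system with two modes on a shared grid: mode 0 moves with $|\dot{x}| \le 1$, mode 1 with $|\dot{x}| \le 2$, and controlled switches with identity resets connect them in both directions. The grid, speeds, and target below are our own choices. With $dt = dx$, one backward step is exactly $V(x, q_i, t) = \min\big(l(x),\; \min_{y \text{ reachable from } x} V(y, q_i, t+dt),\; \min_j V(R_{ij}(x), q_j, t+dt)\big)$:

```python
# Loose discrete-time analogue of the constrained VI (illustrative only).
import numpy as np

dx = 0.1
xs = np.arange(-3.0, 3.0 + dx / 2, dx)
l = np.abs(xs) - 0.5                 # target set L = {x : l(x) <= 0}
cells = [1, 2]                       # grid cells reachable per step, per mode
V = [l.copy(), l.copy()]             # terminal condition V(x, q_i, T) = l(x)

def backup(Vi, cells_i):
    """Continuous-flow backup: min of V over the cells reachable in one dt."""
    Vn = Vi.copy()
    for k in range(1, cells_i + 1):
        Vn[k:] = np.minimum(Vn[k:], Vi[:-k])    # reach a cell to the left
        Vn[:-k] = np.minimum(Vn[:-k], Vi[k:])   # reach a cell to the right
    return np.minimum(l, Vn)                    # min with l: the BRT accumulates

for _ in range(10):                  # horizon T = 10 * dt = 1.0 s
    flow = [backup(V[i], cells[i]) for i in range(2)]
    # controlled switch term: each mode may also adopt the other mode's value
    V = [np.minimum(flow[0], flow[1]), np.minimum(flow[1], flow[0])]
```

Because the switch has an identity reset and is always available, both modes collapse to the faster one, exactly as the switch term $\min_{\sigma_{ij}} V(R_{ij}(x), q_j, t)$ predicts: the BRT in either mode grows at 2 units/s, so after 1 s it contains $x = 2.4$ but not $x = 2.8$.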

# Proof

We start with an analogous formulation to HJ reachability with a target function $l(x)$ such that the target set is defined as its sub-zero level set, $\mathcal{L} = \{(x) : l(x)\leq 0\}$. The BRT seeks to find all states that could enter $\mathcal{L}$ at any point within the time horizon. This is computed by finding the minimum distance to $\mathcal{L}$ over time.

The value function is defined as the cost incurred under the optimal discrete and continuous controls that minimize the distance to the target and the optimal disturbance that maximizes it. We first consider the case when the state belongs to the interior of the domain $S_i$. The value function is given as:

$V(x, q_i, t) = \max_{d \in D} \min_{u \in U} \min_{\sigma_{ij}}\min_{s \in[t, T]} l(x(s))$

With terminal time condition given by:

$V(x, q_i, T) = l(x)$

The optimal decision making for discrete transitions is pictorially shown in the figure below.

The discrete transitions are shown as green arrows and the continuous control flow is indicated by purple trajectories. We also add a ''dummy'' discrete transition, $\sigma_{ii}$, that keeps the system in the current discrete mode. By definition, the overall value function is given by the minimum cost across all these possible trajectories of the system. From here, we consider two different cases:

Case 1: If the system switches to a new discrete mode $j^*$, the continuous state immediately transitions to $R_{ij^*}(x)$. In this case, the optimal cost incurred by the system from the new state $(R_{ij^*}(x), q_{j^*})$ is, by definition, given as $V(R_{ij^*}(x), q_{j^*}, t)$. Thus, the optimal cost across all possible discrete transitions is given as:

$V(x, q_i, t) =\min_{\sigma_{ij}}V(R_{ij}(x), q_{j}, t)$

Case 2: The system evolves in the same mode $q_i$, i.e., it takes the dummy switch $\sigma_{ii}$. In this case, for a small time step $\delta$ and using the dynamic programming principle for the cost function, we have:

\begin{align} \label{eqn:cont_flow} V(x, q_i, t) & = \max_{d \in D} \min_{u \in U} \min\{\min_{s \in[t, t+\delta]} l(x(s)), V\left(x(t+\delta), q_i, t+\delta\right)\} \nonumber\\ & \approx \max_{d \in D} \min_{u \in U} \min\{l(x(t)), V\left(x(t+\delta), q_i, t+\delta\right)\} \end{align}

Approximating the value function at the next time step with a first order Taylor expansion:

\begin{align} V\left(x(t+\delta), q_i, t+\delta\right) \approx V(x, q_i, t) + D_{t}V(x, q_i, t)\delta + \nabla V(x, q_i, t)\cdot\delta x \end{align}

where the change in the state $\delta x$ can be approximated as $f_i(x,u,d) \delta$. Ignoring the higher order terms and replacing the Taylor expansion in the value function:

\begin{align} V(x, q_i, t) = \min\{l(x(t)), V(x, q_i, t) + D_{t}V(x, q_i, t)\delta+ \max_{d \in D} \min_{u \in U} \nabla V(x, q_i, t)\cdot f_i(x,u,d) \delta \} \end{align}

The optimal value function is given by the minimum across the two cases:

\begin{align} V(x, q_i, t) = \min\{l(x), V(x, q_i, t) + D_{t}V(x, q_i, t)\delta+ \max_{d \in D} \min_{u \in U} \nabla V(x, q_i, t)\cdot f_i(x,u,d) \delta , \min_{\sigma_{ij}}V(R_{ij}(x), q_{j}, t)\} \end{align}

Subtracting the value function $V(x, q_i, t)$ from both sides:

\begin{align} 0 = \min\{l(x(t))-V(x, q_i, t), \delta (D_{t}V(x, q_i, t)+ \max_{d \in D} \min_{u \in U} \nabla V(x, q_i, t)\cdot f_i(x,u,d)) , \min_{\sigma_{ij}}V(R_{ij}(x), q_{j}, t) - V(x, q_i, t) \} \end{align}

Since the above holds for all $\delta>0$, we must have:

\begin{align} 0 = \min\{l(x(t))-V(x, q_i, t),D_{t}V(x, q_i, t)+ \max_{d \in D} \min_{u \in U} \nabla V(x, q_i, t)\cdot f_i(x,u,d) , \min_{\sigma_{ij}}V(R_{ij}(x), q_{j}, t) - V(x, q_i, t) \} \end{align}

Note that here we have presented an informal proof based on the Taylor series expansion of the value function; a more rigorous proof of the HJI-VI for continuous evolution can be found in Lygeros (2004).

For states outside the domain of the discrete mode $q_i$ (i.e., $x \in S_i^C$), the discrete decision is over the set of available forced switches $\varsigma_{ik}$. For this case, the proof is analogous to Case 1 above, but without the option to stay in $q_i$. The value function is then given by:

\begin{equation} V(x, q_i, t) =\min_{\varsigma_{ik}}V(R_{ik}(x), q_{k}, t) \end{equation}

# Simulation Results

## Dog 1D

To illustrate our approach, we consider very simplified, high-level dynamics for a quadruped robot moving in one dimension in a room with an obstacle (a table). The robot's goal is to reach the marked area on the other side of the table.

The hybrid system representation of this system is shown in the following diagram. Here, the continuous state $x \in \mathbb{R}$ is the position of the robot and $q_i$ with $i \in \{1,2,3\}$ are the discrete modes of the system. $u \in [0,1]$ is the continuous control of the system. The target set is given as the set of positions $9 \leq x \leq 10$. In this example, $q_1$ represents a walking gait, $q_3$ represents frozen dynamics when the walking robot comes in contact with the table obstacle, and $q_2$ is a crawling gait that allows the robot to move under the table but at a slower rate compared to walking.

The BRT for this example corresponds to all the combinations of continuous states and discrete modes $(x,q)$ from which the quadruped can reach the target set within a $5$ second time horizon. Computing it using the classical Hamilton-Jacobi formulation with no optimal discrete mode switches gives the following results: (Top) value function for the crawling gait mode; (Bottom) value function for the walking gait mode. The obstacle is shown in shaded red, the BRT (the subzero level of the value function) in blue, and the target set (the subzero level of $l(x)$) in green.

For the crawling gait, the BRT (the blue shaded area) propagates through the obstacle and includes $x=4\,m$, because the system can move at $1\,m/s$ and reach the boundary of the target in exactly $5\,s$. For the walking gait, even though the system is allowed to move faster, the BRT stops at the boundary of the obstacle as the walking gait gets stuck there, so only states that start from the right side of the obstacle can reach the target set.

We now apply the proposed method to compute the BRT. To implement Algorithm 1, we use a modified version of the helperOC library and the Level Set Toolbox (Mitchell, 2004), both of which are used to solve the classical HJI-VI. We use a grid of 301 points over the $x$ dimension for each of the 3 discrete operation modes. The resulting BRTs are the following: value functions for states starting in the walking gait for $3$ and $5$ second time horizons. The area within the obstacle is shown in shaded red, the BRT (the subzero level of the value function) in blue, and the target set (the subzero level of the implicit target function $l(x)$) in green.

The intuitive solution for optimal decision-making in this scenario is to use the walking gait everywhere except in the obstacle area. The fast walking gait allows the system to reach the target quicker, while crawling allows it to expand the reachable states beyond the obstacle. The computed solution using our algorithm indeed aligns with this intuition. For example, if we consider the top value function for the 3-second time horizon, we can observe that the limit of the BRT is $7\,m$ away from the target set, which coincides with the intuitive solution of walking for $1\,s$ covering $3\,m$, then crawling under the table for $1\,s$ covering its $1\,m$ width, and finally going back to walking for $1\,s$ covering another $3\,m$. The bottom value function shows the BRT corresponding to a $5\,s$ time horizon. Compared to the previous classical attempt, where the BRT stopped growing beyond the obstacle boundary, we can see how the proposed algorithm can optimally leverage discrete transitions along with continuous control to reach the target set from a wider set of initial states.
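We can sanity-check this intuitive decomposition with a small travel-time computation. This is an illustrative check, not part of the BRT solver; the table's exact position is not stated above, so we assume it occupies $[5, 6]$ (consistent with its $1\,m$ width and the walk/crawl/walk split) with the target boundary at $x = 9$, walking at $3\,m/s$, and crawling at $1\,m/s$:

```python
# Sanity check of the intuitive Dog-1D solution (table position is assumed).
def time_to_target(x0, table=(5.0, 6.0), goal=9.0, v_walk=3.0, v_crawl=1.0):
    """Minimum travel time from x0 <= goal: walk outside the table, crawl under it."""
    t, x = 0.0, x0
    if x < table[0]:                       # walk up to the table (or the goal)
        t += (min(table[0], goal) - x) / v_walk
        x = min(table[0], goal)
    if table[0] <= x < table[1] and goal > x:   # crawl under the table
        end = min(table[1], goal)
        t += (end - x) / v_crawl
        x = end
    if x < goal:                           # walk the remaining stretch
        t += (goal - x) / v_walk
    return t
```

Under these assumptions, `time_to_target(2.0)` returns exactly `3.0` (walk $3\,m$ in $1\,s$, crawl $1\,m$ in $1\,s$, walk $3\,m$ in $1\,s$), placing the 3-second BRT boundary $7\,m$ from the target, as observed.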

## Two-Aircraft Conflict Resolution

We next consider the two-aircraft conflict resolution example. Here, two aircraft flying at a fixed altitude are considered. Each aircraft controls its speed; we assume that the first aircraft is controllable, while the other aircraft's speed is uncertain (within a known interval) and hence modeled as an adversarial disturbance to ensure safety. Using relative coordinates, the continuous dynamics of the system are described by:

\begin{equation} \begin{aligned} \dot{x}_r &= -u + d \cos \psi_r + \omega_u y_r \\ \dot{y}_r &= d \sin \psi_r - \omega_u x_r \\ \dot{\psi}_r &= \omega_d - \omega_u, \end{aligned} \end{equation}

where the state $[x_r,y_r,\psi_r]^T$ is the relative $xy$ position and heading between the aircraft. $u$ and $d$ are the velocities of the two aircraft, and $\omega_u$ and $\omega_d$ are their turning rates. The conflict resolution protocol consists of three different operation modes. The aircraft begin in mode $q_1$ in straight flight ($\omega_u=\omega_d=0$), keeping a constant relative heading. In this mode, the aircraft are on a collision course. At some time, the aircraft can begin the collision avoidance maneuver by taking the switch $\sigma_{12}$, transitioning the system into mode $q_2$ with an instantaneous heading change of $\pi/2$. In mode $q_2$, the aircraft follow a circular flight path, where both aircraft use the same constant turning rate ($\omega_u=\omega_d=1$). Once the aircraft undergo a heading change of $\pi$ radians, they are forced to switch to mode $q_3$, making another instantaneous $\pi/2$ turn and resuming their straight flight ($\omega_u=\omega_d=0$). The two aircraft are now on a collision-free path. To keep track of the transition time in mode $q_2$, we add an additional timer state $z$. The hybrid system representation of this system is shown in the following diagram:
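For concreteness, the relative dynamics and per-mode turning rates described above can be written as a small function. This is an illustrative sketch (function and variable names are our own); the state is $(x_r, y_r, \psi_r, z)$, where $z$ is the timer used in mode $q_2$:

```python
# Per-mode relative dynamics of the two-aircraft system (illustrative sketch).
import math

def aircraft_dynamics(state, u, d, mode):
    x_r, y_r, psi_r, z = state
    # Turning rates per protocol mode: straight flight in q1/q3, unit rate in q2.
    w_u, w_d = (1.0, 1.0) if mode == "q2" else (0.0, 0.0)
    x_r_dot = -u + d * math.cos(psi_r) + w_u * y_r
    y_r_dot = d * math.sin(psi_r) - w_u * x_r
    psi_r_dot = w_d - w_u      # zero in every mode: relative heading is constant
    z_dot = 1.0 if mode == "q2" else 0.0   # the timer runs only in q2
    return (x_r_dot, y_r_dot, psi_r_dot, z_dot)
```

Note that $\dot{\psi}_r = \omega_d - \omega_u = 0$ in all three modes, which is why the relative heading has null dynamics and need not be gridded in the BRT computation below.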

For this system, we are interested in finding the set of all initial states in mode $q_1$ from which a collision cannot be averted between the two aircraft under the above protocol, despite the best control effort. Thus, the complement of the BRT will represent the safe states for the system. For the BRT calculations, we consider a circular collision target set of 5 units around the controlled aircraft. This corresponds to the two aircraft being in close proximity to each other, also referred to as ''loss of separation''. We use an initial relative heading of $\psi_r=2\pi/3$ and a time horizon of $t=5\,s$. For the aircraft velocities we consider $u \in [1.5,3]$ and $d \in [2,4]$. The calculations are carried out on an $[x_r,y_r,z]$ grid of $[201,201,101]$ points; $\psi_r$ has null dynamics in all operation modes and is not included in the grid. Results for the BRT starting from operation mode $q_1$ are shown in the following figure. They can be interpreted as follows: if the relative coordinates of the aircraft belong to a state outside the BRT, the collision avoidance maneuver can be initiated while ensuring a collision can always be avoided, as long as the optimal discrete and continuous control is applied. Backward reachable tube for two-aircraft conflict resolution starting from mode $q_1$: (Left) BRT for fixed aircraft speeds; (Right) BRT for aircraft under optimal control and disturbance.

The left BRT exactly replicates the results of the case presented in Mitchell and Tomlin (2000), where both aircraft keep their speed constant at their maximum values ($u=3$ and $d=4$); the inside of the BRT corresponds to states where the adversarial aircraft can place itself behind the controlled aircraft and use its higher velocity to force a collision. However, unlike that work, we do not use any hard-coded heuristics to account for the discrete transitions during the BRT computation. Furthermore, the right BRT considers the scenario where, in addition to controlling the transition, both aircraft control their velocities optimally to avoid/force a collision with the other aircraft. The changes in the BRT are reflected in the shaded areas: the growth in the BRT close to the target corresponds to cases where the adversarial aircraft slows down to align itself with the controlled aircraft and then uses its higher velocity to force a collision, while the decrease in size near the tail corresponds to states where the blue aircraft slows down to avoid a collision.

# Hardware Experiments

We next apply our method for task-based, high-level mode planning on a real-world quadruped robot, tasked with reaching a goal position in a terrain consisting of various obstacles. The quadruped has different walking modes, such as normal walking, walking on a slope, etc., each of which is modeled as a discrete mode in the hybrid system as shown in the following diagram. Within each discrete mode, we use simplified continuous dynamics to describe the center-of-mass evolution:

\begin{equation*} \dot{x} = k_x V_{N} \sin \varphi + d_x; \quad \dot{y} = k_y V_{N} \cos \varphi + d_y; \quad \dot{\varphi} = k_{\omega} \omega_{N}, \end{equation*}

where the state $[x,y,\varphi]^T$ represents the $xy$ position and heading angle of the robot. $V_{N} = 0.25\,m/s$ is the fixed (normalized) linear velocity and $\omega_{N} \in [-0.5, 0.5]\,rad/s$ is the angular velocity control.
$k_x$, $k_y$, and $k_{\omega}$ are velocity gains for different operation modes. $d_x$ and $d_y$ are bounded disturbances on $xy$ velocity. Also, we use discrete state $[z,\theta,\xi]^T$ to set the body's height, pitch, and roll angle in different operation modes for overcoming various obstacles.
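A minimal sketch of these per-mode continuous dynamics follows; the function name is our own, and the gains and disturbances passed in are placeholders (only $V_N = 0.25\,m/s$ is the value stated above):

```python
# Simplified per-mode quadruped dynamics (illustrative sketch).
import math

V_N = 0.25  # fixed normalized linear velocity [m/s]

def quadruped_dynamics(state, omega, mode_gains, dist=(0.0, 0.0)):
    """mode_gains = (k_x, k_y, k_w) for the active operation mode;
    omega is the angular velocity control in [-0.5, 0.5] rad/s."""
    x, y, phi = state
    k_x, k_y, k_w = mode_gains
    d_x, d_y = dist
    x_dot = k_x * V_N * math.sin(phi) + d_x
    y_dot = k_y * V_N * math.cos(phi) + d_y
    phi_dot = k_w * omega
    return (x_dot, y_dot, phi_dot)
```

Each operation mode would supply its own gain triple (and, e.g., a larger disturbance bound for the slippery slope experiment below), while the discrete body states $[z,\theta,\xi]^T$ are set per mode rather than integrated.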

Each operation mode matches a specific obstacle or terrain in the testing area. The quadruped begins in mode $q_1$ for fast walking. If the robot reaches the boundary of the slope area, the system is forced to switch to mode $q_2$ with a $20^{\circ}$ body pitch angle for slope climbing. At some time, the controller may switch to mode $q_3$ to walk slowly with a lower body height; this allows the robot to make a narrow turn or crawl under obstacles. If the robot encounters a tilted obstacle, the system goes through a forced switch into mode $q_4$ to walk with a tilted body. Whenever the robot touches a ground obstacle, the system stays in mode $q_5$ permanently with frozen dynamics. The quadruped needs to reason about optimally switching between these different walking modes in order to reach its goal area (the area marked by the pink cylinder in the following videos) as quickly as possible without colliding with any obstacles. The BRT for this experiment is calculated on an $[x,y,\varphi]$ grid of $[40,40,72]$ points, over a time horizon of $t=45\,s$. Given the BRT, we can obtain the optimal velocity control, operation mode, and desired discrete body states at each time. The quadruped is equipped with an Intel T265 tracking camera, allowing it to estimate its global state via visual-inertial odometry. The high-level optimal velocity control and discrete body states provided by our framework are then tracked by a low-level MPC controller that uses the centroidal dynamics of the quadruped.

Our first experiment result is shown in the accompanying video. In our environment setup, the optimal (fastest) route to reach the target is via the slope on the right. Specifically, the robot remains in the normal walking mode and switches to the slope walking mode once near the slope. When the robot approaches the end of the slope, it needs to make a tight left turn to avoid a collision with the wall ahead. Our framework recognizes this and transitions to the slow walking mode to allow for a tighter turn. Once the turn is complete, the robot goes back to the faster walking mode.

In our second experiment, we place paper sheets on the slope, making it more slippery. This is encoded by adding a larger disturbance bound in the slope mode $q_2$. The new BRT renders the slope path infeasible, and hence the robot must reach the target via the (slower) ground route. Once again, apart from selecting a new safe route, the proposed framework is able to switch between different walking modes along the route to handle tight turns and slanted obstacles, as shown by the activation of the tilt walking mode.

Finally, to test the closed-loop robustness of our method, we add human disturbances in the ground route experiment. The robot is kicked and dragged by a human during the locomotion process. Specifically, the robot was dragged to face backward, forcing it near the tilted obstacle. However, the reachability controller is able to ensure safety by reactivating the tilt walking mode, demonstrating that the proposed framework is able to reason about both closed-loop continuous and discrete control inputs. Nevertheless, it should be emphasized that the presented algorithm does not solve the entirety of legged locomotion planning; rather, it provides only the top level of a hierarchical planning architecture that typically includes a robust footstep planner, a whole-body controller, and reliable state estimation.

# Discussion And Future Work

We present an extension of the classical HJ reachability framework to hybrid dynamical systems with controlled and forced transitions, as well as state resets. Along with the BRT, the proposed framework provides optimal continuous and discrete control inputs for the hybrid system. Simulation studies and hardware experiments demonstrate the proposed method, both in reaching goal areas and in maintaining safety. Our work opens up several exciting future research directions. First, we rely on grid-based numerical methods to compute the BRT, whose computational complexity scales exponentially with the number of continuous states, limiting the direct use of our framework to relatively low-dimensional systems. We will explore recent advances in learning-based methods for solving the HJI-VI to overcome this challenge. Another interesting direction would be to extend our framework to cases involving uncertainty in the transition surface $S_i$, or where controlled switches have different transition surfaces. Finally, we will apply our framework to a broader range of robotics applications and systems.

 J. Lygeros, “On reachability and minimum cost optimal control,” Automatica, vol. 40, no. 6, pp. 917–927, 2004.

 I. Mitchell, “A toolbox of level set methods,” http://www.cs.ubc.ca/mitchell/ToolboxLS/toolboxLS.pdf, Tech. Rep. TR-2004-09, 2004.

 I. Mitchell and C. J. Tomlin, “Level set methods for computation in hybrid systems,” in International workshop on hybrid systems: Computation and control. Springer, 2000, pp. 310–323.

# Acknowledgements

This research is supported in part by the DARPA ANSR program and by NSF CAREER program (award number 2240163) and BECAS Chile.
