Hamilton-Jacobi (HJ) reachability analysis is a powerful tool for analyzing the safety of autonomous systems. However, the safety assurances it provides are often predicated on the assumption that, once deployed, the system and its environment do not evolve. Online, an autonomous system might nonetheless experience changes in system dynamics, control authority, external disturbances, and/or the surrounding environment, requiring updated safety assurances. Rather than restarting the safety analysis from scratch, which can be time-consuming and often intractable to perform online, we propose to compute parameter-conditioned reachable sets. Assuming expected system and environment changes can be parameterized, we treat these parameters as virtual states in the system and leverage recent advances in high-dimensional reachability analysis to solve the corresponding reachability problem offline. This results in a family of reachable sets that is parameterized by the environment and system factors. Online, as these factors change, the system can simply query the corresponding safety function from this family to ensure system safety, enabling a real-time update of the safety assurances. Through various simulation studies, we demonstrate the capability of our approach in maintaining system safety despite the system and environment evolution.
Autonomous systems generally need to operate in contexts that change, making it challenging to assess the system's safety at any given time. Updating the safety guarantees is often time-consuming and thus impractical for online use.
We propose to solve this problem using parameter-conditioned reachable sets. The possible changes in the environment are parameterized and added to the system's dynamics as extra states, which we call virtual states. During online operation, both the regular and the virtual states of the system are used to evaluate the safety conditions directly, without re-solving the Hamilton-Jacobi-Isaacs (HJI) PDE.
We augment the system's dynamics with an additional state that models the possible evolution of the environment. This state has zero dynamics, so it stays constant along any trajectory. The extended state can then be used to evaluate the system's safety condition, assuming that we have a solution to the HJ reachability problem posed by the extended system.
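The augmentation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `augment_dynamics`, the state layout `z = [x, beta]`, and the toy dynamics `f` are hypothetical names chosen for the example.

```python
import numpy as np

def augment_dynamics(f, n_params):
    """Wrap dynamics f(x, u, beta) -> dx/dt so that beta becomes part of the state.

    The augmented state is z = [x, beta]. The beta block has zero dynamics,
    so the parameters stay constant along a trajectory. Requires n_params >= 1.
    """
    def f_aug(z, u):
        x, beta = z[:-n_params], z[-n_params:]
        dx = np.asarray(f(x, u, beta), dtype=float)
        dbeta = np.zeros(n_params)  # virtual states: d(beta)/dt = 0
        return np.concatenate([dx, dbeta])
    return f_aug

# Toy example (assumed): a 1D integrator whose control gain is the parameter.
def f(x, u, beta):
    return np.array([beta[0] * u])

f_aug = augment_dynamics(f, n_params=1)
z_dot = f_aug(np.array([0.5, 2.0]), u=1.0)  # state x = 0.5, parameter beta = 2.0
```

Here the last entry of `z_dot` is zero by construction, which is what lets one value function over `z` cover every value of the parameter.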
Existing reachability frameworks are then used to solve this higher-dimensional reachability problem. Such an approach would typically not be recommended: solving the Hamilton-Jacobi-Isaacs variational inequality (HJI-VI) involves computing the value function over a grid that discretizes state space and time, so computation and memory scale exponentially with the number of state dimensions. To overcome this computational challenge, we leverage the DeepReach framework, whose deep learning-based approach is agnostic to grid resolution. Its memory requirement generally scales with the complexity of the underlying value function rather than with the spatial resolution.
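To make the exponential scaling concrete, the following back-of-the-envelope sketch counts the memory needed to store one time slice of a value function on a uniform grid. The resolution of 100 points per dimension and the 4-byte values are assumptions picked for illustration, and `grid_memory_bytes` is a hypothetical helper.

```python
def grid_memory_bytes(n_dims, points_per_dim=100, bytes_per_value=4):
    """Bytes needed to store one time slice of a value function on a uniform grid."""
    return bytes_per_value * points_per_dim ** n_dims

for n in (3, 6, 7, 12):
    print(f"{n}D grid: {grid_memory_bytes(n):.3e} bytes")
```

Under these assumptions a 6D grid already needs 4 TB per time slice, and every virtual state added to the system multiplies that figure by another factor of 100.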
We consider the two-vehicle collision avoidance problem in which both agents move at the same constant speed and each is steered through its angular velocity. The pursuer's input is modeled as the disturbance to the system. The control input is the evader's angular velocity, which is bounded to an interval parameterized by βU so that the bounds can change online. Intuitively, a larger βU corresponds to an evader with high maneuverability, whereas a smaller value corresponds to diminished control authority.
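The role of βU can be sketched as follows. This is an illustration only: it assumes a symmetric bound [-βU, βU], unit speed, and absolute coordinates for both vehicles, none of which is specified above, and the function names are hypothetical.

```python
import numpy as np

def clip_evader_control(u, beta_u):
    """Restrict the evader's angular velocity to [-beta_u, beta_u] (assumed symmetric)."""
    return float(np.clip(u, -beta_u, beta_u))

def joint_dynamics(state, u_evader, d_pursuer, beta_u, speed=1.0):
    """Time derivative of [xe, ye, theta_e, xp, yp, theta_p].

    Both vehicles move at the same constant speed and turn at a commanded
    angular velocity. The pursuer's turn rate plays the role of the disturbance.
    """
    _, _, th_e, _, _, th_p = state
    u = clip_evader_control(u_evader, beta_u)
    return np.array([
        speed * np.cos(th_e), speed * np.sin(th_e), u,          # evader
        speed * np.cos(th_p), speed * np.sin(th_p), d_pursuer,  # pursuer
    ])

# The same commanded turn rate is honored or saturated depending on beta_u.
nominal = joint_dynamics(np.zeros(6), u_evader=3.0, d_pursuer=-0.5, beta_u=3.0)
degraded = joint_dynamics(np.zeros(6), u_evader=3.0, d_pursuer=-0.5, beta_u=1.0)
```

With `beta_u=1.0` the commanded turn rate of 3.0 is saturated to 1.0, which is the diminished control authority described above.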
The backward reachable tube (BRT) for this collision set corresponds to all states from which the pursuer can drive the system trajectory into collision despite the evader's optimal efforts to avoid it. As an example, we consider the situation where the evader moves with nominal control authority, trying to reach the green region while avoiding collision with the pursuer. Nominal operation is interrupted by an "engine failure" of the evader at time t0, which drastically limits the evader's control authority until it is repaired at a later time, at which point the system returns to nominal control authority. We compare the results in this scenario for a parameter-conditioned controller and a regular nominal controller.
This scenario mimics the situation where the rocket needs to land on an offshore landing pad that might drift because of ocean currents. The augmented system in this case is a 7D system (6 states and 1 parameter). We compute this 7D parameter-conditioned reachable set using DeepReach. Note that computing such a 7D reachable set is generally infeasible for traditional HJ reachability methods, which are mostly limited to 6D systems.
To demonstrate the adaptability of the obtained value function, we consider two different cases: (i) a nominal safety controller that does not account for the dependency of the value function on the changing parameter; (ii) an adaptive safety controller that leverages the parameter-conditioned value function to account for the movement of the target set by constantly sampling the parameter βL and activating the corresponding safety controller. The parameter-conditioned safety controller (the orange trajectory) constantly adjusts the horizontal thrust of the rocket as the position of the landing pad shifts, ultimately enabling a successful landing. The nominal safety controller (the gray trajectory), on the other hand, fails to account for these movements and misses the landing pad.
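The difference between the two controllers can be sketched with a toy stand-in for the learned value function. Everything here is assumed for illustration: `adaptive_control`, the one-step lookahead rule, the "higher value is better" sign convention, and the 1D toy model are not taken from the original work.

```python
import numpy as np

def adaptive_control(value_fn, dynamics, x, beta, candidates, dt=0.05):
    """Pick the candidate control whose one-step successor has the highest value.

    value_fn(x, beta) stands in for the learned parameter-conditioned value
    function. Passing the currently measured beta is what makes the
    controller adaptive.
    """
    scores = [value_fn(x + dt * dynamics(x, u), beta) for u in candidates]
    return candidates[int(np.argmax(scores))]

# Toy stand-ins (assumed): 1D horizontal position, landing pad centered at beta.
value_fn = lambda x, beta: -abs(x[0] - beta)
dynamics = lambda x, u: np.array([u])

x = np.array([0.0])
candidates = [-1.0, 0.0, 1.0]
u_right = adaptive_control(value_fn, dynamics, x, beta=2.0, candidates=candidates)
u_left = adaptive_control(value_fn, dynamics, x, beta=-2.0, candidates=candidates)
```

In this sketch a nominal controller corresponds to calling `adaptive_control` with `beta` frozen at its initial value, so it keeps steering toward where the pad used to be.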
Animated trajectories for rocket landing using: (Gray) a nominal safety controller; (Orange) an adaptive parameter-conditioned safety controller.
This example considers a case where a robot and a human both aim to reach their respective goals while the robot tries to avoid colliding with the human. To achieve this, the future human states are characterized by the set of states that the human can occupy over some time horizon T, defined as the Forward Reachable Tube (FRT). In general, the FRT can be overly conservative, so a predictive model is used to restrict the possible future actions of the human and, consequently, the growth of the FRT. Intuitively, when the model confidence is high, we are certain that the human is moving directly towards its goal, leading to a much smaller FRT. When the model confidence is low, on the other hand, we are unsure about the human's future actions.
As the human takes each action, the belief over the human's intended motion is treated as a parameter that changes online. The following example shows the usefulness of this method. Under nominal circumstances the robot plans to take the gray trajectory, but there is an obstacle in the environment that the robot does not model. When the human deviates from its optimal path to the goal in order to avoid the obstacle, the robot's model of the human cannot explain this motion, and the model confidence drops. This results in an update of the perceived human FRT, which can now be performed online because the FRT is parameter-conditioned, and the robot trajectory is adjusted to swerve right and avoid any collision with the human or the obstacle.
Trajectories for the human-robot interaction with a controller modeled as a confidence-parameterized FRT.
To test the proposed method in high-dimensional scenarios, the following three-vehicle collision avoidance problem is considered: we have two evaders (e1 and e2) and one pursuer (p), all three with 3D dynamics, leading to a 9D system in the joint state space of the evaders and the pursuer. The pursuer represents the adversarial agent that tries to steer the evaders into colliding either with the pursuer or with each other; the evaders, on the other hand, try to avoid collision. We consider a situation where the pursuer encounters evaders of different sizes as it traverses its environment, which implies that the relative collision distance between each pair of vehicles might change online. These pairwise collision distances define a parameter triplet that accounts for three additional "virtual" states, resulting in a 12D system overall.
The following figure illustrates slices of the obtained BRT for a particular position and orientation of the two evaders, from the point of view of an incoming pursuer with the orientation shown by the gray plane in the bottom right of each slice. The shaded region represents the set of all starting positions of the pursuer for which a collision between at least one of the vehicle pairs is unavoidable. The different BRT sizes across slices show how the DNN can easily account for the parameterization of the value function by βL.
BRT slices for the positions of the pursuer that result in a collision with either of the evaders or force a collision between the evaders.
Acknowledgements: The work of Javier Borquez is supported in part by the NVIDIA Academic Hardware Grant Program and BECAS Chile fellowship. This webpage template was borrowed from some colorful folks.