This article is a follow-up to a previously published blog post titled “Accelerated Development of Learned Agents for Urban Autonomous Driving” (see [1]). It deepens the explanation of our behavior-based actor controller, which reproduces human-like driving behaviors. This actor controller is mainly intended for reproducing learning scenes in driving simulators, from which our agents learn how to navigate autonomously and safely in the real world. The structure of this introductory article is as follows:
- The necessity of diverse and human-like behaviors in micro-simulations is briefly discussed.
- A brief overview of our modular and layered architecture governing our actor controller’s behaviors is presented.
- Two examples of driving behaviors at signalized intersections are presented to illustrate:
  - How we adapt Behavior Trees (BT) – extensively used in video games – to realize diverse driving behaviors.
  - How we use real traffic data to drive our customized simulated actors.
Why do we need human-like behaviors in a traffic micro-simulation?
Considering the gradual proliferation of autonomous vehicles, the most likely scenario in the upcoming decades is autonomous vehicles sharing the roadways with human-operated vehicles [2]. Human driver behavior is heterogeneous in the sense that different drivers have different driving styles and attributes [3]. Under the same surrounding driving conditions, different drivers may make different decisions, mainly because of varying levels of aggression, attention, frustration, and experience. In addition, human drivers are fairly unpredictable by nature: we make not only mistakes and violations but also evasive maneuvers when we anticipate the mistakes of other drivers. All of this introduces additional challenges in creating the appropriate level of behavioral complexity in a simulation environment intended for training and testing an autonomous vehicle.
Generic Approaches and Limitations
In order to provide the autonomous driving agent (ego vehicle) with an appropriate simulation environment, one may consider different approaches:
- Re-simulation of other actors’ exact trajectories: The real-world trajectories of the surrounding actors are recorded and replayed as-is in simulation. This may be appropriate for the validation of an existing autonomous driving agent, but does not provide the appropriate level of behavioral complexity needed for training an agent from scratch [1].
- Passive scenario generation using randomized traffic: One could also play out a mixed-traffic micro-simulation consisting of multiple autonomous agents and human-controlled actor models, and let a scenario classification engine monitor the simulation to detect whether a desired scenario has occurred. In addition to being a tool for the validation of an existing autonomous driving agent, this approach may also be beneficial for finding and testing emergent scenarios that were not planned to be monitored or collected.
Our Approach
Targeted scenario generation by specifying actors’ initial conditions and intended behaviors: Only the initial conditions of a real-world scenario are replicated in the simulation environment, along with the intended behavior or destination for each agent. This allows natural interactions between the actors to emerge as the simulation runs. This is the approach we present in this article. We use traffic modeling so that the actors’ decisions and resulting interactions are as close as possible to what the ego vehicle would experience in real-world scenarios, sharing the roadways with human-operated vehicles.
We represent a wide range of human driving behaviors by designing a custom actor model and integrating it into a scenario-based simulation execution engine. We designed the architecture of this scenario execution engine to be simulation-agnostic, so that it can be integrated easily into a simulation platform of choice such as Applied Intuition or CARLA. In this blog post, we present our integration with the open-source CARLA simulator. The ego vehicle is controlled by a reference autonomous vehicle (AV) software stack and is exposed to different driving situations such as car-following, lane changes, and signalized and unsignalized intersections (see figure below). We include tunable parameters to add complexity, imperfection, and uncertainty to each behavior. Real-world distributions for these parameters are extracted from recorded naturalistic driving datasets, and the parameter values for each actor are sampled from those distributions (see the sketch below). This provides a comprehensive toolchain for training, testing, and validating the performance of the AV stack in dynamic driving scenarios.
Some examples of the different driving situations that our autonomous agent (the white ego vehicle) is exposed to in simulation. The ego is trained while surrounded by behavior-based actors exhibiting human-like behaviors (the blue vehicles).
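To make the parameter sampling concrete, here is a minimal sketch of how per-actor behavior parameters might be drawn from distributions fitted to real-world data. The parameter names, distribution shapes, and values below are illustrative assumptions rather than the exact ones used in our engine.

```python
# A minimal sketch of sampling per-actor behavior parameters from fitted
# distributions. Names, distribution shapes, and values are illustrative.
import random
from dataclasses import dataclass


@dataclass
class ActorBehaviorParams:
    desired_speed_mps: float        # cruising speed the actor tries to hold
    time_headway_s: float           # gap kept behind the lead vehicle
    yellow_light_stop_prob: float   # probability of stopping in the dilemma zone


def sample_actor_params(rng: random.Random) -> ActorBehaviorParams:
    """Draw one actor's parameters; means and spreads are placeholder values."""
    return ActorBehaviorParams(
        desired_speed_mps=max(5.0, rng.gauss(13.0, 2.0)),
        time_headway_s=max(0.5, rng.lognormvariate(0.4, 0.3)),
        yellow_light_stop_prob=min(1.0, max(0.0, rng.gauss(0.6, 0.2))),
    )


rng = random.Random(42)
fleet = [sample_actor_params(rng) for _ in range(20)]  # 20 heterogeneous actors
```

Sampling each actor independently is what gives a single scenario definition many distinct, human-like realizations at run time.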
Modular Architecture
A modular architecture allows us to model a set of behavior-based actors that each can carry out one or more behaviors independently. We adapt Behavior Trees (BT) – a computational model of plan execution extensively used in video games, and gaining popularity in autonomous systems – to realize sequential and parallel combinations of actor behaviors (see an example implementation in the next section).
The module hierarchy in our behavior-based actor model, as shown in the figure below, looks similar to a full AV stack, and some of the modules may indeed be interchangeable. The key difference is that the goal of an AV stack is to produce optimal and safe driving behavior, while the goal of our actor model is to reproduce realistic human behavior (which, as we all know, is not always optimal or safe!). A brief description of the key processing layers (Perception, Planning and Controls) is provided below.
- Perception: Actors may have a fully operational sensing system that consumes data produced by different artificial sensors (cameras, LiDARs, RADARs, GPS, etc.). In this case, the perception module would need to process the sensor data to produce perceived objects and to localize the actor in the environment. However, that would incur a significant computational load, especially when there are multiple actors in a scene. Moreover, it can introduce undesired failures and make it harder to guide actor behavior. Hence, we opted to bypass the perception layer and directly obtain localization estimates and perceived objects from the simulation platform. This “ground truth” information can be further modified by introducing noise and other imperfections, in order to recreate patterns of perception-related failures observed in the real world.
- Planning: Since our focus is on modeling actor behavior, we further decompose the Planning layer into three modules: Route Planning, Behavior Planning, and Motion Planning.
  - Route Planning: It is assumed that when a scenario is initiated, the scenario execution engine spawns and assigns a destination to each actor. The route planning module processes this input and produces a high-level route to the destination, which may include segments of driving along a lane, turning left/right, changing lanes, etc. It is also possible for the scenario execution engine to specify an explicit sequence of behaviors for an actor, in which case the route planning module is not activated.
  - Behavior Planning: The behavior planning module operates at the level of driving behaviors, such as lane-following, lane-changing, and intersection negotiation. As shown in the diagram below, it consists of three sub-modules:
    - Behavior Selector: The behavior selector acts as the high-level decision-maker. If a destination is specified by the scenario execution engine, and a valid route is available, the behavior selector chooses an appropriate behavior for the current segment of driving. Alternatively, if the execution engine explicitly specifies a behavior, that choice overrides the selection criteria, e.g. the scenario may call for a lane-change in the middle of an intersection even if it is not prudent to do so.
    - Behaviors: The core behaviors provide abstract driving actions such as lane-following, car-following, lane-change, signalized/unsignalized intersection negotiation, and stop/yield. These may be composed of lower-level maneuvers such as accelerate, cruise, decelerate, turn, etc., which may be shared across core behaviors. Some of the behaviors, such as lane-change and yield, are anticipative in nature and need to take into account the predicted trajectories of other actors. For simplicity, we use a physics-based short-term prediction engine for this purpose, but for more accurate results, a dedicated interaction-aware prediction module can also be introduced into the system.
    - Behavior Post-Processor: The outputs of the different behaviors may be significantly different in nature. For instance, a car-following behavior may output a single target acceleration value, whereas a lane-change behavior may produce a target waypoint in the next lane. The post-processing step converts these different representations into a common form that can be passed on to the next stage (namely, Motion Planning). Also, if multiple behaviors are active at the same time (unusual, but possible), the post-processor can combine or arbitrate between their outputs.
  - Motion Planning: Given a desired motion specification from the behavior planning layer (e.g. a target waypoint and velocity), this module is responsible for computing a feasible, comfortable trajectory. This layer can additionally be responsible for reactive obstacle avoidance.
- Controls: The controls module is responsible for making the vehicle move along the trajectory produced by the motion planner, using typical throttle, brake, and steering inputs. This can be a simple PID controller with adjustable parameters (as demonstrated in the basic agent controller included with CARLA), or a more sophisticated model-predictive controller (MPC); a minimal sketch of the PID option is shown below.
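As an illustration of the simpler option, here is a minimal sketch of a longitudinal PID speed controller with adjustable gains. The gains, limits, and interface are placeholder assumptions, not CARLA’s built-in implementation.

```python
# A minimal sketch of a longitudinal PID speed controller with adjustable
# gains. Gains, limits, and the interface are placeholder assumptions.
class LongitudinalPID:
    def __init__(self, kp: float = 1.0, ki: float = 0.05, kd: float = 0.1):
        self.kp, self.ki, self.kd = kp, ki, kd
        self._integral = 0.0
        self._prev_error = 0.0

    def step(self, target_speed: float, current_speed: float, dt: float) -> float:
        """Return a combined throttle (>0) / brake (<0) command in [-1, 1]."""
        error = target_speed - current_speed
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt if dt > 0 else 0.0
        self._prev_error = error
        command = self.kp * error + self.ki * self._integral + self.kd * derivative
        return max(-1.0, min(1.0, command))


# A sluggish, poorly tuned controller can itself be a source of human-like
# imperfection (e.g. overshooting a stop line).
controller = LongitudinalPID(kp=0.4, ki=0.0, kd=0.0)
cmd = controller.step(target_speed=10.0, current_speed=7.5, dt=0.05)
```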
Human-like failures or imperfections can be injected in different layers of the architecture. As an example, drivers do not always stop precisely at a stop sign, often dangerously crossing into the intersection; this deviation can be introduced into the behavior-based actor model by adding some noise and/or delay at different stages, including Perception (failure to perceive the stop sign), Planning (comfort parameters producing insufficient deceleration) or Controls (imperfect/uncalibrated braking).
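For example, here is a sketch of how the ground-truth perception obtained from the simulator might be degraded with positional noise, occasional missed detections, and a reaction delay. All names and noise models are illustrative assumptions, not our exact implementation.

```python
# A sketch of injecting perception imperfections into the ground-truth object
# list obtained from the simulator. Names and noise models are illustrative.
import random
from collections import deque


class ImperfectPerception:
    def __init__(self, position_noise_std_m=0.3, reaction_delay_steps=5,
                 miss_prob=0.02, seed=None):
        self.position_noise_std_m = position_noise_std_m
        self.miss_prob = miss_prob
        # A fixed-length buffer models perception/reaction latency.
        self._buffer = deque(maxlen=reaction_delay_steps + 1)
        self._rng = random.Random(seed)

    def process(self, ground_truth_objects):
        """ground_truth_objects: list of (x, y) positions from the simulator."""
        noisy = []
        for (x, y) in ground_truth_objects:
            if self._rng.random() < self.miss_prob:
                continue  # occasionally fail to perceive an object (e.g. a stop sign)
            noisy.append((x + self._rng.gauss(0.0, self.position_noise_std_m),
                          y + self._rng.gauss(0.0, self.position_noise_std_m)))
        self._buffer.append(noisy)
        return self._buffer[0]  # oldest buffered frame -> delayed perception


perception = ImperfectPerception(reaction_delay_steps=10)  # roughly 0.5 s at 20 Hz
perceived = perception.process([(12.0, 3.5), (30.2, -1.0)])
```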
Modeling Signalized Intersection Behaviors
The main advantage of Behavior Trees (BTs) is that they are designed to be modular, hierarchical and easily extensible. Mathematically, they are structured as directed rooted trees, a special case of Directed Acyclic Graphs (DAGs). As a result, you can add a new “node” to a BT or move one around to change its behavior, without having to worry about other parts of the tree. This is in contrast with formalisms such as Finite State Machines (FSMs), which often require significant rework of the entire model when adding a new state or changing the connectivity of an existing one.
The hierarchical structure of a BT allows control to flow down to leaf nodes and back up as the nodes finish their tasks. This allows conditional behaviors to be specified more easily, e.g. if a lane-change behavior fails to execute because of a vehicle in the next lane, the actor can fall back to a lane-follow behavior. FSMs, on the other hand, provide one-way control transfer, in the sense that they have no memory of where a transition was originally made from [8]. Implementing conditional behaviors would require additional states and transitions, making the state machine unnecessarily complex.
We start with a library of reusable nodes that provide a scalable, intuitive, and flexible way of defining composite behaviors. Adding a new core driving behavior consists of defining a sub-tree of these nodes, along with any new nodes specific to the behavior. The complete BT for a comprehensive actor model can become quite large, but it can still be analyzed and tested in terms of sub-trees and nodes. As an example, two snippets of signalized intersection behaviors are highlighted below (an unprotected left turn at a green light, and proceeding straight at a yellow light). Based on the behavior selected by our behavior selector at a signalized intersection, the root of the appropriate BT is activated. If an actor is going through a traffic light, for example, this activation propagates down the traffic-light sub-tree, reaching a leaf node that can be either an Action (depicted as a simple rectangle) or a Condition (depicted as an oval). Each leaf node returns a Running, Success, or Failure status, which is passed back up the tree until it reaches the root of the sub-tree.
Note: In the BT diagrams shown above, a Sequence node (depicted by →) runs through its children in order on every cycle or “tick” until a child returns Failure or Running status [9]. A Selector (depicted by ?), runs its children in order until a child returns Success or Running. For more information on the different constructs and idioms used in BTs, check out the excellent documentation for the py_trees package.
Unprotected Left at a Green Light: Turning situations complicate traffic modeling at signalized intersections, which is why a separate BT is dedicated solely to governing the corresponding behaviors. The enlarged snippet above illustrates the portion of the BT that is active when a behavior-based actor is turning left at a green traffic light. As you can see, it simply verifies that the applicable traffic light is green, and then re-uses the unsignalized intersection behavior. That behavior was originally designed to handle intersections controlled by stop or yield signs; however, we can benefit from it in the left-turn-at-green scenario by assuming that: i) neither the turning actor nor the other actors have a stop sign, and ii) the other actors have the Right-of-Way (ROW) over the turning actor.
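For illustration, here is a minimal sketch of such a sub-tree written with the py_trees package (2.x API assumed). The node classes and the actor interface are hypothetical simplifications rather than our production implementation.

```python
# A minimal sketch of the left-turn-at-green sub-tree using py_trees
# (2.x API assumed). Node classes and the actor interface are hypothetical.
import py_trees
from py_trees.common import Status


class DummyActor:
    """Hypothetical stand-in for the behavior-based actor interface."""
    def traffic_light_state(self) -> str:
        return "green"

    def negotiate_unsignalized(self) -> Status:
        # Re-used unsignalized-intersection behavior: yield to conflicting
        # traffic that has the right-of-way, then proceed.
        return Status.RUNNING


class IsLightGreen(py_trees.behaviour.Behaviour):
    def __init__(self, name: str, actor: DummyActor):
        super().__init__(name)
        self.actor = actor

    def update(self) -> Status:
        # Condition leaf: Success only while the applicable light is green.
        return Status.SUCCESS if self.actor.traffic_light_state() == "green" else Status.FAILURE


class UnsignalizedNegotiation(py_trees.behaviour.Behaviour):
    def __init__(self, name: str, actor: DummyActor):
        super().__init__(name)
        self.actor = actor

    def update(self) -> Status:
        # Action leaf: delegate to the re-used unsignalized behavior.
        return self.actor.negotiate_unsignalized()


actor = DummyActor()
# Sequence: the action only runs while the green-light condition holds.
root = py_trees.composites.Sequence(name="LeftTurnAtGreen", memory=False)
root.add_children([IsLightGreen("LightIsGreen?", actor),
                   UnsignalizedNegotiation("NegotiateAsUnsignalized", actor)])
tree = py_trees.trees.BehaviourTree(root)
tree.tick()  # one tick per simulation step
```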
Some of the parameters that we used in this specific behavior – unprotected left at a green light – are as follows (a sketch of how they might feed a time-to-collision check is shown after the list):
- traffic_light_identify_range: How far away the traffic light’s status is detectable (a human-like vision range).
- is_allowed_to_enter_intersection: Boolean value determining whether an actor is allowed to pull into the intersection while yielding to the through traffic. Pulling into the intersection may also be restricted by other factors.
- time_to_collision_safe_margin_travel_time: Safety margin for the time-to-collision model that is added to the actor’s travel time to the collision point. Larger values create more conservative behavior.
- time_to_collision_safe_margin_distance: Safety margin for the time-to-collision model that inflates/deflates the imaginary collision-avoidance area surrounding each conflicting actor. Larger values tend to leave a larger gap between actors.
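Below is a rough sketch of how the two safety-margin parameters might enter a time-to-collision style gap check, assuming constant-velocity prediction of the conflicting actor. All names, units (meters, m/s, seconds), and default values are illustrative.

```python
# A rough sketch of a time-to-collision style gap check for the unprotected
# left turn, assuming constant-velocity prediction of the conflicting actor.
# All names, units (m, m/s, s), and default margins are illustrative.

def arrival_time_s(distance_m, speed_mps):
    """Time for a vehicle to reach the conflict point at constant speed."""
    return distance_m / max(speed_mps, 0.1)


def is_gap_acceptable(ego_dist_to_conflict_m, ego_speed_mps,
                      other_dist_to_conflict_m, other_speed_mps,
                      safe_margin_travel_time_s=2.0,
                      safe_margin_distance_m=1.5):
    """Return True if the turning actor can clear the conflict point in time."""
    # Inflate the conflict area around the oncoming actor by the distance margin.
    other_arrival = arrival_time_s(
        max(other_dist_to_conflict_m - safe_margin_distance_m, 0.0),
        other_speed_mps)
    # Pad the turning actor's own travel time by the time margin.
    ego_clearance = (arrival_time_s(ego_dist_to_conflict_m, ego_speed_mps)
                     + safe_margin_travel_time_s)
    return ego_clearance < other_arrival


# Smaller margins produce a more aggressive actor that accepts tighter gaps.
print(is_gap_acceptable(8.0, 4.0, 40.0, 12.0, safe_margin_travel_time_s=1.0))
```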
A range of diverse behaviors can be obtained by changing these parameters, as shown below.
Just by changing a few parameters, a variety of human-like simulation scenarios is achieved for an actor turning left at a green traffic light. Different levels of aggressiveness are shown here for the white vehicle (for demonstration purposes, all vehicles shown, including the white ego vehicle, are driven by our behavior-based actor controller).
Straight at a Yellow Light: When approaching a traffic light, if the light turns from green to yellow, the behavior of an actor is primarily determined by the “travel time” remaining to reach the intersection. If this time is small enough (usually less than the duration of the yellow light), then the actor can clearly make it through and so the Cross node is activated. If it is too long, then the actor clearly can’t make it and so the Stop/Yield node is activated. This leaves a certain range of travel time which creates a “dilemma zone” where the choice of whether to go through or not is random, governed by a specified probability (which is also reflective of the “aggressiveness” of the actor in yellow-light situations).
Here we summarize the parameters used in this specific behavior, i.e. when approaching a yellow traffic light (a sketch of the resulting decision logic follows the list):
- no_dilemma_travel_time_lb: Actors whose travel time to the stop-line is below this lower bound will proceed with no dilemma.
- no_dilemma_travel_time_ub: Actors whose travel time to the stop-line is above this upper bound will stop with no dilemma.
- dilemma_stopping_probability_thr: Actors with a larger stopping probability are more likely to stop.
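The following sketch shows how these three parameters could drive the stop/go decision described above; names and default values are illustrative.

```python
# A sketch of the dilemma-zone stop/go decision at a yellow light, driven by
# the three parameters above. Names and default values are illustrative.
import random


def decide_to_stop(travel_time_to_stopline_s,
                   no_dilemma_travel_time_lb=2.0,
                   no_dilemma_travel_time_ub=5.0,
                   dilemma_stopping_probability_thr=0.6,
                   rng=None):
    """Return True if the actor decides to stop for the yellow light."""
    rng = rng or random.Random()
    if travel_time_to_stopline_s <= no_dilemma_travel_time_lb:
        return False  # clearly close enough to make it through: go
    if travel_time_to_stopline_s >= no_dilemma_travel_time_ub:
        return True   # clearly too far to make it: stop
    # Dilemma zone: the outcome is random, biased by the actor's
    # yellow-light "aggressiveness".
    return rng.random() < dilemma_stopping_probability_thr


# A more aggressive actor (lower stopping probability) is more likely to
# run the light when caught in the dilemma zone.
print(decide_to_stop(3.5, dilemma_stopping_probability_thr=0.2))
```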
A variety of human-like behaviors at a yellow light (including running a red light!) were obtained by varying these parameters, as shown below. In the next section, we will demonstrate how such parameters can be fit to real-world traffic data.
With our customized behavior-based actor (demonstrated here by the white vehicle), a variety of human-like simulation scenarios are achievable just at the yellow traffic light itself. The integrated human driving model creates a dilemma zone where the choice of whether to go through the intersection or not is unpredictable. On the other hand, the CARLA simulator’s simple traffic agent [7], demonstrated here by the black vehicle, produces predictable and easily replicable behaviors.
Data-Driven Traffic Modeling – for simulating behavior in dilemma zones
For testing, verification, and training of our self-driving cars in simulation, it is critical to utilize real traffic data to drive our customized simulated actors and reproduce their surrounding environment. The simulation output needs to match the original traffic conditions reasonably well, not only statistically but also on an individual-scenario basis. This data-driven approach and its benefits for our application are best explained by an example of human-like behavior modeling at yellow traffic lights. We created scenarios in the CARLA simulator in which our behavior-based actor, like a human motorist, has difficulty making the correct stop/go decision when approaching a yellow light at a signalized intersection.
In this context, we summarize the main features of our behavior-based actor as follows:
- The simulated actor provides various levels of uncertainty when making the stop or go decision (as depicted in the scenarios shown above).
- The characteristics of the simulated actor can be easily tuned/changed by altering behavior-specific parameters – this is what allowed us to produce a red light running scenario without explicitly forcing the actor to ignore the traffic light.
- The probability of the simulated actor’s choice of crossing through at a yellow light is extracted from real-world data.
We are interested in predicting the likelihood that a motorist decides to stop or cross the intersection at a yellow light. The concept of “travel time” introduced in the previous section can be further enhanced by modeling the decision as a function of the actor’s current velocity and its distance to the associated stop-line demarcating the threshold of the intersection.
Because the dependent variable in this function is dichotomous (stop/go), a Binary Logistic Regression is appropriate, whereas a linear regression is not (y=1 represents that the actor chooses to stop, while y=0 represents that the actor proceeds through the intersection) [4, 5]. A new dependent variable \(Logit(P)\) is created as proposed in [4], where \(P(y=1)\) is the probability of a motorist’s choice of stopping, \(P(y=0)\) is similarly the probability of the motorist’s choice of crossing through, and \(Logit(P)\) is the natural log of the odds of y=1 versus y=0:
\(Logit(P)=ln[\frac{P(y=1)}{1-P(y=1)}]\)
Assuming that \(Logit(P)\) is a linear function of the distance of the vehicle to the stop bar (dstopbar), and the vehicle velocity (v), we have:
\(ln[\frac{P(y=1)}{1-P(y=1)}]=b_0+b_1\times v+b_2\times d_{stopbar}\)
\(P(y=1) = \frac{e^{b_0 + b_1\times v+ b_2\times d_{stopbar}}}{1+e^{b_0 + b_1\times v+ b_2\times d_{stopbar}}}\)
\(P(y=0) = 1- P(y=1)\)
Here, the function is characterized by three regression coefficients \((b_0, b_1, b_2)\). This model was previously fitted to a dataset of recorded videos collected by [6] in Changchun, China. The obtained stopping probabilities are plotted here for \(b_0 = -1.552\), \(b_1 = -0.050\), and \(b_2 = 0.151\) [4].
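As a small sketch, the fitted model can be evaluated directly with the coefficients above; the units of velocity and distance are assumed to match those of the dataset fitted in [4].

```python
# A small sketch of the fitted stopping-probability model with the
# coefficients reported above. Velocity and distance units are assumed to
# match those of the dataset fitted in [4].
import math

B0, B1, B2 = -1.552, -0.050, 0.151  # regression coefficients from [4]


def stopping_probability(velocity, dist_to_stopbar):
    """P(y=1): probability that the motorist decides to stop at the yellow light."""
    logit = B0 + B1 * velocity + B2 * dist_to_stopbar
    return math.exp(logit) / (1.0 + math.exp(logit))


# This probability could feed the actor's dilemma-zone decision in place of a
# fixed dilemma_stopping_probability_thr, tying the behavior to real data.
p_stop = stopping_probability(velocity=12.0, dist_to_stopbar=35.0)
```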
What to expect next?
Armed with a hierarchically structured actor model that represents a wide range of driving behaviors and is successfully integrated with a simulator, we are widening the spectrum of driving behaviors over time, increasing diversity within each behavior, and injecting uncertainty at various levels of processing (including perception). We are also simulating erratic human behaviors in terms of imperfect actions, including drunk driving and loss of control.
The next blog post in this series will take you deeper into the aspects of data-driven traffic modeling. To make simulations more realistic, we are extracting parameterized distributions for different driving behaviors from real-world data. Some of the more well-studied behaviors such as car-following and lane-changing can be modeled using existing traffic datasets; however, a comprehensive model will require data at scale, especially from urban driving environments. This is where we are leveraging the data collection and processing technology that we have developed for the various shared mobility fleets operated by Ridecell’s platform across the world.