WO2021160273A1 - Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment - Google Patents

Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Info

Publication number
WO2021160273A1
WO2021160273A1 (application PCT/EP2020/053817)
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
frames
lane
road
module
Prior art date
Application number
PCT/EP2020/053817
Other languages
French (fr)
Inventor
Muhammad Saad ZIA
Faizan MEHMOOD
Original Assignee
Automotive Artificial Intelligence (Aai) Gmbh
Priority date
Filing date
Publication date
Application filed by Automotive Artificial Intelligence (Aai) Gmbh filed Critical Automotive Artificial Intelligence (Aai) Gmbh
Priority to DE112020006532.4T priority Critical patent/DE112020006532T5/en
Priority to PCT/EP2020/053817 priority patent/WO2021160273A1/en
Publication of WO2021160273A1 publication Critical patent/WO2021160273A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/10 - Geometric CAD
    • G06F 30/15 - Vehicle, aircraft or watercraft design
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 - Traffic control systems for road vehicles
    • G08G 1/16 - Anti-collision systems
    • G08G 1/167 - Driving aids for lane monitoring, lane changing, e.g. blind spot detection

Definitions

  • the present invention relates to a computer-implemented training method for a traffic agent navigating a road vehicle in a simulation environment using end-to-end modeling as well as a respective training computing system, and a computing system for simulating a road driving environment for one or more vehicles comprising or consisting of one or more processors using the inventively trained traffic agent.
  • Human driving decisions on a road can essentially be considered to comprise several abstract levels or phases forming a driving stack. Based on a particular road situation, a driver may decide to carry out a particular high-level maneuver, e.g. overtake, formulate a motion plan (also called “trajectory”) accordingly and apply control functions on actuators (throttle, brake, steer) to execute the decision.
  • Human driving decisions in natural traffic are, moreover, influenced by many factors and can be considered at various levels. For example, depending on their mental environment, human drivers, being in the same situation, may take different decisions, such as overtake, follow a car in front or change the lane.
  • Perception/Map generally relates to the input about the environment that is available to other components.
  • Traffic Rules generally relates to any component that provides legal restrictions to high-level decisions.
  • Mission Planning generally relates to a strategy on when to be where in the long-term (e.g. lane-level routing).
  • Traffic-Free Reference Line generally relates to planning an “optimal" reference-line ignoring other traffic participants.
  • Behavior Planning generally relates to planning a behavior plan, that is when exactly to conduct actions, such as lane changes, incorporating other participants.
  • Decision Post-Processing generally relates to correcting the decisions of the previous components for conforming to basic safety rules, if necessary.
  • Motion/Trajectory Planning generally relates to planning the exact future trajectory for a short time (up to 2 seconds) horizon.
  • Command Conversion generally relates to computing the final commands to send to a (real or simulated) vehicle, such as steering instructions.
  • Vehicle Dynamics/Physics generally relates to simulating the car's behavior resulting from the generated commands.
  • Position Update generally relates to computing the resulting new position of the vehicle in the simulation. Usage of these terms varies drastically in literature.
  • the model is implemented in an autonomous driving car and not as a simulated traffic agent in a simulation environment.
  • Muller presents the same approach used to train on remote-control car data and consequently automate its driving (see U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, “Off-road obstacle avoidance through end-to-end learning,” in Advances in neural information processing systems, 2006, pp. 739-746).
  • Xu and Gao use end-to-end deep learning to map raw images from numerous on-road human driving footages to both high-level actions of “stop” and “go” as well as steering angle commands (see H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning of driving models from large-scale video datasets,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174-2182).
  • the intended behavior of the model can be approximated to cover lane following, obstacle avoidance and lane change behavior of human drivers.
  • the work provides only a distribution of the car controls, e.g. steering, and therefore does not claim to be highly accurate in driving a car in a simulation or real-world driving scenario.
  • the method does not model acceleration/deceleration commands - only high-level decisions of stop and go.
  • the model is not implemented in a simulated traffic agent in a simulation environment.
  • Codevilla uses the same approach to learn a mapping from images to complete longitudinal and latitudinal control commands (steering and acceleration) of a car using the CARLA driving simulation data (see F. Codevilla, M. Müller, A. Lopez, V. Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 1-9).
  • the model is taught to learn roughly all aspects of the driving behavior, i.e.
  • It is the aim of the present invention to provide a computing system and method for simulating a road driving environment in a driving situation for one or more vehicles, so that the decision of a traffic agent reflects a human-like (naturalistic) behavior, i.e. controls the vehicle's longitudinal and lateral position, preferably steering and acceleration, in a way that exhibits a naturalistic driving behavior in general and in particular for high-level decisions, such as lane changing behavior, e.g., in an overtake.
  • a first aspect of the invention relates to a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment.
  • b. Processing at least part of the driving data and map data of step a) into one or more respective perception frames P i [p 1 , p 2 , ... p n ] per given time frames t i , wherein each perception frame P i contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry,
  • c. Processing at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames C i [c 1 , c 2 , ... c n ] per given time frames t i , wherein each vehicle control frame C i contains longitudinal and latitudinal positions of the respective ego vehicles,
  • d. Training a decision maker computer model of the simulated traffic agent with the one or more perception frames P i as input to the model and the one or more ground truth vehicle control frames C i as labels, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉ i [ĉ 1 , ĉ 2 , ... ĉ n ] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames C i , wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • processing steps b) and c) of the inventive method according to the first aspect can be conducted simultaneously or sequentially in any order.
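  • As an illustration only, the per-frame data structures implied by steps b) and c) could be represented as follows; this is a non-authoritative Python sketch and all field names are assumptions, not taken from the patent.

```python
# Illustrative sketch (not from the patent): possible container types for the
# per-time-frame perception frames P_i and ground-truth control frames C_i
# described in steps b) and c). All field names are assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PerceptionFrame:
    # (i) traffic situation: six-vehicle-neighborhood, one (rel_distance, rel_speed) pair per slot
    neighborhood: List[Tuple[float, float]]   # length 6
    # (ii) self-state of the ego vehicle
    longitudinal_velocity: float              # m/s
    longitudinal_acceleration: float          # m/s^2
    angular_deviation: float                  # bearing relative to road/lane direction, rad
    # (iii) road geometry: displacement vector to the lane boundaries (e.g. 180 semi-circular bins)
    lane_displacements: List[float]

@dataclass
class ControlFrame:
    # longitudinal and latitudinal control, here as changes to be applied at t_i
    delta_acceleration: float                 # m/s^2 per frame
    delta_bearing: float                      # rad per frame
```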
  • a second aspect of the invention relates to a computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
  • a third aspect of the invention relates to a computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent using one or more neural networks for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent was trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames containing longitudinal and latitudinal positions to be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ⁇ [1,2, ...n] and wherein n is the limit on driven frames.
  • inventive aspects of the present invention as disclosed hereinbefore can comprise any possible (sub-)combination of the preferred inventive embodiments as set out in the dependent claims or as disclosed in the following detailed description and/or in the accompanying figures, provided the resulting combination of features is reasonable to a person skilled in the art.
  • FIGs. 1a) to 1c) show schematic representations of (parts) of the E2E car control model of the inventive computer systems in training (Figs. 1a) and 1b)) and deployment (Fig. 1c)), respectively.
  • Fig. 2 shows a schematic representation of a six-vehicle-neighborhood information.
  • Fig. 3 shows a schematic representation of a semi-circular road geometry and applicable displacement vectors.
  • Fig. 4 shows a distribution graph of error/frame in Δbearing against DFS ground-truth validation data in a Lane follow module according to the invention.
  • Fig. 5 shows a distribution graph of error/frame in Δacceleration against DFS ground-truth validation data in a Lane follow module according to the invention.
  • Fig. 6 shows a distribution graph of lane-center deviation in the DFS real traffic data in Lane follow module.
  • Fig. 7 shows a distribution graph of lane-center deviation of Lane follow module when run within simulation.
  • Figs. 8a) and 8b) respectively show distribution graphs of relative speed versus relative distance to front car for the model test-run in simulation (Fig. 8a)) and ground-truth DFS data (Fig. 8b)).
  • the inventors of the different aspects of the present invention have found out that the computer-implemented systems and methods according to the present invention enable a traffic agent navigating a road vehicle in a simulation environment to make simulated driving decisions in high-level (e.g., lane change, overtake driving situations) and low / operational level (trajectory and motion planning), which reflect human like (naturalistic) behavior, i.e. controls the vehicle’s longitudinal and lateral position, preferably bearing and acceleration, in a way that exhibits a naturalistic driving behavior in any driving situation.
  • the present invention successfully exhibits the naturalistic decision making behavior from the source data in the simulation environment in terms of planning, safety procedures and traffic rule compliance.
  • the respective naturalistic driving and map data is according to the present invention processed to form one or more perception frames per given time frames containing corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry.
  • the respective naturalistic driving and map data is according to the present invention processed to form one or more respective vehicle control frames per given time frames, wherein each vehicle control frame contains longitudinal and latitudinal position of the respective ego vehicle.
  • the application of three categories of the perception frame is fundamental in order to provide an effective generalization of the inventive computer model.
  • the decision maker computer model of the simulated traffic agent is trained with the respective one or more perception frames as input to the model and with the one or more ground truth vehicle control frames as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames.
  • the inventive training procedure of the model is based on a data-driven approach, wherein the model is configured to implicitly learn from the ground truth naturalistic data.
  • the expression “an additionally or alternatively preferred embodiment” or “an additionally or alternatively further preferred embodiment” or “an additional or alternative way of configuring this embodiment” means that the feature or feature combination disclosed in this preferred embodiment can be combined in addition to or alternatively to the features of the inventive subject matter including any preferred embodiment of each of the inventive aspects, provided the resulting feature combination is reasonable to a person skilled in the art.
  • the expression “configured” shall be understood in connection with systems and computer program components.
  • For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
  • For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.
  • a virtual traffic agent in the context of the present invention also called “traffic agent” can for example be a car, truck, bus, bike or motor bike.
  • Once a virtual traffic agent has been trained according to the present invention that replicates human driving behavior, in particular in complex driving situations such as lane changes, one or more trained virtual traffic agents may be injected into a simulation environment including complex driving situations.
  • Such an embodiment is preferred, as the trained traffic agents may interact with, cooperate with and challenge an autonomous vehicle system controlling an autonomous vehicle under test.
  • Another advantage is that such an embodiment is suitable to test the limits and weaknesses of the autonomous vehicle system, especially in complex driving situation scenarios that may be attributed to assertive or aggressive driving behaviors.
  • inventive systems and methods furthermore have the technical effect and benefit of providing an improvement to autonomous vehicle computing technology, as the autonomous vehicle is trained in the inventive simulation environment reflecting human-like / naturalistic driving scenarios.
  • a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment characterized in that the method comprises or consists of the following steps:
  • the driving data generally represents trajectory data of the respective ego vehicles.
  • the driving data in step a) for each of the given road vehicles comprises or consists of one or more status features of the respective ego vehicles per given time frames t i , preferably comprising or consisting of longitudinal velocity, longitudinal acceleration, and position of the respective road vehicle in X, Y co-ordinates respectively per given time frames t i .
  • the map data of step a) contains corresponding road information comprising or consisting of i) lane counts of the respective road and ii) lane position in X, Y co-ordinates, optionally X, Y, Z co-ordinates, respectively per given time frames t i .
  • the traffic situation in step b) comprises or consists of six-vehicle-neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) relative distance of respective vehicle to ego vehicle and ii) relative speed of respective vehicle to speed of ego vehicle.
  • Two of the six positions are the two cars in the back of the ego vehicle's center point, translated to the two neighboring lanes.
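  • A minimal, non-authoritative sketch of how the six-vehicle-neighborhood features (relative distance and relative speed per position) could be computed is given below; the slot layout (nearest vehicle ahead and behind in the ego lane and in the two neighboring lanes), the attribute names and the default values for empty slots are assumptions for illustration.

```python
# Hedged sketch: building the six-vehicle-neighborhood features. Lane assignment,
# the exact slot layout and all attribute names are assumptions based on the
# description above, not the patent's implementation.
import math

def neighborhood_features(ego, others, max_range=400.0):
    """ego/others: objects with .x, .y, .speed, .lane (int) and ego with .heading (rad)."""
    slots = {}  # (lane_offset, ahead) -> (rel_distance, rel_speed)
    for veh in others:
        lane_offset = veh.lane - ego.lane
        if lane_offset not in (-1, 0, 1):
            continue                                  # only ego lane and the two neighboring lanes
        dx, dy = veh.x - ego.x, veh.y - ego.y
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue
        # project displacement onto the ego heading to decide "ahead" vs "behind"
        ahead = (dx * math.cos(ego.heading) + dy * math.sin(ego.heading)) > 0.0
        key = (lane_offset, ahead)
        if key not in slots or dist < slots[key][0]:
            slots[key] = (dist, veh.speed - ego.speed)  # keep the nearest vehicle per slot
    # fixed ordering of the six slots; empty slots get a "far away, same speed" default
    order = [(0, True), (0, False), (-1, True), (-1, False), (1, True), (1, False)]
    return [slots.get(k, (max_range, 0.0)) for k in order]
```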
  • the self-state information of the respective ego vehicles in step b) comprises or consists of longitudinal velocity, longitudinal acceleration, and its bearing with respect to the road direction (angular deviation Δd).
  • bearing of an ego vehicle represents in the context of the present invention the orientation of the ego vehicle in relation to the global x- / y- axes.
  • the angular deviation may be defined as Δd = b_road − b_ego, where b_road and b_ego represent the bearing of the road and the ego vehicle, at any given time frame t i , respectively, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the bearing of the road may be substituted by the bearing of the lane, in which case the angular deviation may be defined as Δd = b_lane − b_ego, where b_lane and b_ego represent the bearing of the lane and the ego vehicle, at any given time frame t i .
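  • As a worked illustration of the angular deviation defined above (a sketch with illustrative symbol and function names, assuming bearings are given in radians):

```python
# Minimal sketch of the angular deviation: the difference between the road/lane
# bearing and the ego bearing, wrapped into (-pi, pi]. Names are illustrative.
import math

def angular_deviation(bearing_road: float, bearing_ego: float) -> float:
    delta = bearing_road - bearing_ego
    return math.atan2(math.sin(delta), math.cos(delta))  # wrap to (-pi, pi]

# e.g. road heading 10 deg, ego heading 350 deg -> deviation of +20 deg
assert abs(angular_deviation(math.radians(10), math.radians(350)) - math.radians(20)) < 1e-9
```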
  • the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, preferably wherein the numerical representation is selected from a circular or a semi-circular geometry.
  • the circular or semi-circular numerical representation of the respective lane geometry having two lane boundaries is in the form of a vector of displacements D = [D 1 , D 2 , ... D m ] to each of the two lane boundaries at any given time frame t i , wherein each entry D j is part of a sequence of displacement points relative to the ego vehicle's position, divided on the basis of their relative bearing values to the ego position, with intervals of 1° or more around the circular or semi-circular region in front of and/or behind the ego vehicle, and wherein the length m of the displacement vector represents 1 to 360 entries for the circular geometry and 1 to 180 entries for the semi-circular geometry.
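  • A hedged sketch of how such a semi-circular displacement vector could be computed from lane-boundary points is given below; the 1° binning over the frontal semi-circle, the default value for empty bins and all names are illustrative assumptions.

```python
# Hedged sketch of the road-geometry input: displacements to the nearest lane
# boundary point per relative-bearing bin over a semi-circle in front of the
# ego vehicle (180 one-degree bins). Binning and defaults are assumptions.
import math

def semicircular_displacements(ego_x, ego_y, ego_bearing, boundary_points,
                               n_bins=180, max_range=400.0):
    """boundary_points: iterable of (x, y) lane-boundary coordinates."""
    displacements = [max_range] * n_bins              # default: no boundary seen in that bin
    for (px, py) in boundary_points:
        dx, dy = px - ego_x, py - ego_y
        dist = math.hypot(dx, dy)
        rel_bearing = math.atan2(dy, dx) - ego_bearing
        rel_bearing = math.atan2(math.sin(rel_bearing), math.cos(rel_bearing))  # wrap
        if abs(rel_bearing) > math.pi / 2 or dist > max_range:
            continue                                  # keep only the frontal semi-circle
        bin_idx = min(int((rel_bearing + math.pi / 2) / math.pi * n_bins), n_bins - 1)
        displacements[bin_idx] = min(displacements[bin_idx], dist)  # nearest point per bin
    return displacements
```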
  • the longitudinal and latitudinal positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame t i .
  • the use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
  • the processing steps b) and c) can be executed simultaneously or sequentially in any order.
  • in step d), the decision maker computer model is trained to predict corresponding vehicle control frames Ĉ i [ĉ 1 , ĉ 2 , ... ĉ n ] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames C i , wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the perception P i of the naturalistic data of step b) is used as input to the computer model, and the ground truth naturalistic data of the vehicle control frames C i in step c) is used as a label for training purposes.
  • the inventively trained traffic agent in a computer system simulating a driving environment according to the third inventive aspect does not use the naturalistic vehicle control frames of step c) and substitutes the naturalistic perception frames of step b) with simulated perception frames.
  • the decision maker of the inventive simulation computer system predicts as an action one or more vehicle control frames Ĉ i containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the ground truth vehicle control frames C i and the predicted vehicle control frames Ĉ i may comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame t i .
  • the decision maker computer model of the traffic agent in step d) uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • the inventive training method further comprises processing the driving data of step a) for the respective ego vehicles per given time frames t i to binary corresponding ground truth situation categories of “Lane follow” or “Lane Change”, and wherein the decision maker computer model of the traffic agent in step d) comprises i) a Lane Follow neural network, ii) a Lane Change neural network and iii) a Function Classifier neural network, wherein
  • the one or more perception frames P i are respectively used as input to the Lane follow, the Lane Change and the Function Classifier neural networks,
  • the one or more ground truth vehicle control frames C i are respectively used as labels for independently training the Lane follow and the Lane Change neural networks by matching the predicted vehicle control frames Ĉ i with the respective ground truth vehicle control frames C i , and
  • the respectively applied ground truth situation categories per given time frames t i are used as labels to independently train the Function Classifier neural network to predict a corresponding situation category of “Lane follow” or “Lane Change” by matching the predicted situation category with the respective ground truth situation category, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • In other words, for each time frame t i and ego vehicle, the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
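  • A minimal sketch of how the three sub-networks could be trained independently from the frame-level data described above is given below; the generic fit helper and all names are assumptions, not the patent's implementation.

```python
# Sketch: independent supervised training of the Lane Follow and Lane Change
# networks on the control labels C_i of their respective frame subsets, and of
# the Function Classifier on the binary situation category. `fit` is an assumed
# generic helper fit(net, inputs, targets); all names are illustrative.
def train_decision_maker(perception_frames, control_frames, situation_labels,
                         lane_follow_net, lane_change_net, classifier_net, fit):
    """situation_labels[i] is 1 for "Lane Change", 0 for "Lane follow"."""
    follow_idx = [i for i, s in enumerate(situation_labels) if s == 0]
    change_idx = [i for i, s in enumerate(situation_labels) if s == 1]

    # regression heads: predict (delta_acceleration, delta_bearing) per frame
    fit(lane_follow_net, [perception_frames[i] for i in follow_idx],
                         [control_frames[i] for i in follow_idx])
    fit(lane_change_net, [perception_frames[i] for i in change_idx],
                         [control_frames[i] for i in change_idx])
    # binary classifier: perception frame -> lane-change likelihood
    fit(classifier_net, perception_frames, situation_labels)
```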
  • a computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
  • the training computing system of the second aspect can be configured in such a way that the traffic agent comprises separate modules so that the respective naturalistic driving data and map data can be processed in a suitable way.
  • the longitudinal and latitudinal positions of the respective ego vehicles comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame t i .
  • the use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
  • the traffic agent according to the second inventive aspect may comprise module C, also called E2E Decision Maker (E2EDM) computer model, comprising one or more neural networks with end-to-end modeling.
  • the outputs of modules A and B are used as input information to train the one or more E2E neural networks of module C.
  • the module C of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • the independent neural networks are independently trained.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • module C comprises i) a Lane follow neural network (module C2) , ii) a Lane Change neural network (module C3) and iii) a Function Classifier neural network (module C1).
  • the training computing system may also comprise a module D, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames t i to binary corresponding ground truth situation categories of “Lane follow” or “Lane Change”.
  • In other words, for each time frame t i and ego vehicle, the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the inventive training computing system of the second inventive aspect is configured in such a way that
  • the one or more perception frames P i are respectively used as input information to the Lane follow (module C2), the Lane Change (module C3) and the Function Classifier (module C1) neural networks,
  • the one or more ground truth vehicle control frames C i are respectively used as labels for independently training the Lane Follow (module C2) and the Lane Change (module C3) neural networks by matching the predicted vehicle control frames Ĉ i with the respective ground truth vehicle control frames C i , and
  • the respective ground truth situation categories per given time frames t i are used as labels to independently train the Function Classifier (module C1) neural network to predict corresponding situation categories of “Lane follow” or “Lane Change” by matching the predicted situation category with the respective ground truth situation category.
  • the inventive training computing system is furthermore configured in such a way that the output of the Function Classifier (module C1), i.e. the respective situation category of the ego vehicle at time frame t i initiates either the Lane Follow (module C2) or the Lane Change (module C3) neural network respectively.
  • An advantage of the inventive computing system for training a traffic agent is that the traffic agent is trained to predict both longitudinal and lateral positions of a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior.
  • a computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent using one or more neural networks for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment per given time frame t i , wherein i is any arbitrary number such that i ⁇ [1,2, ...n] and wherein n is the limit on driven frames.
  • the traffic agent used in the inventive simulation computer system of the third aspect was trained according to the inventive training method prior to deployment in a simulation environment, wherein the driving environment (simulation) is expected to provide environment data for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed and control commands are generated by the inventively trained E2E decision maker and passed back to the environment for positional update.
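  • A minimal, non-authoritative sketch of this deployment loop is given below (the simulation provides environment data, a perception frame is built, the trained decision maker predicts a control frame, and the control is passed back for the positional update); the method names on the simulation interface are assumptions:

```python
# Hedged sketch of the deployment loop described above. The methods on
# `simulation` and `decision_maker` are assumptions, not an actual API.
def run_traffic_agent(simulation, build_perception, decision_maker, n_frames):
    for _ in range(n_frames):
        env_data = simulation.get_environment_data()   # (i) map, (ii) traffic, (iii) traffic rules
        p_i = build_perception(env_data)               # perception frame P_i
        c_i = decision_maker.predict(p_i)              # e.g. (delta_acceleration, delta_bearing)
        simulation.apply_control(c_i)                  # positional update in the simulator
```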
  • Such a computing system is also called an integrated system.
  • the inventive simulation computer system does not use the naturalistic driving and map data, which is used for the training procedures as input information. Therefore, the inventive simulation computer system does not need to comprise a module B’ corresponding to module B of the training computer system.
  • the simulated driving data and map (environment) data of the simulated traffic agent, which may be provided by module S1’ and/or module S2’ to the perception building module A’, is used as input information in the inventive simulation computer system to generate the respective perception frames P i per respective time frames t i in module A’.
  • module A’ is configured to generate the respective perception frames P i per respective time frames t i based on the simulation data provided by module S1’ and/or S2’.
  • the perception frames P i per respective time frames t i are used as input information for the inventive E2E decision maker computer model (module C’).
  • Module C’ is configured to predict one or more vehicle control frames Ĉ i containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing per respective time frames t i to be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the decision maker computer model (module C’) of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • the decision maker computer model (module C’) of the traffic agent comprises i) a Lane Follow neural network (module C2’), ii) a Lane Change neural network (module C3’) and iii) a Function Classifier (module C1’) neural network, which are configured in such a way that
  • one or more perception frames P i of the simulated vehicles per given time frame t i are respectively used as input to the Lane follow (module C2’), the Lane Change (module C3’) and the Function Classifier (module C1’) neural networks,
  • the Function Classifier (module C1’) is configured to classify the one or more perception frames P i of the simulated vehicles per given time frame t i into the situation category “Lane follow” or “Lane Change”. Dependent on the respective classification per given time frame t i , i.e. either class “Lane follow” or class “Lane Change”, the Function Classifier (module C1’) initiates the neural network of either Lane Follow (module C2’) or Lane Change (module C3’) respectively.
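  • A minimal sketch of this moderator logic at inference time is given below; the 0.5 decision threshold and all names are assumptions for illustration.

```python
# Minimal sketch of the Function Classifier acting as a moderator at inference
# time: it classifies the perception frame and the corresponding sub-network
# (Lane Follow or Lane Change) predicts the control frame.
def predict_control(p_i, classifier_net, lane_follow_net, lane_change_net,
                    threshold=0.5):
    lane_change_likelihood = classifier_net(p_i)
    if lane_change_likelihood >= threshold:
        return lane_change_net(p_i)     # "Lane Change" situation category
    return lane_follow_net(p_i)         # "Lane follow" situation category
```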
  • If the Function Classifier (module C1’) classifies a perception frame P 1 at time frame t 1 with the situation category “Lane follow”, module C1’ is configured to initiate the neural network “Lane follow” to predict the vehicle control frame Ĉ 1 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t 1 .
  • If the Function Classifier classifies a perception frame P 2 at time frame t 2 with the situation category “Lane Change”, the Function Classifier (module C1’) is configured to initiate the neural network “Lane Change” to predict the vehicle control frame Ĉ 2 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing at time frame t 2 .
  • the output of the module C’ is provided to the simulated driving environment module (module S2’) in order to be applied to the simulated traffic agent in the simulation environment.
  • Module S2’ is configured to provide module S1’ with the respective changed simulated environment data comprising driving data and map data of the simulated traffic agent, so that module S1’ provides module A’ with a changed environment data set in order to generate the next perception frame.
  • Figure 1a) shows a schematic representation of the traffic agent 1 for decision making in simulated driving situations (also called “E2E car control model”), which is stored in the memory device and is configured to comprise one or more neural networks with end-to-end modeling and to execute the inventive computer-implemented training method.
  • the inventive computing system for training a traffic agent navigating a road vehicle in a simulation environment also comprises or consists of one or more processors, a memory device coupled to the one or more processors, which are not separately shown in Figure 1a).
  • the naturalistic driving data and map data are used as input information for module A (Perception building) and module B (vehicle control building), which are shown in Figure 1a) as combined module 11.
  • Modules A and B may alternatively be present as separate modules.
  • the output information respectively generated by modules A and B in module 11 is used as input information to train the traffic agent decision maker 12 (also called “E2E decision maker” or module C) in accordance with the inventive training method described in detail hereinbefore.
  • the inventive traffic agent 1 comprises a combined module 11 comprising module A and module B.
  • Module A is configured to process at least part of the naturalistic driving data and map data to generate the respective perception frames P i per given time frames t i , wherein each perception frame P i contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. Specific example embodiments thereof are already discussed with respect to the first inventive aspect and also apply to this inventive training computing system of the second inventive aspect.
  • each ground truth vehicle control frame C i contains longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles per given time frames t i .
  • the use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
  • the combined module 11 may also comprise an additional module D (not shown in Figure 1b)), which is configured to classify at least part of the perception frames P i based on the naturalistic driving data and map data into a binary situation category of either “Lane follow” or “Lane Change” per given time frames t i.
  • the module 12 (module C) of the traffic agent 1 uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • the independent neural networks are trained independently.
  • the output data of module 11 is used as input information.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • Figure 1b shows as one example thereof, that the E2E decision maker 12 comprises i) a Lane Follow neural network 122 (module C2) , ii) a Lane Change neural network 123 (module C3) and iii) a Function Classifier neural network 121 (module C1).
  • the traffic agent 1 comprises the module D (not shown in Figure 1b)) for classifying situation categories, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames t i to binary corresponding ground truth situation categories of “Lane follow” or “Lane Change”. In other words, for each time frame t i and ego vehicle, the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the E2E decision maker 12 of the inventive traffic agent 1 is configured in such a way that
  • the one or more perception frames P i are respectively used as input information to the Lane follow 122 (module C2), the Lane Change 123 (module C3) and the Function Classifier 121 (module C1) neural networks,
  • the one or more ground truth vehicle control frames C i are respectively used as labels for independently training the Lane Follow 122 (module C2) and the Lane Change 123 (module C3) neural networks by matching the predicted vehicle control frames Ĉ i with the respective ground truth vehicle control frames C i , and
  • the respective ground truth situation categories per given time frames t i are used as labels to independently train the Function Classifier 121 (module C1) neural network to predict a corresponding situation category of “Lane follow” or “Lane Change” by matching the predicted situation category with the respective ground truth situation category.
  • the E2E decision maker 12 is furthermore configured in such a way that the output of the Function Classifier 121 (module C1), i.e. the respective situation category “Lane follow” or “Lane Change” of the ego vehicle at time frame t i , initiates either the Lane Follow 122 (module C2) or the Lane Change 123 (module C3) neural network respectively.
  • An advantage of the inventive computing system for training is that the traffic agent 1 is trained to predict both longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions to be applied to a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior.
  • the changes of longitudinal and latitudinal positions may be in the form of changes of acceleration and changes of bearing to be applied to the simulated vehicle at a given time frame.
  • Figure 1c) shows a schematic representation of an inventive integrated simulation computer system 01’ deploying an inventively trained traffic agent 1’ comprising a module 11’ (module A’) for perception building based on the simulated environment data provided by module 21’ (module S1’) and an E2E decision maker model 12’, as well as one or more processors and a memory device coupled to the one or more processors (not separately shown in Figure 1c)).
  • the driving environment module S2’ (simulation) is expected to provide environment data in module S1’ for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed to build perceptions in module 11’ and control commands are generated by the E2E decision maker 12’ and passed back to the environment 22’ for positional update.
  • Module 11' is configured to generate perception frames for the respective simulated vehicle per given time frame containing information on (i) traffic situation, (ii) self-state information of the simulated vehicle and (iii) road geometry and to provide the generated perception frames as input information to the E2E decision maker module 12’ (module C’).
  • the E2E decision maker module 12’ was trained in accordance with the inventive training method.
  • the E2E decision maker module 12’ is, thus, configured to predict as an action one or more vehicle control frames containing longitudinal and latitudinal positions, more preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration and bearing to be applied to the simulated vehicle in the simulation environment.
  • the inventive simulation computer system 01’ deploying the inventive traffic agent 1’ does not use the naturalistic driving and map data, which is used for the training procedure as input information. Therefore, the inventive simulation computer system 01’ does not need to comprise a module B’ corresponding to module B of the training computer system.
  • the simulated driving data and map (environment) data of the simulated traffic agent 1’, which are provided by module 21’ (module S1’) to module 11’ (module A’), are used as input information in the inventive simulation computer system 01’ to generate the respective perception frames P i per respective time frames t i in module 11’ (module A’).
  • module 11’ (module A’) is configured to generate the respective perception frames P i per respective time frames t i based on the simulation data provided by module 21’ (module S1’).
  • the perception frames P i per respective time frames t i generated by module 11 ’ are used as input information for the inventive E2E decision maker computer model 12’ (module C’).
  • Module 12’ (module C’) is configured to predict the longitudinal and latitudinal position, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated traffic agent per respective time frames t i .
  • the decision maker computer model 12’ (module C’) of the deployed traffic agent 1’ uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • An example configuration of the E2E decision maker 12’ in deployment comprises the analogous configuration set up of the E2E decision maker 12 as shown in Figure 1 b). Accordingly, the respective details and preferred embodiments as discussed hereinbefore also apply.
  • the decision maker computer model 12’ (module C’) of the deployed traffic agent 1’ comprises i) a Lane Follow neural network 122’ (module C2’), ii) a Lane Change neural network 123’ (module C3’) and iii) a Function Classifier 121’ (module C1’) neural network, which are configured in such a way that
  • one or more perception frames P i of the simulated vehicles per given time frame t i are respectively used as input to the Lane follow 122’ (module C2’), the Lane Change 123’ (module C3’) and the Function Classifier 121’ (module C1’) neural networks,
  • the Function Classifier 121’ (module C1’) neural network is configured to classify the one or more perception frames P i of the simulated vehicles per given time frame t i into the situation category “Lane follow” or “Lane Change”. Dependent on the respective classification per given time frame t i , i.e. either class “Lane follow” or class “Lane Change”, the Function Classifier 121’ (module C1’) initiates the neural network of either Lane Follow 122’ (module C2’) or Lane Change 123’ (module C3’) respectively.
  • If the Function Classifier 121’ (module C1’) classifies a perception frame P 1 at time frame t 1 with the situation category “Lane follow”, it is configured to initiate the neural network “Lane follow” 122’ to predict the vehicle control frame Ĉ 1 containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t 1 .
  • If the Function Classifier 121’ (module C1’) classifies a perception frame P 2 at time frame t 2 with the situation category “Lane Change”, then the Function Classifier 121’ (module C1’) is configured to initiate the neural network “Lane Change” 123’ to predict the vehicle control frame Ĉ 2 containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated vehicle at time frame t 2 .
  • The output of module 12’ (module C’) is provided to the simulated driving environment 22’ (module S2’) in order to be applied to the simulated traffic agent 1’ in the simulation environment.
  • Module 22’ (module S2’) is configured to provide module 21’ (module S1’) with the respectively changed simulated environment data comprising driving data and map data of the simulated traffic agent 1’, so that module 21’ (module S1’) provides module 11’ (module A’) with the changed environment data in order to generate the next perception frame.
  • As naturalistic driving data, commercial driving data from DataFromSky (DFS) was used.
  • the DFS data set in particular comprised the following features: timestamp (in seconds, s), longitudinal velocity (in meter/seconds, m/s), longitudinal acceleration (in meter/square seconds, m/s 2 ), and global coordinates of respective vehicle (traffic agents) (in x-, y- co-ordinates).
  • the OpenDRIVE digital map (downloaded from http://www.opendrive.org/) was used as map data in the simulation to generate lane points in reference to each ego position, which were used to construct road-geometry data for the model to be used as input.
  • These lane points can be described, for the current and the two adjacent lanes of a subject/ego vehicle at a time interval t i , as a set of coordinates X = [x 1 , x 2 , ... x n ] per lane, such that x n is the last point on the lane that is at a maximum distance of 400 m from the ego/subject vehicle position at t i .
  • the perception frame used with respect to the present invention is divided into three categories:
  • Traffic Situation input (DFS and OpenDRIVE) data is processed to form the six-vehicle-neighbourhood information with reference to each ego/subject vehicle, where each represented vehicle in the six positions offers two pieces of information: (i) relative distance d to ego vehicle and (ii) relative speed v r to ego vehicle speed v e .
  • Figure 2 shows a schematic representation of a six-vehicle-neighborhood information at a specific time frame. As set out above, the vehicle roles are defined in a six-vehicle neighborhood according to the present invention as follows:
  • the car 311 in front of ego vehicle 3 (in the same lane).
  • Ego-state information includes longitudinal velocity, longitudinal acceleration, angular deviation and bearing of the ego vehicle with respect to the lane direction.
  • Angular deviation (Δd) is defined as Δd = b_lane − b_ego, where b_lane and b_ego are the global bearing/orientation of the lane and the ego vehicle, at any given time instance t i , respectively.
  • the present inventors investigate two possible approaches to model the inventive E2E decision maker using neural networks in the simulation environment:
  • - Lane follower module 122, which is used to control the vehicle during general lane follow scenarios,
  • - Lane changer module 123, which is used to control the vehicle during lane change scenarios, and
  • - Function classifier module 121, which is used to classify, in binary, whether a situation is that of lane follow or a lane change, and thus triggers one of the two corresponding models 122 or 123. Each of these sub-modules were trained independently.
  • Adaptive cruise control controlling the vehicle’s throttle/acceleration with reference to the front car.
  • Traffic-free steer control controlling the vehicle’s steering to keep the lane.
  • a branched neural network architecture, split into two completely separate networks with no common set of layers for both Δacceleration and Δbearing, was used and trained with DFS driving data and OpenDRIVE map data.
  • the network was optimized using the following loss function (equation 2): L = (1/k) · Σ_(i=1..k) [ (y_acc,i − ŷ_acc,i)² + (y_bear,i − ŷ_bear,i)² ], where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing.
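  • One possible way such a branched architecture and the loss of equation 2 could be implemented is sketched below, e.g. in PyTorch; the framework choice, layer sizes and input width are assumptions and are not specified in the patent.

```python
# Illustrative PyTorch sketch (assumptions: framework, layer sizes, input width).
# Two completely separate fully connected branches predict delta_acceleration and
# delta_bearing; the combined squared-error loss corresponds to equation 2.
import torch
import torch.nn as nn

def make_branch(in_dim: int, hidden: int = 128) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

class BranchedControlNet(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        self.acc_branch = make_branch(in_dim)    # predicts delta_acceleration
        self.bear_branch = make_branch(in_dim)   # predicts delta_bearing

    def forward(self, x):
        return self.acc_branch(x), self.bear_branch(x)

def loss_eq2(pred_acc, pred_bear, y_acc, y_bear):
    # mean over k samples of the squared errors in both outputs (equation 2);
    # tensors are assumed to have matching shapes
    return torch.mean((y_acc - pred_acc) ** 2 + (y_bear - pred_bear) ** 2)
```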
  • A specific mechanism was needed to cater for the problem of cascading error during test-runs of the model in the simulation, which meant that minute errors in each frame added up to yield states which were rarely seen in the lane following training data, resulting in the model failing to control the steering well enough to keep the lane, eventually leading the vehicle out of the lane.
  • the corrective mechanism involved filtering the training data to increase the involvement, within each training iteration, of those situations where the vehicle was displaced on either side of the lane center and where Δbearing was such that the distance to lane center was being reduced.
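  • A hedged sketch of such a corrective filter is given below; the sign conventions, the threshold and all names are illustrative assumptions, not the patent's filtering criteria.

```python
# Hedged sketch: keep (or oversample) frames where the vehicle is displaced from
# the lane center and the recorded delta_bearing reduces the distance to the
# lane center. Thresholds and field names are assumptions.
def is_corrective_frame(lane_center_offset: float, delta_bearing: float,
                        min_offset: float = 0.2) -> bool:
    """lane_center_offset: signed lateral offset from lane center (m), positive = left.
    delta_bearing: signed change of bearing (rad), positive = turning left."""
    if abs(lane_center_offset) < min_offset:
        return False                      # vehicle is essentially centered
    # a corrective command steers back towards the center, i.e. opposite in sign
    return (lane_center_offset > 0 and delta_bearing < 0) or \
           (lane_center_offset < 0 and delta_bearing > 0)

# training data could then be filtered/weighted, e.g.:
# corrective = [f for f in frames if is_corrective_frame(f.offset, f.delta_bearing)]
```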
  • the graph in Figure 6 shows the distribution of average lane-center deviation of vehicles in the ground-truth dataset while the graph in Figure 7 shows the same distribution for the Lane follow model when run in the simulation environment.
  • the distribution of lane-center deviation appears to be higher in the DFS data, potentially as a result of the positional errors in recording of the data, which has been declared to be up to ≈0.5 m.
  • the Lane follow module 122 shows relatively less deviation owing to the corrective mechanism during training.
  • the Lane Changer module 123 targeted specific situations where the vehicle was expected to transition into either of the two adjacent lanes.
  • the model was expected to learn to predict Δbearing and Δacceleration values in a way that the higher-level decision of direction of lane change is implicitly taken at each frame, wrapped into the lower-level output of Δbearing and Δacceleration values.
  • the long-term effect of this is the smooth transition to the implicitly decided lane. This process is catered for within the neural network itself and hence the term “implicit” decision is used.
  • the same perception vector is used as input to the model, except that the road-geometry displacement vector is calculated with reference to the center-points of the adjacent lanes, not the current lane.
  • the network in this case is the same branched architecture described in the Lane follow model, and was optimized using the following loss function (equation 3): L = (1/k) · Σ_(i=1..k) [ |y_acc,i − ŷ_acc,i| + |y_bear,i − ŷ_bear,i| ], where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing. It was found through experiment that the mean squared error as in equation 2 was not able to converge well owing to the small values of ground truth Δbearing in the data related to lane change scenarios, and therefore the mean absolute error was preferred, as shown in equation 3.
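  • In code, the switch from equation 2 to equation 3 amounts to replacing the squared errors with absolute errors, e.g. (a sketch under the same assumptions as above):

```python
# Equation 3 as a drop-in replacement for the squared-error loss sketched above
# (mean absolute error over both outputs); a sketch, not the patent's exact code.
import torch

def loss_eq3(pred_acc, pred_bear, y_acc, y_bear):
    return torch.mean(torch.abs(y_acc - pred_acc) + torch.abs(y_bear - pred_bear))
```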
  • Table 1 shows the distribution of lane change directions in the ground-truth data and that shown by the Lane Change 123 model during a test run in the simulation. It can be seen that the percentages of corresponding lane changes are very similar for both sources, which can be seen as a rough approximation of the similarity in behavior of the model with real human drivers in the data.
  • Table 1 Percentage of lane change directions selected by the lane change module and real traffic data.
  • the Function Classifier 121 is targeted to act as a moderator for the two main modules: Lane Follow 122 and Lane Changer 123.
  • a single fully connected neural network was used to train the model with the DFS data, optimized on the following cost function known as log loss: L = −(1/k) · Σ_(i=1..k) [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ], where k ∈ ℤ+ is any arbitrary number of data samples, and y and ŷ are the ground truth and predicted output of the model as likelihood of the scenario being a Lane Change (and correspondingly the likelihood of Lane follow as 1 − ŷ).
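  • A minimal sketch of this log loss (binary cross-entropy between the ground-truth category and the predicted lane-change likelihood) is given below; the PyTorch usage is an assumption.

```python
# Binary cross-entropy / log loss for the Function Classifier: target 1 = "Lane
# Change", 0 = "Lane follow"; the prediction is the lane-change likelihood.
import torch

def log_loss(y_true, y_pred_likelihood, eps=1e-7):
    p = torch.clamp(y_pred_likelihood, eps, 1 - eps)   # avoid log(0)
    return -torch.mean(y_true * torch.log(p) + (1 - y_true) * torch.log(1 - p))

# equivalently: torch.nn.functional.binary_cross_entropy(y_pred_likelihood, y_true)
```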
  • Table 2 shows the confusion matrix for the classification of lane follow and lane change scenarios by the Function Classifier model compared to the ground-truth real traffic data.
  • the numbers shown in the table represent the respectively recorded instances (total instances: 148,501) in the ground truth data.
  • the inventive Function Classifier 121 model predicts in 96,578 instances correctly the situation category “Lane follow” and only in 8,592 instances incorrectly the situation category “Lane Change”.
  • the Function Classifier 121 model predicts in 35,778 instances correctly the situation category “Lane Change” and only in 7,553 instances incorrectly the situation category “Lane follow”.
  • the Function Classifier 121 model exhibits a precision value of 0.92, a recall value of 0.93 and therefore an appreciable F1-score of 0.925.
  • the final E2E decision maker module trained on real traffic data (DFS), was also evaluated for safety compliance in terms of collisions with surrounding traffic vehicles.
  • the E2E decision maker module was tested in the simulation environment and presently resulted only in 4 minor collisions (front car collision at low speeds) in a 30 minute drive, with dense traffic surroundings.
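Purely as an illustrative, non-authoritative sketch of the two objectives referenced in the list above (the mean absolute error of equation 3 used for the Lane Changer module and the log loss used for the Function Classifier), the following Python snippet shows one way to compute them with NumPy; the function and argument names are hypothetical and not taken from the disclosure.

```python
import numpy as np

def lane_change_loss(y_acc, y_acc_pred, y_bear, y_bear_pred):
    """Mean absolute error over Delta-acceleration and Delta-bearing (cf. equation 3)."""
    y_acc, y_acc_pred = np.asarray(y_acc), np.asarray(y_acc_pred)
    y_bear, y_bear_pred = np.asarray(y_bear), np.asarray(y_bear_pred)
    return np.mean(np.abs(y_acc - y_acc_pred) + np.abs(y_bear - y_bear_pred))

def function_classifier_log_loss(y_true, y_pred_lane_change, eps=1e-12):
    """Binary log loss; y_pred_lane_change is the predicted likelihood of 'Lane Change'."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred_lane_change, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Toy example with three data samples (values are invented for illustration)
print(lane_change_loss([0.1, -0.2, 0.0], [0.12, -0.25, 0.02],
                       [0.01, 0.02, -0.01], [0.015, 0.018, -0.012]))
print(function_classifier_log_loss([1, 0, 1], [0.9, 0.2, 0.7]))
```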

Abstract

The present invention relates to a computer-implemented training method for a traffic agent navigating a road vehicle in a simulation environment using end-to-end modeling as well as a respective training computing system, and a computing system for simulating a road driving environment for one or more vehicles comprising or consisting of one or more processors using the inventively trained traffic agent.

Description

COMPUTING SYSTEM AND METHOD USING END-TO-END MODELING FOR A SIMULATED TRAFFIC AGENT IN A SIMULATION ENVIRONMENT
TECHNICAL FIELD:
The present invention relates to a computer-implemented training method for a traffic agent navigating a road vehicle in a simulation environment using end-to-end modeling as well as a respective training computing system, and a computing system for simulating a road driving environment for one or more vehicles comprising or consisting of one or more processors using the inventively trained traffic agent.
PRIOR ART:
Before the driving characteristics of road vehicles are tested in reality, computer simulations of certain driving situations, such as braking, are carried out. As the prediction period is usually only up to 2 seconds, those models cannot predict complex driving situations, such as those required during overtaking.
The problem of devising a system that can control a car safely in a variety of traffic situations has been studied extensively and is of obvious interest for autonomous vehicle development. The focus for this area of research is on making safe and efficient decisions under real-time constraints. The simulated safe and efficient decisions, however, may not reflect human driving decisions in natural traffic.
Human driving decisions on a road can essentially be considered to comprise several abstract levels or phases forming a driving stack. Based on a particular road situation, a driver may decide to carry out a particular high-level maneuver, e.g. overtake, formulate a motion plan (also called “trajectory”) accordingly and apply control functions on actuators (throttle, brake, steer) to execute the decision.
Thus, it becomes more and more relevant to simulate human driving decisions in natural traffic. Human driving decisions in natural traffic are, moreover, influenced by many factors and can be considered at various levels. For example, depending on their mental environment, human drivers, being in the same situation, may take different decisions, such as overtake, follow a car in front or change the lane.
Many existing models employ a hierarchical structure in the sense that more abstract decisions (such as, which route to take) are computed first and then passed “down” to different layers that deal with an increasing level of details of the driving process based on that input. The driving stack is split into several phases which aim to reflect actual relevant components to the different approaches, e.g. in the context of simulation environments, rather than the driving stack of an autonomous vehicle.
Such phases may be considered as follows: Perception/Map generally relates to the input about the environment that is available to other components.
Traffic Rules generally relates to any component that provides legal restrictions to high-level decisions.
Mission Planning generally relates to a strategy on when to be where in the long- term (e.g. lane-level routing).
Traffic-Free Reference Line generally relates to planning an "optimal" reference-line ignoring other traffic participants.
Behavior Planning generally relates to planning a behavior plan, that is, when exactly to conduct actions, such as lane changes, incorporating other participants.
Decision Post-Processing generally relates to correcting the decisions of the previous components for conforming to basic safety rules, if necessary.
Motion/Trajectory Planning generally relates to planning the exact future trajectory for a short time (up to 2 seconds) horizon.
Command Conversion generally relates to computing the final commands to send to a (real or simulated) vehicle, such as steering instructions.
Vehicle Dynamics/Physics generally relates to simulating the car's behavior resulting from the generated commands.
Position Update generally relates to computing the resulting new position of the vehicle in the simulation.
Usage of these terms varies drastically in the literature.
It can be argued that these hierarchical models have certain limitations, such as not being able to make high-level decisions that can be acted upon, as "lower" components like a Motion Planner (the component that decides on e.g. timings of accelerations and lane-changes) might need to alter or even reject them (see Junqing Wei, Jarrod M. Snider, Tianyu Gu, John Dolan, and Bakhtiar Litkouhi. A behavioral planning framework for autonomous driving. Pages 458-464, 06/2014). Accordingly, the hierarchical models only offer limited realism in reflecting human driving behavior.
The power of end-to-end (synonym "e2e" or "E2E") learning using neural networks has been proven many times in various domains. In the autonomous driving industry, the e2e approach is popular in constructing robust models for various driving controls, e.g. steering, pedal control etc., in a way that maps sensory input (e.g. image pixels) directly to control output. This direct mapping relieves the need to use comprehensively labeled training data with annotated lane markings, road boundaries etc. and allows salient features to be extracted based on a goal-driven learning approach. Bojarski has shown that the decision-making processes of a human driver during lane following can be modelled in a deep neural network (see M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016). The authors attempt to map raw images from driving footage to steering commands of the car, thereby implicitly embedding the levels of the driving stack in the layers of a neural network, much like a human mind does. The model is taught to learn the lane-keeping behavior of human drivers, but lane changing was not modelled. The method uses only steering commands, but no information for controlling the car's longitudinal movement (i.e. acceleration/deceleration). The model is implemented in an autonomous driving car and not as a simulated traffic agent in a simulation environment. Muller presents the same approach used to train on remote-control car data and consequently automate its driving (see U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, "Off-road obstacle avoidance through end-to-end learning," in Advances in neural information processing systems, 2006, pp. 739-746).
Xu and Gao use end-to-end deep learning to map raw images from numerous on-road human driving footages to both high-level actions of "stop" and "go" as well as steering angle commands (see H. Xu, Y. Gao, F. Yu, and T. Darrell, "End-to-end learning of driving models from large-scale video datasets," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174-2182). The intended behavior of the model can be approximated to cover lane following, obstacle avoidance and lane change behavior of human drivers. The work provides only a distribution of the car controls, e.g. steering, and therefore does not claim to be accurate enough to drive a car in a simulation or real-world driving scenario. The method does not model acceleration/deceleration commands - only high-level decisions of stop and go. The model is not implemented in a simulated traffic agent in a simulation environment. Codevilla uses the same approach to learn the mapping from image to complete longitudinal and latitudinal control commands (steering and acceleration) of a car using the CARLA driving simulation data (see F. Codevilla, M. Müller, A. Lopez, V. Koltun, and A. Dosovitskiy, "End-to-end driving via conditional imitation learning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1-9). The model is taught to learn roughly all aspects of the driving behavior, i.e. lane following, adaptive cruise-control, obstacle avoidance and lane changing. The model was evaluated in a simulation environment as a traffic agent. Chen presents a similar solution in the TORCS racing-car simulation environment and also claims that the end-to-end model explicitly learns to focus on interpretable perception items, such as distance to lane and road boundary, distance to other cars around and angular deviation from the road, as part of a more interpretable solution to modelling driving behavior (see C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "Deepdriving: Learning affordance for direct perception in autonomous driving," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722-2730). Both of these solutions, however, are trained on simulation data of a computer-controlled driver, thereby not having the complete capacity to exhibit actual human-like behavior. In particular, the relevant prior art work mentioned above either limits the control of the car to only latitudinal steering commands or targets one specific function associated with human driving, e.g. lane follow, lane change etc. This limitation, however, does not allow a simulated traffic agent in a simulation environment to exhibit actual human-like driving behavior.
Furthermore, the prior art solutions are dependent on implicit learning of perception items such as lane-boundary positions, traffic car positions etc. from visual input (image), which leads to less accurate information of the vehicle environment.
In view of the shortcomings of the prior art, it is the aim of the present invention to provide a computing system and method for simulating a road driving environment in a driving situation for one or more vehicles, so that the decision of a traffic agent reflects a human like (naturalistic) behavior, i.e. controls the vehicle’s longitudinal and lateral position, preferably steering and acceleration, in a way that exhibits a naturalistic driving behavior in general and in particular for high-level decisions, such as lane changing behavior, e.g., in an overtake.
BRIEF DESCRIPTION OF THE INVENTION:
The aforementioned aim is solved at least in part by means of the claimed inventive subject matter. Advantages (preferred embodiments) are set out in the detailed description hereinafter and/or the accompanying figures as well as in the dependent claims.
Accordingly, a first aspect of the invention relates to a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment. The method comprises or consists of the following steps: a. Providing driving data at one or more time frames ti = [t1, t2, ... tn] for one or more road vehicles as ego vehicles respectively driven by a human in a realistic situation on a road and providing map data on the respective road at the given time frames ti, b. Processing at least part of the driving data and map data of step a) into one or more respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry, c. Processing at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles, d. Training a decision maker computer model of the traffic agent with the one or more perception frames Pi per given time frames ti of step b) as input to the model and with the one or more ground truth vehicle control frames Ci per given time frames ti of step c) as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉi = [ĉ1, ĉ2, ... ĉn] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
The processing steps b) and c) of the inventive method according to the first aspect can be conducted simultaneously or sequentially in any order.
A second aspect of the invention relates to a computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
A third aspect of the invention relates to a computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent was trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions to be applied to a simulated vehicle in the simulation environment per given time frame ti, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
The inventive aspects of the present invention as disclosed hereinbefore can comprise any possible (sub-)combination of the preferred inventive embodiments as set out in the dependent claims or as disclosed in the following detailed description and/or in the accompanying figures, provided the resulting combination of features is reasonable to a person skilled in the art.
BRIEF DESCRIPTION OF THE DRAWINGS:
Further characteristics and advantages of the present invention will ensue from the accompanying drawings, wherein Figs. 1a) to 1c) show schematic representations of (parts) of the E2E car control model of the inventive computer systems in training (Figs. 1a) and 1b)) and deployment (Fig. 1c)), respectively.
Fig. 2 shows a schematic representation of a six-vehicle-neighborhood information.
Fig. 3 shows a schematic representation of a semi-circular road geometry and applicable displacement vectors.
Fig. 4 shows a distribution graph of error/frame in Δbearing against DFS ground-truth validation data in a Lane Follow module according to the invention.
Fig. 5 shows a distribution graph of error/frame in Δacceleration against DFS ground-truth validation data in a Lane Follow module according to the invention.
Fig. 6 shows a distribution graph of lane-center deviation in the DFS real traffic data in the Lane Follow module.
Fig. 7 shows a distribution graph of lane-center deviation of the Lane Follow module when run within the simulation.
Figs. 8a) and 8b) respectively show distribution graphs of relative speed versus relative distance to the front car for the model test-run in simulation (Fig. 8a)) and the ground-truth DFS data (Fig. 8b)).
DETAILED DESCRIPTION OF THE INVENTION:
As set out in more detail hereinafter, the inventors of the different aspects of the present invention have found that the computer-implemented systems and methods according to the present invention enable a traffic agent navigating a road vehicle in a simulation environment to make simulated driving decisions at the high level (e.g., lane change or overtake driving situations) and at the low/operational level (trajectory and motion planning) which reflect human-like (naturalistic) behavior, i.e. control the vehicle's longitudinal and lateral position, preferably bearing and acceleration, in a way that exhibits a naturalistic driving behavior in any driving situation.
Thus, the present invention successfully exhibits the naturalistic decision making behavior from the source data in the simulation environment in terms of planning, safety- procedures and traffic rule compliance.
The respective naturalistic driving and map data is according to the present invention processed to form one or more perception frames per given time frames containing corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. In addition, the respective naturalistic driving and map data is according to the present invention processed to form one or more respective vehicle control frames per given time frames, wherein each vehicle control frame contains the longitudinal and latitudinal position of the respective ego vehicle. The application of the three categories of the perception frame is fundamental in order to provide an effective generalization of the inventive computer model.
According to the present invention, the decision maker computer model of the simulated traffic agent is trained with the respective one or more perception frames as input to the model and with the one or more ground truth vehicle control frames as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames. In other words, when matching the predicted vehicle control frames with the respective ground truth vehicle control frames the predicted vehicle control frames are approximated with the respective ground truth vehicle control frames. The inventive training procedure of the model is based on a data-driven approach, wherein the model is configured to implicitly learn from the ground truth naturalistic data.
In the context of the present invention, the expression "an additionally or alternatively preferred embodiment" or "an additionally or alternatively further preferred embodiment" or "an additional or alternative way of configuring this embodiment" means that the feature or feature combination disclosed in this preferred embodiment can be combined in addition to or alternatively to the features of the inventive subject matter including any preferred embodiment of each of the inventive aspects, provided the resulting feature combination is reasonable to a person skilled in the art.
Further, in the context of the present invention, the expressions "comprising" or "containing" shall be understood to have a broad meaning similar to the term "including" and will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. This definition also applies to variations on the term "comprising" such as "comprise" and "comprises" as well as variations on the term "containing" such as "contain" and "contains".
Moreover, in the context of the present invention, the expression "configured" shall be understood in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions, it means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.
To achieve the inventive subject matter, advantages and objects thereof, the present invention as disclosed in this disclosure is directed to systems and methods that make use of computer hardware and software to train a virtual traffic agent navigating through a simulation environment using reinforcement learning algorithms and techniques. A virtual traffic agent (in the context of the present invention also called "traffic agent") can for example be a car, truck, bus, bike or motorbike. Once a virtual traffic agent has been trained according to the present invention to replicate human driving behavior, in particular in complex driving situations of lane change, one or more trained virtual traffic agents may be injected into a simulation environment including complex driving situations. Such an embodiment is preferred, as the trained traffic agents may interact with, cooperate with and challenge an autonomous vehicle system controlling an autonomous vehicle under test. Another advantage is that such an embodiment is suitable to test the limits and weaknesses of the autonomous vehicle system, especially in complex driving situation scenarios that may be attributed to assertive or aggressive driving behaviors.
Thus, the inventive systems and methods furthermore have the technical effect and benefit of providing an improvement to autonomous vehicle computing technology, as the autonomous vehicle is trained in the inventive simulation environment reflecting human-like / naturalistic driving scenarios.
According to the first aspect of the present invention, a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment is provided, characterized in that the method comprises or consists of the following steps: According to step a) the inventive training method provides driving data at one or more time frames ti = [t1, t2, ... tn] for one or more road vehicles as ego vehicles respectively driven by a human in a realistic situation on a road and provides map data on the respective road at the given time frames ti, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. The driving data generally represents trajectory data of the respective ego vehicles.
In an additional or alternative preferred embodiment, the driving data in step a) for each of the given road vehicles comprises or consists of one or more status features of the respective ego vehicles per given time frames ti, preferably comprising or consisting of longitudinal velocity, longitudinal acceleration, and position of the respective road vehicle in X, Y co-ordinates respectively per given time frames ti. In an additional or alternative preferred embodiment, the map data of step a) contains corresponding road information comprising or consisting of i) lane counts of the respective road and ii) lane position in X, Y co-ordinates, optionally X, Y, Z co-ordinates, respectively per given time frames ti.
According to step b), the inventive training method processes at least part of the driving data and map data of step a) into one or more respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In an additional or alternative preferred embodiment, the traffic situation in step b) comprises or consists of six-vehicle-neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) relative distance of respective vehicle to ego vehicle and ii) relative speed of respective vehicle to speed of ego vehicle. With respect to the six-vehicle-neighborhood, the vehicle roles are defined in accordance with the present invention as follows:
- The car in front of ego vehicle (in the same lane).
- The car following the ego vehicle in the back (in the same lane).
- The two cars in front of the ego vehicle’s center point translated to the two neighboring lanes.
- The two cars in the back of the ego vehicle’s center point translated to the two neighboring lanes.
Each of these might or might not exist for any given time/ego vehicle combination and are reflected in the model.
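As an illustration only, the relative-distance and relative-speed entries of one such neighborhood slot could be derived as in the following minimal Python sketch; the function name, the coordinate convention and the placeholder used for empty slots are assumptions and not taken from the disclosure.

```python
import math

def neighbor_features(ego_xy, ego_speed, other_xy=None, other_speed=None,
                      missing_value=-1.0):
    """Relative distance and relative speed of one neighborhood vehicle to the ego vehicle.

    Returns placeholder values when the slot is empty, since a neighbor may not exist
    for a given time frame / ego vehicle combination.
    """
    if other_xy is None or other_speed is None:
        return missing_value, missing_value
    rel_distance = math.hypot(other_xy[0] - ego_xy[0], other_xy[1] - ego_xy[1])
    rel_speed = other_speed - ego_speed
    return rel_distance, rel_speed

# Example: front car 25 m ahead, driving 2 m/s slower than the ego vehicle
print(neighbor_features((0.0, 0.0), 30.0, (25.0, 0.0), 28.0))   # -> (25.0, -2.0)
```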
In an additional or alternative preferred embodiment, the self-state information of the respective ego vehicles in step b) comprises or consists of longitudinal velocity, longitudinal acceleration, and its bearing with respect to the road direction (angular deviation Δd). The term "bearing" of an ego vehicle represents in the context of the present invention the orientation of the ego vehicle in relation to the global x-/y-axes. As an example, the angular deviation may be defined as

Δd_i = θ_i^road − θ_i^ego

where θ_i^road and θ_i^ego represent the bearing of the road and the ego vehicle, at any given time frame ti, respectively, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. In order to increase the accuracy, the bearing of the road may be substituted by the bearing of the lane, so that the angular deviation may be defined as

Δd_i = θ_i^lane − θ_i^ego

where θ_i^lane and θ_i^ego represent the bearing of the lane and the ego vehicle, at any given time frame ti.
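A minimal sketch of this angular-deviation computation follows, assuming bearings given in radians and a wrap of the difference into (−π, π]; the wrapping convention and the function name are assumptions, not taken from the disclosure.

```python
import math

def angular_deviation(bearing_reference, bearing_ego):
    """Delta-d: ego bearing relative to the road (or lane) direction, wrapped to (-pi, pi]."""
    diff = bearing_reference - bearing_ego
    # wrap into (-pi, pi] so that e.g. 350 deg vs 10 deg yields a small deviation
    return math.atan2(math.sin(diff), math.cos(diff))

# Example: road bearing 0.05 rad, ego bearing -0.02 rad -> deviation of 0.07 rad
print(round(angular_deviation(0.05, -0.02), 3))
```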
In an additional or alternative preferred embodiment, the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, preferably wherein the numerical representation is selected from a circular or a semi-circular geometry.
As an example, the circular or semi-circular numerical representation of the respective lane geometry having two lane boundaries is in the form of a vector of displacements Dj to each of the two lane boundaries, at any given time frame ti, with

D = [D1, D2, ... Dn]

wherein each entry Dj is part of a sequence of displacement points to the ego vehicle's position, divided on the basis of their relative bearing values to the ego position, with intervals of 1° or more around the circular or semi-circular region in front and/or back of the ego vehicle, and wherein the length n of the displacement vector D represents 1 to 360 for the circular geometry and 1 to 180 for the semi-circular geometry.
In case of semi-circular geometry, the front region covering 180° is represented, whereas the circular geometry represents both the front and the back regions covering 360°.
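The following Python sketch illustrates, under simplifying assumptions (sampled boundary points, 1° bins, a default value for empty bins, bearings in radians), how such a semi-circular displacement vector could be assembled; all names and defaults are hypothetical and not taken from the disclosure.

```python
import math

def semicircular_displacement_vector(ego_xy, ego_bearing, boundary_points,
                                     interval_deg=1, default=-1.0):
    """Displacement vector D over the 180 deg region in front of the ego vehicle.

    boundary_points: iterable of (x, y) points sampled along the two lane boundaries.
    Each bin stores the displacement (distance) of the closest boundary point whose
    relative bearing falls into that bin; empty bins keep the default value.
    ego_bearing is expected in radians.
    """
    n_bins = 180 // interval_deg
    vector = [default] * n_bins
    for x, y in boundary_points:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        rel_bearing = math.degrees(math.atan2(dy, dx) - ego_bearing)
        rel_bearing = (rel_bearing + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
        if -90.0 <= rel_bearing < 90.0:                       # front semi-circle only
            idx = int((rel_bearing + 90.0) // interval_deg)
            distance = math.hypot(dx, dy)
            if vector[idx] < 0 or distance < vector[idx]:
                vector[idx] = distance
    return vector
```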
According to step c), the inventive training method processes at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In an additional or alternative preferred embodiment, the longitudinal and latitudinal positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame ti. The use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined. The processing steps b) and c) can be executed simultaneously or sequentially in any order.
According to step d) the inventive method trains a decision maker computer model of the traffic agent with the one or more perception frames Pi per given time frames ti of step b) as input to the model and with the one or more ground truth vehicle control frames Ci per given time frames ti of step c) as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉi = [ĉ1, ĉ2, ... ĉn] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames Ci, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In other words, the perception Pi of the naturalistic data of step b) is used as input to the computer model, and the ground truth naturalistic data of the vehicle control frames Ci in step c) is used as a label for training purposes. In contrast thereto, during deployment the inventively trained traffic agent in a computer system simulating a driving environment according to the third inventive aspect does not use the naturalistic vehicle control frames of step c) and substitutes the naturalistic perception frames of step b) by simulated perception frames. During deployment, the decision maker of the inventive simulation computer system according to the third inventive aspect predicts as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. As already discussed above with respect to the preferred embodiment of step c), the ground truth vehicle control frames Ci and the predicted vehicle control frames Ĉi may comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame ti.
In an additional or preferred embodiment, the decision maker computer model of the traffic agent in step d) uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
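By way of example only, a branched network of the kind described, i.e. a shared fully connected trunk with separate output heads for Δacceleration and Δbearing, could be sketched in PyTorch as follows; the layer sizes are hypothetical choices within the 1 to 512 neuron range mentioned above and do not reproduce the actual architecture of the disclosure.

```python
import torch
import torch.nn as nn

class BranchedControlNet(nn.Module):
    """Shared fully connected trunk with two branches predicting
    Delta-acceleration and Delta-bearing from a perception vector."""

    def __init__(self, perception_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(perception_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.acc_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
        self.bear_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, perception):
        features = self.trunk(perception)
        return self.acc_head(features), self.bear_head(features)

# Example forward pass on a batch of hypothetical 200-dimensional perception frames
model = BranchedControlNet(perception_dim=200)
d_acc, d_bear = model(torch.randn(32, 200))
print(d_acc.shape, d_bear.shape)   # torch.Size([32, 1]) torch.Size([32, 1])
```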
Accordingly, as one embodiment thereof, the inventive training method further comprises processing the driving data of step a) for the respective ego vehicles per given time frames ti into binary corresponding ground truth situation categories Si of "Lane Follow" or "Lane Change", and wherein the decision maker computer model of the traffic agent in step d) comprises i) a Lane Follow neural network, ii) a Lane Change neural network and iii) a Function Classifier neural network, wherein
- the one or more perception frames Pi are respectively used as input to the Lane Follow, the Lane Change and the Function Classifier neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow and the Lane Change neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respectively applied ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier neural network to predict a corresponding situation category Ŝi of "Lane Follow" or "Lane Change" by matching the predicted situation category Ŝi with the respective ground truth situation category Si,
wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In other words, when matching the predicted situation categories Ŝi with the respective ground truth situation categories Si, the predicted situation categories Ŝi are approximated with the respective ground truth situation categories Si. In other words, for each time frame ti and ego vehicle the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
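As a hedged illustration of the training flow just described, the sketch below partitions the per-frame samples by their ground-truth situation category and trains the three networks independently; train_regressor and train_classifier stand in for whatever optimization routine is actually used and are purely hypothetical names.

```python
def train_decision_maker(frames, lane_follow_net, lane_change_net, classifier_net,
                         train_regressor, train_classifier):
    """frames: iterable of (perception P_i, control C_i, category S_i) tuples,
    with S_i == 1 for 'Lane Change' and S_i == 0 for 'Lane Follow' (assumed encoding)."""
    lf_samples = [(p, c) for p, c, s in frames if s == 0]    # Lane Follow situations
    lc_samples = [(p, c) for p, c, s in frames if s == 1]    # Lane Change situations
    cls_samples = [(p, s) for p, c, s in frames]             # all frames with category labels

    train_regressor(lane_follow_net, lf_samples)    # labels: ground-truth control frames C_i
    train_regressor(lane_change_net, lc_samples)    # labels: ground-truth control frames C_i
    train_classifier(classifier_net, cls_samples)   # labels: ground-truth categories S_i
```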
All features and embodiments disclosed with respect to the first aspect of the present invention are combinable alone or in (sub-)combination with the second aspect or third aspect of the present invention including each of the preferred embodiments thereof, provided the resulting combination of features is reasonable to a person skilled in the art.
According to the second aspect of the invention a computing system for training a traffic agent navigating a road vehicle in a simulation environment is provided comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
According to an additional or alternative preferred embodiment, the training computing system of the second aspect can be configured in such a way that the traffic agent comprises separate modules so that the respective naturalistic driving data and map data can be processed in a suitable way. In particular, the traffic agent may comprise a module A for processing at least part of the naturalistic driving data and map data to generate the respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. Specific embodiments thereof are already discussed with respect to the first inventive aspect and also apply to this inventive training computing system of the second inventive aspect. In addition, the traffic agent may comprise a module B for processing at least part of the naturalistic driving data and map data to generate one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles. In an additional or alternative preferred embodiment, the longitudinal and latitudinal positions of the respective ego vehicles comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame ti. The use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
Furthermore, the traffic agent according to the second inventive aspect may comprise module C, also called E2E Decision Maker (E2EDM) computer model, comprising one or more neural networks with end-to-end modeling. The outputs of modules A and B are used as input information to train the one or more E2E neural networks of module C.
In an additional or preferred embodiment, the module C of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture. Preferably, the independent neural networks are independently trained.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
Accordingly, as one embodiment thereof, module C comprises i) a Lane Follow neural network (module C2), ii) a Lane Change neural network (module C3) and iii) a Function Classifier neural network (module C1). In this case, the training computing system may also comprise a module D, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames ti into binary corresponding ground truth situation categories Si of "Lane Follow" or "Lane Change". In other words, for each time frame ti and ego vehicle the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
With respect to the training procedure of the first inventive concept, the inventive training computing system of the second inventive aspect is configured in such a way that
- the one or more perception frames Pi are respectively used as input information to the Lane Follow (module C2), the Lane Change (module C3) and the Function Classifier (module C1) neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow (module C2) and the Lane Change (module C3) neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respective ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier (module C1) neural network to predict a corresponding situation category Ŝi of "Lane Follow" or "Lane Change" by matching the predicted situation category Ŝi with the respective ground truth situation category Si.
The inventive training computing system is furthermore configured in such a way that the output of the Function Classifier (module C1), i.e. the respective situation category of the ego vehicle at time frame ti, initiates either the Lane Follow (module C2) or the Lane Change (module C3) neural network respectively.
An advantage of the inventive computing system for training a traffic agent is that the traffic agent is trained to predict both longitudinal and lateral positions of a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior. All features and embodiments disclosed with respect to the second aspect of the present invention are combinable alone or in (sub-)combination with the first aspect or third aspect of the present invention including each of the preferred embodiments thereof, provided the resulting combination of features is reasonable to a person skilled in the art.
According to the third aspect of the invention a computing system for simulating a road driving environment in driving situations for one or more vehicles is provided comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent using one or more neural networks for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames
Figure imgf000020_0001
containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment per given time frame ti, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. In other words, the traffic agent used in the inventive simulation computer system of the third aspect was trained according to the inventive training method prior to deployment in a simulation environment, wherein the driving environment (simulation) is expected to provide environment data for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed and control commands are generated by the inventively trained E2E decision maker and passed back to the environment for positional update. Such a computing system is also called an integrated system. As already mentioned above, the inventive simulation computer system does not use the naturalistic driving and map data, which is used for the training procedures as input information. Therefore, the inventive simulation computer system does not need to comprise a module B’ corresponding to module B of the training computer system. In contrast, the simulated driving data and map (environment) data of the simulated traffic agent, which may be provided by module ST and/or module S2’ to the perception building module A’, is used as input information in the inventive simulation computer system to generate the respective perception frames Pi per respective time frames ti in module A’. In other words, module A’ is configured to generate the respective perception frames Pi per respective time frames ti based on the simulation data provided by module S1’ and/or S2’. The perception frames Pi per respective time frames ti are used as input information for the inventive E2E decision maker computer model (module C’). Module C’ is configured to predict one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing per respective time frames tito be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i e [1,2, . .. n] and wherein n is the limit on driven frames.
In an additional or preferred embodiment, the decision maker computer model (module C’) of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
Accordingly, as one embodiment thereof, the decision maker computer model (module C') of the traffic agent comprises i) a Lane Follow neural network (module C2'), ii) a Lane Change neural network (module C3') and iii) a Function Classifier neural network (module C1'), which are configured in such a way that
- one or more perception frames Pi of the simulated vehicles per given time frame ti are respectively used as input to the Lane Follow (module C2'), the Lane Change (module C3') and the Function Classifier (module C1') neural networks,
- the Function Classifier (module C1') is configured to classify the one or more perception frames Pi of the simulated vehicles per given time frame ti into the situation category "Lane Follow" or "Lane Change".
Dependent on the respective classification per given time frame ti, i.e. either class "Lane Follow" or class "Lane Change", the Function Classifier (module C1') initiates the neural network of either Lane Follow (module C2') or Lane Change (module C3') respectively. As an example, in case the Function Classifier (module C1') classifies a perception frame P1 at time frame t1 with the situation category "Lane Follow", then the Function Classifier (module C1') is configured to initiate the neural network "Lane Follow" to predict the vehicle control frame Ĉ1 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t1. Alternatively, in case the Function Classifier (module C1') classifies a perception frame P2 at time frame t2 with the situation category "Lane Change", then the Function Classifier (module C1') is configured to initiate the neural network "Lane Change" to predict the vehicle control frame Ĉ2 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing at time frame t2. The output of module C' is provided to the simulated driving environment module (module S2') in order to be applied to the simulated traffic agent in the simulation environment. Module S2' is configured to provide module S1' with the respective changed simulated environment data comprising driving data and map data of the simulated traffic agent, so that module S1' provides module A' with a changed environment data set in order to generate the next perception frame.
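A minimal dispatch sketch of the moderation step described above: the Function Classifier selects which of the two control networks produces the vehicle control frame for the current perception frame. The 0.5 decision threshold and all function names are assumptions, not taken from the disclosure.

```python
def decide_control_frame(perception, function_classifier, lane_follow_net,
                         lane_change_net, threshold=0.5):
    """Return the predicted control frame (delta_acceleration, delta_bearing)
    for one perception frame P_i of the simulated ego vehicle."""
    p_lane_change = function_classifier(perception)   # assumed likelihood of 'Lane Change'
    if p_lane_change >= threshold:
        return lane_change_net(perception)            # corresponds to module C3'
    return lane_follow_net(perception)                # corresponds to module C2'
```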
The present invention is described in the following on the basis of exemplary embodiments, which merely serve as examples and which shall not limit the scope of the present protective right.
DETAILED DESCRIPTION OF FIGURES:
Further characteristics and advantages of the present invention will ensue from the following description of example embodiments of the inventive aspects with reference to the accompanying figures.
All of the features disclosed hereinafter with respect to the example embodiments and / or the accompanying figures can alone or in any sub-combination be combined with features of the two aspects of the present invention including features of preferred embodiments thereof, provided the resulting feature combination is reasonable to a person skilled in the art.
Figure 1a) shows a schematic representation of the traffic agent for decision making in simulated driving situations (also called "E2E car control model") 1, which is stored in the memory device and configured to comprise one or more neural networks with end-to-end modeling and to execute the inventive computer-implemented training method. The inventive computing system for training a traffic agent navigating a road vehicle in a simulation environment also comprises or consists of one or more processors and a memory device coupled to the one or more processors, which are not separately shown in Figure 1a).
According to Figure 1a), the naturalistic driving data and map data (not separately shown) are used as input information for module A (perception building) and module B (vehicle control building), which are shown in Figure 1a) as combined module 11. Modules A and B may alternatively be present as separate modules. The output information respectively generated by modules A and B in module 11 is used as input information to train the traffic agent decision maker 12 (also called "E2E decision maker" or module C) in accordance with the inventive training method described in detail hereinbefore.
As an example, the inventive traffic agent 1 comprises a combined module 11 comprising module A and module B. Module A is configured to process at least part of the naturalistic driving data and map data to generate the respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. Specific example embodiments thereof are already discussed with respect to the first inventive aspect and also apply to this inventive training computing system of the second inventive aspect. Module B is configured to process at least part of the naturalistic driving data and map data to generate one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each ground truth vehicle control frame Ci contains longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles per given time frames ti. The use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
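For illustration, ground-truth control labels of the kind produced by module B (changes of acceleration and bearing between consecutive time frames) could be derived from trajectory data as in this hypothetical sketch; the field names and the angle-wrapping convention are assumptions, not taken from the disclosure.

```python
import math

def control_labels(trajectory):
    """trajectory: list of per-frame dicts with 'acceleration' and 'bearing' (radians).
    Returns one (delta_acceleration, delta_bearing) label per transition t_i -> t_{i+1}."""
    labels = []
    for prev, curr in zip(trajectory[:-1], trajectory[1:]):
        d_acc = curr["acceleration"] - prev["acceleration"]
        # wrap the bearing change into (-pi, pi] to avoid jumps at the angle boundary
        d_bear = math.atan2(math.sin(curr["bearing"] - prev["bearing"]),
                            math.cos(curr["bearing"] - prev["bearing"]))
        labels.append((d_acc, d_bear))
    return labels
```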
In addition, the combined module 11 may also comprise an additional module D (not shown in Figure 1b)), which is configured to classify at least part of the perception frames Pi based on the naturalistic driving data and map data into a binary situation category of either “Lane Follow” or “Lane Change” per given time frames ti.
According to an alternative or an additional preferred embodiment, the module 12 (module C) of the traffic agent 1 uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture. Preferably, the independent neural networks are trained independently. One example of such a configuration, wherein the E2E decision maker 12 comprises three neural networks 121 (Function Classifier), 122 (Lane Follow) and 123 (Lane Change) combined in a branched architecture, is shown in Figure 1b). As input information, the output data of module 11 is used. In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
Figure 1b) shows, as one example thereof, that the E2E decision maker 12 comprises i) a Lane Follow neural network 122 (module C2), ii) a Lane Change neural network 123 (module C3) and iii) a Function Classifier neural network 121 (module C1). In this case, the traffic agent 1 comprises the module D (not shown in Figure 1b)) for classifying situation categories, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames ti into binary corresponding ground truth situation categories Si of "Lane Follow" or "Lane Change". In other words, for each time frame ti and ego vehicle the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. According to this example, the E2E decision maker 12 of the inventive traffic agent 1 is configured in such a way that
- the one or more perception frames Pi are respectively used as input information to the Lane Follow 122 (module C2), the Lane Change 123 (module C3) and the Function Classifier 121 (module C1) neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow 122 (module C2) and the Lane Change 123 (module C3) neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respective ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier 121 (module C1) neural network to predict a corresponding situation category Ŝi of "Lane Follow" or "Lane Change" by matching the predicted situation category Ŝi with the respective ground truth situation category Si.
The E2E decision maker 12 is furthermore configured in such a way that the output of the Function Classifier 121 (module C1), i.e. the respective situation category "Lane Follow" or "Lane Change" of the ego vehicle at time frame ti, initiates either the Lane Follow 122 (module C2) or the Lane Change 123 (module C3) neural network respectively.
An advantage of the inventive computing system for training is that the traffic agent 1 is trained to predict both longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions to be applied to a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior. According to one example, the changes of longitudinal and latitudinal positions may be in the form of changes of acceleration and changes of bearing to be applied to the simulated vehicle at a given time frame.
Figure 1c) shows a schematic representation of an inventive integrated simulation computer system 01' deploying an inventively trained traffic agent 1' comprising a module 11' (module A') for perception building based on the simulated environment data provided by module 21' (module S1') and an E2E decision maker model 12', as well as one or more processors and a memory device coupled to the one or more processors (not separately shown in Figure 1c)). The driving environment module S2' (simulation) is expected to provide environment data in module S1' for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed to build perceptions in module 11', and control commands are generated by the E2E decision maker 12' and passed back to the environment 22' for positional update.
Module 11' is configured to generate perception frames for the respective simulated vehicle per given time frame containing information on (i) traffic situation, (ii) self-state information of the simulated vehicle and (iii) road geometry and to provide the generated perception frames as input information to the E2E decision maker module 12' (module C'). The E2E decision maker module 12' was trained in accordance with the inventive training method. The E2E decision maker module 12' is, thus, configured to predict as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions, more preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration and bearing to be applied to the simulated vehicle in the simulation environment.
As already mentioned above, the inventive simulation computer system 01' deploying the inventive traffic agent 1' does not use the naturalistic driving and map data, which is used for the training procedure, as input information. Therefore, the inventive simulation computer system 01' does not need to comprise a module B' corresponding to module B of the training computer system. In contrast, the simulated driving data and map (environment) data of the simulated traffic agent 1', which are provided by module 21' (module S1') to module 11' (module A'), are used as input information in the inventive simulation computer system 01' to generate the respective perception frames Pi per respective time frames ti in module 11' (module A'). In other words, module 11' (module A') is configured to generate the respective perception frames Pi per respective time frames ti based on the simulation data provided by module 21' (module S1'). The perception frames Pi per respective time frames ti generated by module 11' are used as input information for the inventive E2E decision maker computer model 12' (module C'). Module 12' (module C') is configured to predict the longitudinal and latitudinal position, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated traffic agent per respective time frames ti. In an additional or preferred embodiment, the decision maker computer model 12' (module C') of the deployed traffic agent 1' uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
An example configuration of the E2E decision maker 12' in deployment corresponds to the configuration of the E2E decision maker 12 as shown in Figure 1b). Accordingly, the respective details and preferred embodiments as discussed hereinbefore also apply.
Thus, the decision maker computer model 12' (module C') of the deployed traffic agent 1' comprises i) a Lane Follow neural network 122' (module C2'), ii) a Lane Change neural network 123' (module C3') and iii) a Function Classifier neural network 121' (module C1'), which are configured in such a way that
- one or more perception frames Pi of the simulated vehicles per given time frame ti are respectively used as input to the Lane Follow 122' (module C2'), the Lane Change 123' (module C3') and the Function Classifier 121' (module C1') neural networks,
- the Function Classifier 121' (module C1') neural network is configured to classify the one or more perception frames Pi of the simulated vehicles per given time frame ti into the situation category "Lane Follow" or "Lane Change". Dependent on the respective classification per given time frame ti, i.e. either class "Lane Follow" or class "Lane Change", the Function Classifier 121' (module C1') initiates the neural network of either Lane Follow 122' (module C2') or Lane Change 123' (module C3'), respectively. As an example, in case the Function Classifier 121' (module C1') classifies a perception frame P1 at time frame t1 with the situation category "Lane Follow", then the Function Classifier 121' (module C1') is configured to initiate the neural network "Lane Follow" 122' to predict the vehicle control frame Ĉ1 containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t1. Alternatively, in case the Function Classifier 121' (module C1') classifies a perception frame P2 at time frame t2 with the situation category "Lane Change", then the Function Classifier 121' (module C1') is configured to initiate the neural network "Lane Change" 123' to predict the vehicle control frame Ĉ2 containing the longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated vehicle at time frame t2.
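A compact Python sketch of this gating logic is given below; the classifier and the two regression networks are passed in as callables, and the 0.5 decision threshold as well as the function names are assumptions made only for illustration.

```python
# Minimal sketch of the gating performed by the Function Classifier 121' (module C1'):
# the classifier output selects either the Lane Follow or the Lane Change network.
# The callables and the 0.5 threshold are assumptions for illustration only.
import numpy as np


def decide(perception_frame: np.ndarray,
           classify_situation,          # returns P(X = LaneChange) in [0, 1]
           predict_lane_follow,         # returns (d_acceleration, d_bearing)
           predict_lane_change,         # returns (d_acceleration, d_bearing)
           threshold: float = 0.5):
    """Return the control frame (Δacceleration, Δbearing) for one time frame ti."""
    p_lane_change = classify_situation(perception_frame)
    if p_lane_change >= threshold:                   # situation category "Lane Change"
        return predict_lane_change(perception_frame)
    return predict_lane_follow(perception_frame)     # situation category "Lane Follow"


# Example call with dummy stand-ins for the trained networks:
dummy = lambda p: (0.0, 0.0)
control = decide(np.zeros(16), classify_situation=lambda p: 0.2,
                 predict_lane_follow=dummy, predict_lane_change=dummy)
```

In deployment, the returned pair would be handed back to module S2' as the positional update for the next time frame.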
The output information of module 12' (module C') is provided to the simulated driving environment 22' (module S2') in order to be applied to the simulated traffic agent 1' in the simulation environment. Module 22' (module S2') is configured to provide module 21' (module S1') with the respectively changed simulated environment data comprising driving data and map data of the simulated traffic agent 1', so that module 21' (module S1') provides module 11' (module A') with the changed environment data in order to generate the next perception frame.
Experimental Part
For training purposes in accordance with the present invention, the inventors used commercial driving data DataFromSky (DFS, purchased from RCE systems s.r.o., Czech Republic) comprising driving data of vehicles driven by humans for a duration of six hours on a part (500 m) of the highway A9 in Germany. The DFS data set in particular comprised the following features: timestamp (in seconds, s), longitudinal velocity (in meters/second, m/s), longitudinal acceleration (in meters/second squared, m/s²), and global coordinates of the respective vehicles (traffic agents) (in x-, y-coordinates).
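For illustration, a single record of such driving data could be represented as in the small Python sketch below; the class name, field names and example values are placeholders and not the actual DataFromSky schema.

```python
# Small sketch of one record of the naturalistic driving data described above.
# The class name, field names and the example values are illustrative placeholders,
# not the actual DataFromSky schema.
from dataclasses import dataclass


@dataclass
class DrivingRecord:
    timestamp_s: float        # timestamp in seconds
    velocity_mps: float       # longitudinal velocity in m/s
    acceleration_mps2: float  # longitudinal acceleration in m/s^2
    x: float                  # global x coordinate of the vehicle (traffic agent)
    y: float                  # global y coordinate of the vehicle (traffic agent)


# Purely illustrative values:
sample = DrivingRecord(timestamp_s=0.04, velocity_mps=31.5,
                       acceleration_mps2=-0.2, x=1204.7, y=385.1)
```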
Furthermore, the OpenDRIVE digital map (downloaded from http://www.opendrive.org/) was used as map data in the simulation to generate lane points in reference to each ego position, which were used to construct road-geometry data for the model to be used as input. These lane points can be described as a set of lanes

L(ti) = {X_current, X_left, X_right}

for the current and the two adjacent lanes of a subject/ego vehicle at a time frame ti, where each lane is a set of coordinates X = {x_1, x_2, ..., x_n} such that x_n is the last point on the lane that is at a maximum distance of 400 m to the ego/subject vehicle position at ti.
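A short Python sketch of this lane-point construction is shown below; the list-of-(x, y)-tuples data layout and the function names are assumptions, while the 400 m cut-off follows the description above.

```python
# Sketch of the lane-point construction: for the current and the two adjacent lanes,
# keep only the points within 400 m of the ego position. The list-of-(x, y)-tuples
# layout and the function names are assumptions; the 400 m cut-off follows the text.
import math


def clip_lane_points(lane_points, ego_xy, max_distance=400.0):
    """Keep lane points (x, y) whose Euclidean distance to the ego position is at most 400 m."""
    ex, ey = ego_xy
    return [(x, y) for (x, y) in lane_points
            if math.hypot(x - ex, y - ey) <= max_distance]


def build_lane_sets(current_lane, left_lane, right_lane, ego_xy):
    """Lane-point sets for the current and the two adjacent lanes at one time frame ti."""
    return {
        "current": clip_lane_points(current_lane, ego_xy),
        "left": clip_lane_points(left_lane, ego_xy),
        "right": clip_lane_points(right_lane, ego_xy),
    }
```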
As already set out in the detailed description above, the perception frame used with respect to the present invention is divided into three categories:
1. Traffic Situation: the input (DFS and OpenDRIVE) data is processed to form the six-vehicle-neighborhood information with reference to each ego/subject vehicle, where each represented vehicle in the six positions offers two pieces of information: (i) relative distance d to the ego vehicle and (ii) relative speed vr to the ego vehicle speed ve. Figure 2 shows a schematic representation of a six-vehicle-neighborhood information at a specific time frame. As set out above, the vehicle roles are defined in a six-vehicle neighborhood according to the present invention as follows:
The car 311 in front of ego vehicle 3 (in the same lane).
The car 312 following the ego vehicle 3 in the back (in the same lane).
The two cars 321 , 322 in front of the ego vehicle’s 3 center point translated to the two neighboring lanes.
The two cars 331 , 332 in the back of the ego vehicle’s 3 center point translated to the two neighboring lanes.
According to Figure 2, all of the six positions exist at the represented time frame. All of the six vehicles 311, 312, 321, 322, 331 and 332 in the neighborhood of the ego vehicle 3 have the same distance d to the ego vehicle 3. The relative speed vr is the respective speed vn of any of the six neighborhood vehicles 311, 312, 321, 322, 331 and 332 minus the speed ve of the ego vehicle 3 (assuming the vehicles are moving in the same direction).

2. Ego-state information: includes longitudinal velocity, longitudinal acceleration, angular deviation and bearing of the ego vehicle with respect to the lane direction. As mentioned above, the angular deviation (Ad) is defined as

Ad(ti) = θ_lane(ti) − θ_ego(ti)

where θ_lane(ti) and θ_ego(ti) are the global bearing/orientation of the lane and the ego vehicle, at any given time instance ti, respectively.
3. Road Geometry: According to the present example experiment, the DFS driving data and OpenDRIVE map data are processed to yield a semi-circular numerical representation of the road geometry in the form of a vector of displacements Dj to each of the two lane boundaries LB1 and LB2, at any time instance ti, such that

D = [D1, D2, ..., Dn]

where each entry Dj is part of a sequence of displacement points to the ego position, divided on the basis of their relative bearing values to the ego position, with intervals of 5° around the semi-circular region in front of the ego vehicle. Therefore, the length n of the displacement vector D has, in the experimentation scope of this work, been set to n = 180/5 = 36.
Such a semi-circular road geometry is schematically represented in Figure 3, wherein - for the sake of clarity - only part of the 36 displacement vectors Dj of the ego vehicle 3 at this instance are illustrated.
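The following Python sketch illustrates, under stated assumptions, how the three perception-frame categories described above (traffic situation, ego-state information and road geometry) could be computed for a single time frame ti. The dictionary-based data layout, the zero placeholder for unoccupied neighborhood slots, the bearing wrap-around and the nearest-boundary-point-per-bin rule are assumptions for illustration; only the relative-distance/relative-speed features, the angular deviation and the 36-entry semi-circular displacement vector follow the description above.

```python
# Sketch of the three perception-frame categories for one time frame ti.
# Dictionary-based records, the zero placeholder for unoccupied neighbourhood slots,
# the bearing wrap-around and the nearest-point-per-bin rule are assumptions.
import math


def neighbour_features(neighbour, ego):
    """(i) Traffic situation: relative distance d and relative speed vr = vn - ve per slot."""
    if neighbour is None:                    # slot not occupied at this time frame (assumed)
        return 0.0, 0.0
    d = math.hypot(neighbour["x"] - ego["x"], neighbour["y"] - ego["y"])
    vr = neighbour["v"] - ego["v"]
    return d, vr


def angular_deviation(theta_lane_deg, theta_ego_deg):
    """(ii) Ego-state: angular deviation Ad between lane bearing and ego bearing, wrapped."""
    ad = theta_lane_deg - theta_ego_deg
    return (ad + 180.0) % 360.0 - 180.0


def displacement_vector(boundary_points, ego, ego_bearing_deg,
                        bin_size_deg=5.0, fov_deg=180.0):
    """(iii) Road geometry: semi-circular displacement vector with n = 180/5 = 36 entries."""
    n = int(fov_deg / bin_size_deg)
    D = [math.inf] * n
    for (x, y) in boundary_points:
        rel = math.degrees(math.atan2(y - ego["y"], x - ego["x"])) - ego_bearing_deg
        rel = (rel + 180.0) % 360.0 - 180.0  # wrap to (-180, 180]
        if -90.0 <= rel < 90.0:              # keep only the semi-circle in front of the ego
            j = int((rel + 90.0) // bin_size_deg)
            D[j] = min(D[j], math.hypot(x - ego["x"], y - ego["y"]))
    return [d if math.isfinite(d) else 0.0 for d in D]   # empty bins set to 0.0 (assumed)
```

Stacking these values in a fixed order per time frame would yield the perception frame Pi that is fed to the decision maker.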
The present inventors investigated two possible approaches to model the inventive E2E decision maker with neural networks in the simulation environment:

• Single network model, in which the decision making process is learned by a single neural network with a sequence of n layers, where n was empirically deduced during experimentation.

• Functionally branched networks, in which the decision making process is divided on the basis of fundamental driving functions, e.g. lane following and lane changing. Three neural networks are used to model this approach, each of which comprises a sequence of n layers, where n was empirically deduced during experimentation. These are as follows:
- Lane follower module 122, which is used to control the vehicle during general lane follow scenarios,

- Lane changer module 123, which is used to control the vehicle during lane change scenarios,

- Function classifier module 121, which is used to classify, in binary, whether a situation is that of a lane follow or a lane change, and thus triggers one of the two corresponding modules 122 or 123.

Each of these sub-modules was trained independently; an illustrative sketch of such a branched architecture is given after this list.
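The sketch below shows one way the functionally branched setup could be laid out in PyTorch. The layer counts and widths (128, 64) and the sigmoid classifier head are assumptions; the description only states that each network consists of a sequence of layers whose number and size (up to 512 neurons per layer) were deduced empirically.

```python
# Illustrative PyTorch sketch of the functionally branched setup: one regression
# network class reused for the Lane Follow and Lane Change modules (with separate
# Δacceleration and Δbearing branches) plus a binary function classifier.
# Layer counts and widths are placeholders, not the empirically deduced values.
import torch
import torch.nn as nn


def regression_branch(input_dim: int) -> nn.Sequential:
    """One output head, used for either Δacceleration or Δbearing."""
    return nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                         nn.Linear(128, 64), nn.ReLU(),
                         nn.Linear(64, 1))


class DrivingFunctionNet(nn.Module):
    """Architecture shared by the Lane Follow (122) and Lane Change (123) modules."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.acc_branch = regression_branch(input_dim)    # predicts Δacceleration
        self.bear_branch = regression_branch(input_dim)   # predicts Δbearing

    def forward(self, perception):
        return self.acc_branch(perception), self.bear_branch(perception)


class FunctionClassifier(nn.Module):
    """Predicts P(X = LaneChange) from a subset of the perception frame (module 121)."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, perception_subset):
        return self.net(perception_subset)
```

Keeping the Δacceleration and Δbearing heads as fully separate branches mirrors the split architecture described for the Lane Follow module in the following paragraphs.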
In the present experimentation, the Lane Following module 122 targets two abstract sets of scenarios:

• Adaptive cruise control (ACC): controlling the vehicle's throttle/acceleration with reference to the front car.

• Traffic-free steer control: controlling the vehicle's steering to keep the lane.
A branched neural network architecture, split into two completely separate networks with no common set of layers for Δacceleration and Δbearing, was trained with DFS driving data and OpenDRIVE map data. The network was optimized using the following loss function:

L_MSE = (1/k) Σ_{i=1..k} [ (y_acc,i − ŷ_acc,i)² + (y_bear,i − ŷ_bear,i)² ]   (2)

where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing. A specific mechanism was needed to cater for the problem of cascading error during test-runs of the model in the simulation: minute errors in each frame added up to yield states which were rarely seen in the lane following training data, resulting in the model failing to control the steering well enough to keep the lane and eventually leading the vehicle out of the lane. The corrective mechanism involved filtering the training data to increase the involvement, within each training iteration, of those situations where the vehicle was displaced on either side of the lane center and where Δbearing was such that the distance to the lane center was being reduced.
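One possible reading of this corrective mechanism is sketched below as a simple data-rebalancing step. The 0.2 m offset threshold, the sign convention linking lane-center offset and Δbearing, and the duplication factor are assumptions chosen purely for illustration.

```python
# One possible reading of the corrective mechanism as a data-rebalancing step.
# The 0.2 m offset threshold, the sign convention between lane-centre offset and
# Δbearing, and the duplication factor are assumptions chosen only for illustration.
def is_corrective_frame(lane_center_offset: float, d_bearing: float,
                        offset_threshold: float = 0.2) -> bool:
    """True if the vehicle is displaced from the lane centre and Δbearing steers it back."""
    displaced = abs(lane_center_offset) > offset_threshold
    steering_back = lane_center_offset * d_bearing < 0.0   # assumed sign convention
    return displaced and steering_back


def rebalance(frames, repeat_factor: int = 3):
    """Repeat corrective frames so they appear more often within each training iteration."""
    rebalanced = []
    for frame in frames:
        rebalanced.append(frame)
        if is_corrective_frame(frame["lane_center_offset"], frame["d_bearing"]):
            rebalanced.extend([frame] * (repeat_factor - 1))
    return rebalanced
```

Repeating such frames more often per training iteration is one straightforward way to expose the model to the off-center, self-correcting states it otherwise rarely sees.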
The experimentation results for the Lane Follow module 122 independently trained on the real traffic data DFS are shown in Figures 4 to 7. The respective graphs in Figures 4 and 5 show the error per frame of the Lane Follow module 122 with respect to the ground-truth dataset for the Δbearing and Δacceleration values, respectively.
The graph in Figure 6 shows the distribution of the average lane-center deviation of vehicles in the ground-truth dataset, while the graph in Figure 7 shows the same distribution for the Lane Follow model when run in the simulation environment. The distribution of lane-center deviation appears to be higher in the DFS data, potentially as a result of the positional errors in the recording of the data, which have been declared to be up to ≈0.5 m. The Lane Follow module 122, however, shows relatively less deviation owing to the corrective mechanism during training.
The Lane Changer module 123 targeted specific situations where the vehicle was expected to transition into either of the two adjacent lanes. The model was expected to learn to predict Δbearing and Δacceleration values in such a way that the higher-level decision on the direction of a lane change is implicitly taken at each frame, wrapped into the lower-level output of Δbearing and Δacceleration values. The long-term effect of this is the smooth transition to the implicitly decided lane. This process is catered for within the neural network itself, and hence the term "implicit" decision is used.
The same perception vector is used as input to the model, except that the road-geometry displacement vector is calculated with reference to the center points of the adjacent lanes, not the current lane. The network in this case is the same branched architecture described for the Lane Follow model, and was optimized using the following loss function:
L_MAE = (1/k) Σ_{i=1..k} [ |y_acc,i − ŷ_acc,i| + |y_bear,i − ŷ_bear,i| ]   (3)

where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing. It was found through experiment that the mean squared error as in equation 2 was not able to converge well owing to the small values of the ground truth Δbearing in the data related to lane change scenarios, and therefore the mean absolute error was preferred, as shown in equation 3.
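Written out as code, the two reconstructed loss variants look as follows. The unweighted sum over the Δacceleration and Δbearing terms is an assumption; the description only states that equation 2 is a mean squared error and equation 3 a mean absolute error over the two outputs.

```python
# The two reconstructed loss variants written out as code: a mean squared error for
# the Lane Follow module (equation 2) and a mean absolute error for the Lane Change
# module (equation 3). The unweighted sum of the Δacceleration and Δbearing terms
# is an assumption.
import torch


def lane_follow_loss(y_acc, y_acc_hat, y_bear, y_bear_hat):
    """Equation 2 (as reconstructed): mean squared error over the k samples."""
    return torch.mean((y_acc - y_acc_hat) ** 2 + (y_bear - y_bear_hat) ** 2)


def lane_change_loss(y_acc, y_acc_hat, y_bear, y_bear_hat):
    """Equation 3 (as reconstructed): mean absolute error over the k samples."""
    return torch.mean(torch.abs(y_acc - y_acc_hat) + torch.abs(y_bear - y_bear_hat))
```

Switching from the squared to the absolute error keeps the gradient magnitude from shrinking with the very small Δbearing targets that dominate the lane change data, which is consistent with the convergence issue described above.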
Table 1 below shows the distribution of lane change directions in the ground-truth data and that shown by the Lane Change model 123 during a test run in the simulation. It can be seen that the percentages of the corresponding lane change directions are very similar for both sources, which can be taken as a rough approximation of the similarity in behavior between the model and the real human drivers in the data.
Table 1: Percentage of lane change directions selected by the lane change module and real traffic data.
The Function Classifier 121 is targeted to act as a moderator for the two main modules: Lane Follow 122 and Lane Changer 123. A subset of the perception is used as input to this model in the form of the (i) traffic situation and (ii) ego-state information to predict the single-value likelihood of the scenario being a lane change, P(X = LaneChange), or the scenario being a lane follow, 1 − P(X = LaneChange), enhanced by the direction of lane change. A single fully connected neural network was used to train the model with the DFS data, optimized on the following cost function known as log loss:

L_log = −(1/k) Σ_{i=1..k} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]

where k ∈ ℤ+ is any arbitrary number of data samples, and y and ŷ are the ground truth and predicted output of the model as the likelihood of the scenario being a Lane Change (and correspondingly the likelihood of a Lane Follow as 1 − ŷ).
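For clarity, the log loss can be written out as the standard binary cross-entropy below; the epsilon clipping is an implementation detail added here for numerical stability and is an assumption, not part of the description.

```python
# The "log loss" cost function written out as the standard binary cross-entropy.
# The epsilon clipping is added here only for numerical stability and is an
# implementation assumption, not part of the description.
import torch


def log_loss(y: torch.Tensor, y_hat: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """y: ground truth (1 = Lane Change, 0 = Lane Follow); y_hat: predicted P(X = LaneChange)."""
    y_hat = torch.clamp(y_hat, eps, 1.0 - eps)
    return -torch.mean(y * torch.log(y_hat) + (1.0 - y) * torch.log(1.0 - y_hat))
```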
Table 2 below shows the confusion matrix for the classification of lane follow and lane change scenarios by the Function Classifier model compared to the ground-truth real traffic data. The numbers shown in the table represent the respectively recorded instances (total instances: 148,501) in the ground truth data. For the ground truth label "Lane Follow", the inventive Function Classifier 121 model predicts in 96,578 instances correctly the situation category "Lane Follow" and only in 8,592 instances incorrectly the situation category "Lane Change". For the ground truth label "Lane Change", the Function Classifier 121 model predicts in 35,778 instances correctly the situation category "Lane Change" and only in 7,553 instances incorrectly the situation category "Lane Follow". Thus, the Function Classifier 121 model exhibits a precision value of 0.92, a recall value of 0.93 and therefore an appreciable F1-score of 0.925.
                               Predicted "Lane Follow"    Predicted "Lane Change"
Ground truth "Lane Follow"              96,578                      8,592
Ground truth "Lane Change"               7,553                     35,778
Table 2: Confusion Matrix for function classification compared to ground-truth data
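The reported metrics can be checked directly against the confusion-matrix counts; the short script below does so, treating "Lane Follow" as the positive class, which is an assumption but yields values in line with the reported precision, recall and F1-score up to rounding and class convention.

```python
# Recomputing the classification metrics from the confusion-matrix counts quoted
# above (148,501 frames in total). Treating "Lane Follow" as the positive class is
# an assumption; with it, the values come out close to the reported 0.92/0.93/0.925
# up to rounding and class convention.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


tp = 96_578   # ground truth "Lane Follow" predicted as "Lane Follow"
fn = 8_592    # ground truth "Lane Follow" predicted as "Lane Change"
fp = 7_553    # ground truth "Lane Change" predicted as "Lane Follow"
tn = 35_778   # ground truth "Lane Change" predicted as "Lane Change" (not needed here)

p, r, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")   # ≈ 0.927, 0.918, 0.923
```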
It was found that the general behavior of the E2E decision maker model, when run in the simulation environment, exhibited similarity with that of the human drivers found in the real traffic data of DFS. The graphs in Figures 8a) and 8b) show the behavioral trend in maintaining the speed and distance to the front traffic vehicle during lane following scenarios for both the model test-run in simulation and ground-truth data respectively. The behavioral trend of the data provided by the traffic agent trained according to the present invention is similar to the behavioral trend of the naturalistic DFS data.
The final E2E decision maker module, trained on the real traffic data (DFS), was also evaluated for safety compliance in terms of collisions with surrounding traffic vehicles. When tested in the simulation environment, the E2E decision maker module resulted in only 4 minor collisions (front-car collisions at low speed) during a 30-minute drive with dense surrounding traffic.

Claims
1. A computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment, characterized in that the method comprises or consists of the following steps:
a. Providing driving data at one or more time frames ti = [t1, t2, ... tn] for one or more road vehicles as ego vehicles respectively driven by a human in a realistic situation on a road and providing map data on the respective road at the given time frames ti,
b. Processing at least part of the driving data and map data of step a) into one or more respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry,
c. Processing at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles,
d. Training a decision maker computer model of the traffic agent with the one or more perception frames Pi per given time frames ti of step b) as input to the model and with the one or more ground truth vehicle control frames Ci per given time frames ti of step c) as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉi = [ĉ1, ĉ2, ... ĉn] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
2. Training method according to claim 1, further comprising processing the driving data of step a) for the respective ego vehicles per given time frames ti to binary corresponding ground truth situation categories of "Lane Follow" or "Lane Change" Si, and wherein the decision maker computer model of the traffic agent in step d) comprises i) a Lane Follow neural network, ii) a Lane Change neural network and iii) a Function Classifier neural network, wherein
- the one or more perception frames Pi are respectively used as input to the Lane Follow, the Lane Change and the Function Classifier neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow and the Lane Change neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respectively applied ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier neural network to predict a corresponding situation category of "Lane Follow" or "Lane Change" Ŝi by matching the predicted situation category Ŝi with the respective ground truth situation category Si, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
3. Training method according to claim 1 or 2, wherein the driving data in step a) for each of the given road vehicles comprises or consists of one or more status features of the respective ego vehicles per given time frames ti, preferably comprising or consisting of longitudinal velocity, longitudinal acceleration, and position of the respective road vehicle in X, Y co-ordinates respectively per given time frames ti.
4. Training method according to any one of claims 1 to 3, wherein the map data of step a) contains corresponding road information comprising or consisting of i) lane counts of the respective road and ii) lane position in X, Y co-ordinates respectively per given time frames ti.
5. Training method according to any one of claims 1 to 4, wherein the traffic situation in step b) comprises or consists of six-vehicle-neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) relative distance of respective vehicle to ego vehicle and ii) relative speed of respective vehicle to speed of ego vehicle.
6. Training method according to any one of claims 1 to 5, wherein the self-state information of the respective ego vehicles in step b) comprises or consists of longitudinal velocity, longitudinal acceleration, and its bearing with respect to the road direction (angular deviation Ad).
7. Training method according to claim 6, wherein the angular deviation Ad is defined as

Ad(ti) = θ_road(ti) − θ_ego(ti)

where θ_road(ti) and θ_ego(ti) represent the bearing of the road and the ego vehicle, at any given time frame ti, respectively, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
8. Training method according to any one of claims 1 to 7, wherein the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, preferably wherein the numerical representation is selected from a circular or a semi-circular geometry.
9. Training method according to claim 8, wherein the circular or semi-circular numerical representation of the respective lane geometry having two lane boundaries is in the form of a vector of displacements Dj to each of the two lane boundaries, at any given time frame ti, with

D = [D1, D2, ... Dn]

wherein each entry Dj is part of a sequence of displacement points to the ego vehicle position divided on the basis of their relative bearing values to the ego position, with intervals of 1° or more around the circular or semi-circular region in front and/or back of the ego vehicle, and wherein the length n of the displacement vector D represents 1 to 360 for the circular geometry and 1 to 180 for the semi-circular geometry.
10. Training method according to any one of claims 1 to 9, wherein the longitudinal and latitudinal positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration and bearing values to be applied to the respective ego vehicles at time frame ti.
11. Computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors characterized in that the traffic agent is configured to execute the computer-implemented training method according to any one of claims 1 to 10.
12. Computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent was trained according to the computer-implemented training method according to any one of claims 1 to 10 to predict as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions to be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
13. Computing system for training a traffic agent according to claim 10 or 11 or for simulating a road driving environment in a driving situation according to claim 12, wherein three or more neural networks are used, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
14. Computing system according to claim 13, wherein at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
PCT/EP2020/053817 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment WO2021160273A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE112020006532.4T DE112020006532T5 (en) 2020-02-13 2020-02-13 COMPUTER SYSTEM AND METHOD WITH END-TO-END MODELING FOR A SIMULATED TRAFFIC AGENT IN A SIMULATION ENVIRONMENT
PCT/EP2020/053817 WO2021160273A1 (en) 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/053817 WO2021160273A1 (en) 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Publications (1)

Publication Number Publication Date
WO2021160273A1 true WO2021160273A1 (en) 2021-08-19

Family

ID=69591645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/053817 WO2021160273A1 (en) 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Country Status (2)

Country Link
DE (1) DE112020006532T5 (en)
WO (1) WO2021160273A1 (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018084324A1 (en) * 2016-11-03 2018-05-11 Mitsubishi Electric Corporation Method and system for controlling vehicle
US20200004255A1 (en) * 2018-06-29 2020-01-02 Zenuity Ab Method and arrangement for generating control commands for an autonomous road vehicle

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
C. Chen, A. Seff, A. Kornhauser, J. Xiao: "Deepdriving: Learning affordance for direct perception in autonomous driving", Proceedings of the IEEE International Conference on Computer Vision, 2015, pages 2722-2730, XP032866617, DOI: 10.1109/ICCV.2015.312
F. Codevilla, M. Müller, A. Lopez, V. Koltun, A. Dosovitskiy: "End-to-end driving via conditional imitation learning", 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pages 1-9
H. Xu, Y. Gao, F. Yu, T. Darrell: "End-to-end learning of driving models from large-scale video datasets", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pages 2174-2182
Junqing Wei, Jarrod M. Snider, Tianyu Gu, John Dolan, Bakhtiar Litkouhi: "A behavioral planning framework for autonomous driving", June 2014, pages 458-464
Lu Chi et al.: "Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues", arXiv.org, Cornell University Library, 12 August 2017, XP080952610 *
M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang: "End to end learning for self-driving cars", arXiv preprint arXiv:1604.07316, 2016
U. Muller, J. Ben, E. Cosatto, B. Flepp, Y. L. Cun: "Off-road obstacle avoidance through end-to-end learning", Advances in Neural Information Processing Systems, 2006, pages 739-746

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049167A (en) * 2022-08-16 2022-09-13 北京市城市规划设计研究院 Traffic situation prediction method, device, equipment and storage medium
CN115049167B (en) * 2022-08-16 2022-11-08 北京市城市规划设计研究院 Traffic situation prediction method, device, equipment and storage medium

Also Published As

Publication number Publication date
DE112020006532T5 (en) 2022-11-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20705351

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20705351

Country of ref document: EP

Kind code of ref document: A1