WO2021160273A1 - Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment - Google Patents

Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Info

Publication number
WO2021160273A1
WO2021160273A1 (application PCT/EP2020/053817)
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
frames
lane
road
module
Prior art date
Application number
PCT/EP2020/053817
Other languages
French (fr)
Inventor
Muhammad Saad ZIA
Faizan MEHMOOD
Original Assignee
Automotive Artificial Intelligence (Aai) Gmbh
Priority date
Filing date
Publication date
Application filed by Automotive Artificial Intelligence (Aai) Gmbh filed Critical Automotive Artificial Intelligence (Aai) Gmbh
Priority to DE112020006532.4T priority Critical patent/DE112020006532T5/en
Priority to PCT/EP2020/053817 priority patent/WO2021160273A1/en
Publication of WO2021160273A1 publication Critical patent/WO2021160273A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/10 - Geometric CAD
    • G06F 30/15 - Vehicle, aircraft or watercraft design
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 - Traffic control systems for road vehicles
    • G08G 1/16 - Anti-collision systems
    • G08G 1/167 - Driving aids for lane monitoring, lane changing, e.g. blind spot detection

Definitions

  • the present invention relates to a computer-implemented training method for a traffic agent navigating a road vehicle in a simulation environment using end-to-end modeling as well as a respective training computing system, and a computing system for simulating a road driving environment for one or more vehicles comprising or consisting of one or more processors using the inventively trained traffic agent.
  • Human driving decisions on a road can essentially be considered to comprise several abstract levels or phases forming a driving stack. Based on a particular road situation, a driver may decide to carry out a particular high-level maneuver, e.g. overtake, formulate a motion plan (also called “trajectory”) accordingly and apply control functions on actuators (throttle, brake, steer) to execute the decision.
  • Human driving decisions in natural traffic are, moreover, influenced by many factors and can be considered at various levels. For example, depending on their mental environment, human drivers, being in the same situation, may take different decisions, such as overtake, follow a car in front or change the lane.
  • Perception/Map generally relates to the input about the environment that is available to other components.
  • Traffic Rules generally relates to any component that provides legal restrictions to high-level decisions.
  • Mission Planning generally relates to a strategy on when to be where in the long-term (e.g. lane-level routing).
  • Traffic-Free Reference Line generally relates to planning an “optimal" reference-line ignoring other traffic participants.
  • Behavior Planning generally relates to planning a behavior plan, that is when exactly to conduct actions, such as lane changes, incorporating other participants.
  • Decision Post-Processing generally relates to correcting the decisions of the previous components for conforming to basic safety rules, if necessary.
  • Motion/Trajectory Planning generally relates to planning the exact future trajectory for a short time (up to 2 seconds) horizon.
  • Command Conversion generally relates to computing the final commands to send to a (real or simulated) vehicle, such as steering instructions.
  • Vehicle Dynamics/Physics generally relates to simulating the car's behavior resulting from the generated commands.
  • Position Update generally relates to computing the resulting new position of the vehicle in the simulation. Usage of these terms varies drastically in literature.
  • the model is implemented in an autonomous driving car and not as a simulated traffic agent in a simulation environment.
  • Muller presents the same approach used to train on remote-control car data and consequently automate its driving (see U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, “Off-road obstacle avoidance through end-to-end learning,” in Advances in neural information processing systems, 2006, pp. 739-746).
  • Xu and Gao use end-to-end deep learning to map raw images from numerous on-road human driving footages to both high-level actions of “stop” and “go” as well as steering angle commands (see H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning of driving models from large-scale video datasets,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174-2182).
  • the intended behavior of the model can be approximated to cover lane following, obstacle avoidance and lane change behavior of human drivers.
  • the work provides only a distribution of the car controls, e.g. steering, and therefore does not claim to be highly accurate in driving a car in a simulation or real-world driving scenario.
  • the method does not model acceleration/deceleration commands - only high-level decisions of stop and go.
  • the model is not implemented in a simulated traffic agent in a simulation environment.
  • Codevilla uses the same approach to learn a mapping from images to complete longitudinal and latitudinal control commands (steering and acceleration) of a car using the CARLA driving simulation data (see F. Codevilla, M. Müller, A. Lopez, V. Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 1-9).
  • the model is taught to learn roughly all aspects of the driving behavior, i.e.
  • It is the aim of the present invention to provide a computing system and method for simulating a road driving environment in a driving situation for one or more vehicles, so that the decision of a traffic agent reflects a human-like (naturalistic) behavior, i.e. controls the vehicle's longitudinal and lateral position, preferably steering and acceleration, in a way that exhibits a naturalistic driving behavior in general and in particular for high-level decisions, such as lane changing behavior, e.g., in an overtake.
  • a first aspect of the invention relates to a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment.
  • b. Processing at least part of the driving data and map data of step a) into one or more respective perception frames P i [p 1 , p 2 , ... p n ] per given time frames t i , wherein each perception frame P i contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry,
  • c. Processing at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames C i [c 1 , c 2 , ... c n ] per given time frames t i , wherein each vehicle control frame C i contains longitudinal and latitudinal positions of the respective ego vehicles,
  • d. Training a decision maker computer model of the simulated traffic agent with the one or more perception frames P i as input to the model and the one or more ground truth vehicle control frames C i as labels, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉ i [ĉ 1 , ĉ 2 , ... ĉ n ] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames C i , wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • processing steps b) and c) of the inventive method according to the first aspect can be conducted simultaneously or sequentially in any order.
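  • As an illustration only, the per-frame data structures implied by steps b) and c) could be represented as follows; this is a non-authoritative Python sketch and all field names are assumptions, not taken from the patent.

```python
# Illustrative sketch (not from the patent): possible container types for the
# per-time-frame perception frames P_i and ground-truth control frames C_i
# described in steps b) and c). All field names are assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PerceptionFrame:
    # (i) traffic situation: six-vehicle-neighborhood, one (rel_distance, rel_speed) pair per slot
    neighborhood: List[Tuple[float, float]]   # length 6
    # (ii) self-state of the ego vehicle
    longitudinal_velocity: float              # m/s
    longitudinal_acceleration: float          # m/s^2
    angular_deviation: float                  # bearing relative to road/lane direction, rad
    # (iii) road geometry: displacement vector to the lane boundaries (e.g. 180 semi-circular bins)
    lane_displacements: List[float]

@dataclass
class ControlFrame:
    # longitudinal and latitudinal control, here as changes to be applied at t_i
    delta_acceleration: float                 # m/s^2 per frame
    delta_bearing: float                      # rad per frame
```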
  • a second aspect of the invention relates to a computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
  • a third aspect of the invention relates to a computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent using one or more neural networks for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent was trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames containing longitudinal and latitudinal positions to be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ⁇ [1,2, ...n] and wherein n is the limit on driven frames.
  • inventive aspects of the present invention as disclosed hereinbefore can comprise any possible (sub-)combination of the preferred inventive embodiments as set out in the dependent claims or as disclosed in the following detailed description and/or in the accompanying figures, provided the resulting combination of features is reasonable to a person skilled in the art.
  • FIGs. 1a) to 1c) show schematic representations of (parts) of the E2E car control model of the inventive computer systems in training (Figs. 1a) and 1b)) and deployment (Fig. 1c)), respectively.
  • Fig. 2 shows a schematic representation of a six-vehicle-neighborhood information.
  • Fig. 3 shows a schematic representation of a semi-circular road geometry and applicable displacement vectors.
  • Fig. 4 shows a distribution graph of error/frame in Δbearing against DFS ground-truth validation data in a Lane follow module according to the invention.
  • Fig. 5 shows a distribution graph of error/frame in Δacceleration against DFS ground-truth validation data in a Lane follow module according to the invention.
  • Fig. 6 shows a distribution graph of lane-center deviation in the DFS real traffic data in Lane follow module.
  • Fig. 7 shows a distribution graph of lane-center deviation of Lane follow module when run within simulation.
  • Figs. 8a) and 8b) respectively show distribution graphs of relative speed versus relative distance to front car for the model test-run in simulation (Fig. 8a)) and ground-truth DFS data (Fig. 8b)).
  • the inventors of the different aspects of the present invention have found out that the computer-implemented systems and methods according to the present invention enable a traffic agent navigating a road vehicle in a simulation environment to make simulated driving decisions in high-level (e.g., lane change, overtake driving situations) and low / operational level (trajectory and motion planning), which reflect human like (naturalistic) behavior, i.e. controls the vehicle’s longitudinal and lateral position, preferably bearing and acceleration, in a way that exhibits a naturalistic driving behavior in any driving situation.
  • the present invention successfully exhibits the naturalistic decision making behavior from the source data in the simulation environment in terms of planning, safety procedures and traffic rule compliance.
  • the respective naturalistic driving and map data is according to the present invention processed to form one or more perception frames per given time frames containing corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry.
  • the respective naturalistic driving and map data is according to the present invention processed to form one or more respective vehicle control frames per given time frames, wherein each vehicle control frame contains longitudinal and latitudinal position of the respective ego vehicle.
  • the application of three categories of the perception frame is fundamental in order to provide an effective generalization of the inventive computer model.
  • the decision maker computer model of the simulated traffic agent is trained with the respective one or more perception frames as input to the model and with the one or more ground truth vehicle control frames as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames.
  • the inventive training procedure of the model is based on a data-driven approach, wherein the model is configured to implicitly learn from the ground truth naturalistic data.
  • the expression “an additionally or alternatively preferred embodiment” or “an additionally or alternatively further preferred embodiment” or “an additional or alternative way of configuring this embodiment” means that the feature or feature combination disclosed in this preferred embodiment can be combined in addition to or alternatively to the features of the inventive subject matter including any preferred embodiment of each of the inventive aspects, provided the resulting feature combination is reasonable to a person skilled in the art.
  • the expression “configured” shall be understood in connection with systems and computer program components.
  • For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
  • For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.
  • a virtual traffic agent in the context of the present invention also called “traffic agent” can for example be a car, truck, bus, bike or motor bike.
  • Once a virtual traffic agent has been trained according to the present invention that replicates human driving behavior, in particular in complex driving situations such as lane changes, one or more trained virtual traffic agents may be injected into a simulation environment including complex driving situations.
  • Such an embodiment is preferred, as the trained traffic agents may interact with, cooperate with and challenge an autonomous vehicle system controlling an autonomous vehicle under test.
  • Another advantage is that such an embodiment is suitable to test the limits and weaknesses of the autonomous vehicle system, especially in complex driving situation scenarios that may be attributed to assertive or aggressive driving behaviors.
  • inventive systems and methods furthermore have the technical effect and benefit of providing an improvement to autonomous vehicle computing technology, as the autonomous vehicle is trained in the inventive simulation environment reflecting human-like / naturalistic driving scenarios.
  • a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment characterized in that the method comprises or consists of the following steps:
  • the driving data generally represents trajectory data of the respective ego vehicles.
  • the driving data in step a) for each of the given road vehicles comprises or consists of one or more status features of the respective ego vehicles per given time frames t i , preferably comprising or consisting of longitudinal velocity, longitudinal acceleration, and position of the respective road vehicle in X, Y co-ordinates respectively per given time frames t i .
  • the map data of step a) contains corresponding road information comprising or consisting of i) lane counts of the respective road and ii) lane position in X, Y co-ordinates, optionally X, Y, Z co-ordinates, respectively per given time frames t i .
  • the traffic situation in step b) comprises or consists of six-vehicle-neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) relative distance of respective vehicle to ego vehicle and ii) relative speed of respective vehicle to speed of ego vehicle.
  • Two of the six positions are the two cars in the back of the ego vehicle's center point, translated to the two neighboring lanes.
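  • A minimal, non-authoritative sketch of how the six-vehicle-neighborhood features (relative distance and relative speed per position) could be computed is given below; the slot layout (nearest vehicle ahead and behind in the ego lane and in the two neighboring lanes), the attribute names and the default values for empty slots are assumptions for illustration.

```python
# Hedged sketch: building the six-vehicle-neighborhood features. Lane assignment,
# the exact slot layout and all attribute names are assumptions based on the
# description above, not the patent's implementation.
import math

def neighborhood_features(ego, others, max_range=400.0):
    """ego/others: objects with .x, .y, .speed, .lane (int) and ego with .heading (rad)."""
    slots = {}  # (lane_offset, ahead) -> (rel_distance, rel_speed)
    for veh in others:
        lane_offset = veh.lane - ego.lane
        if lane_offset not in (-1, 0, 1):
            continue                                  # only ego lane and the two neighboring lanes
        dx, dy = veh.x - ego.x, veh.y - ego.y
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue
        # project displacement onto the ego heading to decide "ahead" vs "behind"
        ahead = (dx * math.cos(ego.heading) + dy * math.sin(ego.heading)) > 0.0
        key = (lane_offset, ahead)
        if key not in slots or dist < slots[key][0]:
            slots[key] = (dist, veh.speed - ego.speed)  # keep the nearest vehicle per slot
    # fixed ordering of the six slots; empty slots get a "far away, same speed" default
    order = [(0, True), (0, False), (-1, True), (-1, False), (1, True), (1, False)]
    return [slots.get(k, (max_range, 0.0)) for k in order]
```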
  • the self-state information of the respective ego vehicles in step b) comprises or consists of longitudinal velocity, longitudinal acceleration, and its bearing with respect to the road direction (angular deviation Δd).
  • bearing of an ego vehicle represents in the context of the present invention the orientation of the ego vehicle in relation to the global x- / y- axes.
  • the angular deviation may be defined as Δd = b_road − b_ego, where b_road and b_ego represent the bearing of the road and the ego vehicle, at any given time frame t i , respectively, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the bearing of the road may be substituted by the bearing of the lane, in which case the angular deviation may be defined as Δd = b_lane − b_ego, where b_lane and b_ego represent the bearing of the lane and the ego vehicle, at any given time frame t i .
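  • As a worked illustration of the angular deviation defined above (a sketch with illustrative symbol and function names, assuming bearings are given in radians):

```python
# Minimal sketch of the angular deviation: the difference between the road/lane
# bearing and the ego bearing, wrapped into (-pi, pi]. Names are illustrative.
import math

def angular_deviation(bearing_road: float, bearing_ego: float) -> float:
    delta = bearing_road - bearing_ego
    return math.atan2(math.sin(delta), math.cos(delta))  # wrap to (-pi, pi]

# e.g. road heading 10 deg, ego heading 350 deg -> deviation of +20 deg
assert abs(angular_deviation(math.radians(10), math.radians(350)) - math.radians(20)) < 1e-9
```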
  • the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, preferably wherein the numerical representation is selected from a circular or a semi-circular geometry.
  • the circular or semi-circular numerical representation of the respective lane geometry having two lane boundaries is in the form of a vector of displacements D = [D 1 , D 2 , ... D m ] to each of the two lane boundaries at any given time frame t i , wherein each entry D j is part of a sequence of displacement points relative to the ego vehicle's position, divided on the basis of their relative bearing values to the ego position, with intervals of 1° or more around the circular or semi-circular region in front of and/or behind the ego vehicle, and wherein the length m of the displacement vector represents 1 to 360 entries for the circular geometry and 1 to 180 entries for the semi-circular geometry.
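  • A hedged sketch of how such a semi-circular displacement vector could be computed from lane-boundary points is given below; the 1° binning over the frontal semi-circle, the default value for empty bins and all names are illustrative assumptions.

```python
# Hedged sketch of the road-geometry input: displacements to the nearest lane
# boundary point per relative-bearing bin over a semi-circle in front of the
# ego vehicle (180 one-degree bins). Binning and defaults are assumptions.
import math

def semicircular_displacements(ego_x, ego_y, ego_bearing, boundary_points,
                               n_bins=180, max_range=400.0):
    """boundary_points: iterable of (x, y) lane-boundary coordinates."""
    displacements = [max_range] * n_bins              # default: no boundary seen in that bin
    for (px, py) in boundary_points:
        dx, dy = px - ego_x, py - ego_y
        dist = math.hypot(dx, dy)
        rel_bearing = math.atan2(dy, dx) - ego_bearing
        rel_bearing = math.atan2(math.sin(rel_bearing), math.cos(rel_bearing))  # wrap
        if abs(rel_bearing) > math.pi / 2 or dist > max_range:
            continue                                  # keep only the frontal semi-circle
        bin_idx = min(int((rel_bearing + math.pi / 2) / math.pi * n_bins), n_bins - 1)
        displacements[bin_idx] = min(displacements[bin_idx], dist)  # nearest point per bin
    return displacements
```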
  • the longitudinal and latitudinal positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame t i .
  • the use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
  • the processing steps b) and c) can be executed simultaneously or sequentially in any order.
  • in step d), the decision maker computer model is trained to predict corresponding vehicle control frames Ĉ i [ĉ 1 , ĉ 2 , ... ĉ n ] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames C i , wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the perception P i of the naturalistic data of step b) is used as input to the computer model, and the ground truth naturalistic data of the vehicle control frames C i in step c) is used as a label for training purposes.
  • the inventively trained traffic agent in a computer system simulating a driving environment according to the third inventive aspect does not use the naturalistic vehicle control frames of step c) and substitutes the naturalistic perception frames of step b) with simulated perception frames.
  • the decision maker of the inventive simulation computer system predicts as an action one or more vehicle control frames Ĉ i containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the ground truth vehicle control frames C i and the predicted vehicle control frames Ĉ i may comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame t i .
  • the decision maker computer model of the traffic agent in step d) uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • the inventive training method further comprises processing the driving data of step a) for the respective ego vehicles per given time frames t i to binary corresponding ground truth situation categories of “Lane follow” or “Lane Change”, and wherein the decision maker computer model of the traffic agent in step d) comprises i) a Lane Follow neural network, ii) a Lane Change neural network and iii) a Function Classifier neural network, wherein
  • the one or more perception frames P i are respectively used as input to the Lane follow, the Lane Change and the Function Classifier neural networks,
  • the one or more ground truth vehicle control frames C i are respectively used as labels for independently training the Lane follow and the Lane Change neural networks by matching the predicted vehicle control frames Ĉ i with the respective ground truth vehicle control frames C i , and
  • the respectively applied ground truth situation categories per given time frames t i are used as labels to independently train the Function Classifier neural network to predict a corresponding situation category of “Lane follow” or “Lane Change” by matching the predicted situation category with the respective ground truth situation category, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • In other words, for each time frame t i and ego vehicle, the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
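  • A minimal sketch of how the three sub-networks could be trained independently from the frame-level data described above is given below; the generic fit helper and all names are assumptions, not the patent's implementation.

```python
# Sketch: independent supervised training of the Lane Follow and Lane Change
# networks on the control labels C_i of their respective frame subsets, and of
# the Function Classifier on the binary situation category. `fit` is an assumed
# generic helper fit(net, inputs, targets); all names are illustrative.
def train_decision_maker(perception_frames, control_frames, situation_labels,
                         lane_follow_net, lane_change_net, classifier_net, fit):
    """situation_labels[i] is 1 for "Lane Change", 0 for "Lane follow"."""
    follow_idx = [i for i, s in enumerate(situation_labels) if s == 0]
    change_idx = [i for i, s in enumerate(situation_labels) if s == 1]

    # regression heads: predict (delta_acceleration, delta_bearing) per frame
    fit(lane_follow_net, [perception_frames[i] for i in follow_idx],
                         [control_frames[i] for i in follow_idx])
    fit(lane_change_net, [perception_frames[i] for i in change_idx],
                         [control_frames[i] for i in change_idx])
    # binary classifier: perception frame -> lane-change likelihood
    fit(classifier_net, perception_frames, situation_labels)
```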
  • a computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
  • the training computing system of the second aspect can be configured in such a way that the traffic agent comprises separate modules so that the respective naturalistic driving data and map data can be processed in a suitable way.
  • the longitudinal and latitudinal positions of the respective ego vehicles comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame t i .
  • the use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
  • the traffic agent according to the second inventive aspect may comprise module C, also called E2E Decision Maker (E2EDM) computer model, comprising one or more neural networks with end-to-end modeling.
  • the outputs of modules A and B are used as input information to train the one or more E2E neural networks of module C.
  • the module C of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • the independent neural networks are independently trained.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • module C comprises i) a Lane follow neural network (module C2) , ii) a Lane Change neural network (module C3) and iii) a Function Classifier neural network (module C1).
  • the training computing system may also comprise a module D, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames t i to binary corresponding ground truth situation categories of “Lane follow” or “Lane Change”.
  • In other words, for each time frame t i and ego vehicle, the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the inventive training computing system of the second inventive aspect is configured in such a way that
  • the one or more perception frames P i are respectively used as input information to the Lane follow (module C2), the Lane Change (module C3) and the Function Classifier (module C1) neural networks,
  • the one or more ground truth vehicle control frames C i are respectively used as labels for independently training the Lane Follow (module C2) and the Lane Change (module C3) neural networks by matching the predicted vehicle control frames Ĉ i with the respective ground truth vehicle control frames C i , and
  • the respective ground truth situation categories per given time frames t i are used as labels to independently train the Function Classifier (module C1) neural network to predict corresponding situation categories of “Lane follow” or “Lane Change” by matching the predicted situation category with the respective ground truth situation category.
  • the inventive training computing system is furthermore configured in such a way that the output of the Function Classifier (module C1), i.e. the respective situation category of the ego vehicle at time frame t i initiates either the Lane Follow (module C2) or the Lane Change (module C3) neural network respectively.
  • An advantage of the inventive computing system for training a traffic agent is that the traffic agent is trained to predict both longitudinal and lateral positions of a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior.
  • a computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent using one or more neural networks for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment per given time frame t i , wherein i is any arbitrary number such that i ⁇ [1,2, ...n] and wherein n is the limit on driven frames.
  • the traffic agent used in the inventive simulation computer system of the third aspect was trained according to the inventive training method prior to deployment in a simulation environment, wherein the driving environment (simulation) is expected to provide environment data for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed and control commands are generated by the inventively trained E2E decision maker and passed back to the environment for positional update.
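  • A minimal, non-authoritative sketch of this deployment loop is given below (the simulation provides environment data, a perception frame is built, the trained decision maker predicts a control frame, and the control is passed back for the positional update); the method names on the simulation interface are assumptions:

```python
# Hedged sketch of the deployment loop described above. The methods on
# `simulation` and `decision_maker` are assumptions, not an actual API.
def run_traffic_agent(simulation, build_perception, decision_maker, n_frames):
    for _ in range(n_frames):
        env_data = simulation.get_environment_data()   # (i) map, (ii) traffic, (iii) traffic rules
        p_i = build_perception(env_data)               # perception frame P_i
        c_i = decision_maker.predict(p_i)              # e.g. (delta_acceleration, delta_bearing)
        simulation.apply_control(c_i)                  # positional update in the simulator
```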
  • Such a computing system is also called an integrated system.
  • the inventive simulation computer system does not use the naturalistic driving and map data, which is used for the training procedures as input information. Therefore, the inventive simulation computer system does not need to comprise a module B’ corresponding to module B of the training computer system.
  • the simulated driving data and map (environment) data of the simulated traffic agent, which may be provided by module S1’ and/or module S2’ to the perception building module A’, is used as input information in the inventive simulation computer system to generate the respective perception frames P i per respective time frames t i in module A’.
  • module A’ is configured to generate the respective perception frames P i per respective time frames t i based on the simulation data provided by module S1’ and/or S2’.
  • the perception frames P i per respective time frames t i are used as input information for the inventive E2E decision maker computer model (module C’).
  • Module C’ is configured to predict one or more vehicle control frames Ĉ i containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing per respective time frames t i to be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the decision maker computer model (module C’) of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • the decision maker computer model (module C’) of the traffic agent comprises i) a Lane Follow neural network (module C2’), ii) a Lane Change neural network (module C3’) and iii) a Function Classifier (module C1’) neural network, which are configured in such a way that
  • one or more perception frames P i of the simulated vehicles per given time frame t i are respectively used as input to the Lane follow (module C2’), the Lane Change (module C3’) and the Function Classifier (module C1’) neural networks,
  • the Function Classifier (module C1’) is configured to classify the one or more perception frames P i of the simulated vehicles per given time frame t i into the situation category “Lane follow” or “Lane Change”. Dependent on the respective classification per given time frame t i , i.e. either class “Lane follow” or class “Lane Change”, the Function Classifier (module C1’) initiates the neural network of either Lane Follow (module C2’) or Lane Change (module C3’) respectively.
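  • A minimal sketch of this moderator logic at inference time is given below; the 0.5 decision threshold and all names are assumptions for illustration.

```python
# Minimal sketch of the Function Classifier acting as a moderator at inference
# time: it classifies the perception frame and the corresponding sub-network
# (Lane Follow or Lane Change) predicts the control frame.
def predict_control(p_i, classifier_net, lane_follow_net, lane_change_net,
                    threshold=0.5):
    lane_change_likelihood = classifier_net(p_i)
    if lane_change_likelihood >= threshold:
        return lane_change_net(p_i)     # "Lane Change" situation category
    return lane_follow_net(p_i)         # "Lane follow" situation category
```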
  • If the Function Classifier (module C1’) classifies a perception frame P 1 at time frame t 1 with the situation category “Lane follow”, module C1’ is configured to initiate the neural network “Lane follow” to predict the vehicle control frame Ĉ 1 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t 1 .
  • If the Function Classifier classifies a perception frame P 2 at time frame t 2 with the situation category “Lane Change”, the Function Classifier (module C1’) is configured to initiate the neural network “Lane Change” to predict the vehicle control frame Ĉ 2 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing at time frame t 2 .
  • the output of the module C’ is provided to the simulated driving environment module (module S2’) in order to be applied to the simulated traffic agent in the simulation environment.
  • Module S2’ is configured to provide module S1’ with the respective changed simulated environment data comprising driving data and map data of the simulated traffic agent, so that module S1’ provides module A’ with a changed environment data set in order to generate the next perception frame.
  • Figure 1a) shows a schematic representation of the traffic agent 1 for decision making in simulated driving situations (also called “E2E car control model”), which is stored in the memory device and is configured to comprise one or more neural networks with end-to-end modeling and to execute the inventive computer-implemented training method.
  • the inventive computing system for training a traffic agent navigating a road vehicle in a simulation environment also comprises or consists of one or more processors, a memory device coupled to the one or more processors, which are not separately shown in Figure 1a).
  • the naturalistic driving data and map data are used as input information for module A (Perception building) and module B (vehicle control building), which are shown in Figure 1a) as combined module 11.
  • Modules A and B may alternatively be present as separate modules.
  • the output information respectively generated by modules A and B in module 11 is used as input information to train the traffic agent decision maker 12 (also called “E2E decision maker” or module C) in accordance with the inventive training method described in detail hereinbefore.
  • the inventive traffic agent 1 comprises a combined module 11 comprising module A and module B.
  • Module A is configured to process at least part of the naturalistic driving data and map data to generate the respective perception frames P i per given time frames t i , wherein each perception frame P i contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. Specific example embodiments thereof are already discussed with respect to the first inventive aspect and also apply to this inventive training computing system of the second inventive aspect.
  • each ground truth vehicle control frame C i contains longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles per given time frames t i .
  • the use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
  • the combined module 11 may also comprise an additional module D (not shown in Figure 1b)), which is configured to classify at least part of the perception frames P i based on the naturalistic driving data and map data into a binary situation category of either “Lane follow” or “Lane Change” per given time frames t i.
  • the module 12 (module C) of the traffic agent 1 uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • the independent neural networks are trained independently.
  • the output data of module 11 is used as input information.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • Figure 1b shows as one example thereof, that the E2E decision maker 12 comprises i) a Lane Follow neural network 122 (module C2) , ii) a Lane Change neural network 123 (module C3) and iii) a Function Classifier neural network 121 (module C1).
  • the traffic agent 1 comprises the module D (not shown in Figure 1b)) for classifying situation categories, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames t i to binary corresponding ground truth situation categories of “Lane follow” or “Lane Change”. In other words, for each time frame t i and ego vehicle, the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
  • the E2E decision maker 12 of the inventive traffic agent 1 is configured in such a way that
  • the one or more perception frames P i are respectively used as input information to the Lane follow 122 (module C2), the Lane Change 123 (module C3) and the Function Classifier 121 (module C1) neural networks,
  • the one or more ground truth vehicle control frames C i are respectively used as labels for independently training the Lane Follow 122 (module C2) and the Lane Change 123 (module C3) neural networks by matching the predicted vehicle control frames Ĉ i with the respective ground truth vehicle control frames C i , and
  • the respective ground truth situation categories per given time frames t i are used as labels to independently train the Function Classifier 121 (module C1) neural network to predict a corresponding situation category of “Lane follow” or “Lane Change” by matching the predicted situation category with the respective ground truth situation category.
  • the E2E decision maker 12 is furthermore configured in such a way that the output of the Function Classifier 121 (module C1), i.e. the respective situation category “Lane follow” or “Lane Change” of the ego vehicle at time frame t i , initiates either the Lane Follow 122 (module C2) or the Lane Change 123 (module C3) neural network respectively.
  • An advantage of the inventive computing system for training is that the traffic agent 1 is trained to predict both longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions to be applied to a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior.
  • the changes of longitudinal and latitudinal positions may be in the form of changes of acceleration and changes of bearing to be applied to the simulated vehicle at a given time frame.
  • Figure 1c) shows a schematic representation of an inventive integrated simulation computer system 01’ deploying an inventively trained traffic agent 1’ comprising a module 11’ (module A’) for perception building based on the simulated environment data provided by module 21’ (module S1’) and an E2E decision maker model 12’, as well as one or more processors and a memory device coupled to the one or more processors (not separately shown in Figure 1c)).
  • the driving environment module S2’ (simulation) is expected to provide environment data in module S1’ for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed to build perceptions in module 11’ and control commands are generated by the E2E decision maker 12’ and passed back to the environment 22’ for positional update.
  • Module 11' is configured to generate perception frames for the respective simulated vehicle per given time frame containing information on (i) traffic situation, (ii) self-state information of the simulated vehicle and (iii) road geometry and to provide the generated perception frames as input information to the E2E decision maker module 12’ (module C’).
  • the E2E decision maker module 12’ was trained in accordance with the inventive training method.
  • the E2E decision maker module 12’ is, thus, configured to predict as an action one or more vehicle control frames containing longitudinal and latitudinal positions, more preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration and bearing to be applied to the simulated vehicle in the simulation environment.
  • the inventive simulation computer system 01’ deploying the inventive traffic agent 1’ does not use the naturalistic driving and map data, which is used for the training procedure as input information. Therefore, the inventive simulation computer system 01’ does not need to comprise a module B’ corresponding to module B of the training computer system.
  • the simulated driving data and map (environment) data of the simulated traffic agent 1’, which are provided by module 21’ (module S1’) to module 11’ (module A’), are used as input information in the inventive simulation computer system 01’ to generate the respective perception frames P i per respective time frames t i in module 11’ (module A’).
  • module 11’ (module A’) is configured to generate the respective perception frames P i per respective time frames t i based on the simulation data provided by module 21’ (module S1’).
  • the perception frames P i per respective time frames t i generated by module 11 ’ are used as input information for the inventive E2E decision maker computer model 12’ (module C’).
  • Module 12’ (module C’) is configured to predict the longitudinal and latitudinal position, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated traffic agent per respective time frames t i .
  • the decision maker computer model 12’ (module C’) of the deployed traffic agent 1’ uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
  • At least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
  • An example configuration of the E2E decision maker 12’ in deployment comprises the analogous configuration set up of the E2E decision maker 12 as shown in Figure 1 b). Accordingly, the respective details and preferred embodiments as discussed hereinbefore also apply.
  • the decision maker computer model 12’ (module C’) of the deployed traffic agent 1’ comprises i) a Lane Follow neural network 122’ (module C2’), ii) a Lane Change neural network 123’ (module C3’) and iii) a Function Classifier 121’ (module C1’) neural network, which are configured in such a way that
  • one or more perception frames P i of the simulated vehicles per given time frame t i are respectively used as input to the Lane follow 122’ (module C2’), the Lane Change 123’ (module C3’) and the Function Classifier 121’ (module C1’) neural networks,
  • the Function Classifier 121’ (module C1’) neural network is configured to classify the one or more perception frames P i of the simulated vehicles per given time frame t i into the situation category “Lane follow” or “Lane Change”. Dependent on the respective classification per given time frame t i , i.e. either class “Lane follow” or class “Lane Change”, the Function Classifier 121’ (module C1’) initiates the neural network of either Lane Follow 122’ (module C2’) or Lane Change 123’ (module C3’) respectively.
  • If the Function Classifier 121’ (module C1’) classifies a perception frame P 1 at time frame t 1 with the situation category “Lane follow”, it is configured to initiate the neural network “Lane follow” 122’ to predict the vehicle control frame Ĉ 1 containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t 1 .
  • If the Function Classifier 121’ (module C1’) classifies a perception frame P 2 at time frame t 2 with the situation category “Lane Change”, then the Function Classifier 121’ (module C1’) is configured to initiate the neural network “Lane Change” 123’ to predict the vehicle control frame Ĉ 2 containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated vehicle at time frame t 2 .
  • The output of module 12’ (module C’) is provided to the simulated driving environment 22’ (module S2’) in order to be applied to the simulated traffic agent 1’ in the simulation environment.
  • Module 22’ (module S2’) is configured to provide module 21’ (module S1’) with the respectively changed simulated environment data comprising driving data and map data of the simulated traffic agent 1’, so that module 21’ (module S1’) provides module 11’ (module A’) with the changed environment data in order to generate the next perception frame.
  • As naturalistic driving data, commercial driving data from DataFromSky (DFS) was used.
  • the DFS data set in particular comprised the following features: timestamp (in seconds, s), longitudinal velocity (in meter/seconds, m/s), longitudinal acceleration (in meter/square seconds, m/s 2 ), and global coordinates of respective vehicle (traffic agents) (in x-, y- co-ordinates).
  • the OpenDRIVE digital map (downloaded from http://www.opendrive.org/) was used as map data in the simulation to generate lane points in reference to each ego position, which were used to construct road-geometry data for the model to be used as input.
  • These lane points can be described, for the current and the two adjacent lanes of a subject/ego vehicle at a time interval t i , as a set of coordinates X = [x 1 , x 2 , ... x n ] per lane, such that x n is the last point on the lane that is at a maximum distance of 400 m from the ego/subject vehicle position at t i .
  • the perception frame used with respect to the present invention is divided into three categories:
  • Traffic Situation input (DFS and OpenDRIVE) data is processed to form the six-vehicle-neighbourhood information with reference to each ego/subject vehicle, where each represented vehicle in the six positions offers two pieces of information: (i) relative distance d to ego vehicle and (ii) relative speed v r to ego vehicle speed v e .
  • Figure 2 shows a schematic representation of a six-vehicle-neighborhood information at a specific time frame. As set out above, the vehicle roles are defined in a six-vehicle neighborhood according to the present invention as follows:
  • the car 311 in front of ego vehicle 3 (in the same lane).
  • Ego-state information includes longitudinal velocity, longitudinal acceleration, angular deviation and bearing of the ego vehicle with respect to the lane direction.
  • Angular deviation (Δd) is defined as Δd = b_lane − b_ego, where b_lane and b_ego are the global bearing/orientation of the lane and the ego vehicle, at any given time instance t i , respectively.
  • the present inventors investigate two possible approaches to model the inventive E2E decision maker using neural networks in the simulation environment:
  • - Lane follower module 122, which is used to control the vehicle during general lane follow scenarios,
  • - Lane changer module 123, which is used to control the vehicle during lane change scenarios, and
  • - Function classifier module 121, which is used to classify, in binary, whether a situation is that of lane follow or a lane change, and thus triggers one of the two corresponding models 122 or 123. Each of these sub-modules were trained independently.
  • Adaptive cruise control controlling the vehicle’s throttle/acceleration with reference to the front car.
  • Traffic-free steer control controlling the vehicle’s steering to keep the lane.
  • a branched neural network architecture, split into two completely separate networks with no common set of layers for both Δacceleration and Δbearing, was used and trained with DFS driving data and OpenDRIVE map data.
  • the network was optimized using the following loss function (equation 2): L = (1/k) · Σ_(i=1..k) [ (y_acc,i − ŷ_acc,i)² + (y_bear,i − ŷ_bear,i)² ], where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing.
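  • One possible way such a branched architecture and the loss of equation 2 could be implemented is sketched below, e.g. in PyTorch; the framework choice, layer sizes and input width are assumptions and are not specified in the patent.

```python
# Illustrative PyTorch sketch (assumptions: framework, layer sizes, input width).
# Two completely separate fully connected branches predict delta_acceleration and
# delta_bearing; the combined squared-error loss corresponds to equation 2.
import torch
import torch.nn as nn

def make_branch(in_dim: int, hidden: int = 128) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

class BranchedControlNet(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        self.acc_branch = make_branch(in_dim)    # predicts delta_acceleration
        self.bear_branch = make_branch(in_dim)   # predicts delta_bearing

    def forward(self, x):
        return self.acc_branch(x), self.bear_branch(x)

def loss_eq2(pred_acc, pred_bear, y_acc, y_bear):
    # mean over k samples of the squared errors in both outputs (equation 2);
    # tensors are assumed to have matching shapes
    return torch.mean((y_acc - pred_acc) ** 2 + (y_bear - pred_bear) ** 2)
```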
  • A specific mechanism was needed to cater for the problem of cascading error during test-runs of the model in the simulation, which meant that minute errors in each frame added up to yield states which were rarely seen in the lane following training data, resulting in the model failing to control the steering well enough to keep the lane, eventually leading the vehicle out of the lane.
  • the corrective mechanism involved filtering the training data to increase the involvement, within each training iteration, of those situations where the vehicle was displaced on either side of the lane center and where Δbearing was such that the distance to lane center was being reduced.
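  • A hedged sketch of such a corrective filter is given below; the sign conventions, the threshold and all names are illustrative assumptions, not the patent's filtering criteria.

```python
# Hedged sketch: keep (or oversample) frames where the vehicle is displaced from
# the lane center and the recorded delta_bearing reduces the distance to the
# lane center. Thresholds and field names are assumptions.
def is_corrective_frame(lane_center_offset: float, delta_bearing: float,
                        min_offset: float = 0.2) -> bool:
    """lane_center_offset: signed lateral offset from lane center (m), positive = left.
    delta_bearing: signed change of bearing (rad), positive = turning left."""
    if abs(lane_center_offset) < min_offset:
        return False                      # vehicle is essentially centered
    # a corrective command steers back towards the center, i.e. opposite in sign
    return (lane_center_offset > 0 and delta_bearing < 0) or \
           (lane_center_offset < 0 and delta_bearing > 0)

# training data could then be filtered/weighted, e.g.:
# corrective = [f for f in frames if is_corrective_frame(f.offset, f.delta_bearing)]
```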
  • the graph in Figure 6 shows the distribution of average lane-center deviation of vehicles in the ground-truth dataset while the graph in Figure 7 shows the same distribution for the Lane follow model when run in the simulation environment.
  • the distribution of lane-center deviation appears to be higher in the DFS data, potentially as a result of the positional errors in recording of the data, which has been declared to be up to ≈0.5 m.
  • the Lane follow module 122 shows relatively less deviation owing to the corrective mechanism during training.
  • the Lane Changer module 123 targeted specific situations where the vehicle was expected to transition into either of the two adjacent lanes.
  • the model was expected to learn to predict Δbearing and Δacceleration values in a way that the higher-level decision of direction of lane change is implicitly taken at each frame, wrapped into the lower-level output of Δbearing and Δacceleration values.
  • the long-term effect of this is the smooth transition to the implicitly decided lane. This process is catered for within the neural network itself and hence the term “implicit” decision is used.
  • the same perception vector is used as input to the model, except that the road-geometry displacement vector is calculated with reference to the center-points of the adjacent lanes, not the current lane.
  • the network in this case is the same branched architecture described in the Lane follow model, and was optimized using the following loss function (equation 3): L = (1/k) · Σ_(i=1..k) [ |y_acc,i − ŷ_acc,i| + |y_bear,i − ŷ_bear,i| ], where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing. It was found through experiment that the mean squared error as in equation 2 was not able to converge well owing to the small values of ground truth Δbearing in the data related to lane change scenarios, and therefore the mean absolute error was preferred, as shown in equation 3.
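  • In code, the switch from equation 2 to equation 3 amounts to replacing the squared errors with absolute errors, e.g. (a sketch under the same assumptions as above):

```python
# Equation 3 as a drop-in replacement for the squared-error loss sketched above
# (mean absolute error over both outputs); a sketch, not the patent's exact code.
import torch

def loss_eq3(pred_acc, pred_bear, y_acc, y_bear):
    return torch.mean(torch.abs(y_acc - pred_acc) + torch.abs(y_bear - pred_bear))
```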
  • Table 1 shows the distribution of lane change directions in the ground-truth data and that shown by the Lane Change 123 model during a test run in the simulation. It can be seen that the percentages of corresponding lane changes are very similar for both sources, which can be seen as a rough approximation of the similarity in behavior of the model with real human drivers in the data.
  • Table 1 Percentage of lane change directions selected by the lane change module and real traffic data.
  • the Function Classifier 121 is targeted to act as a moderator for the two main modules: Lane Follow 122 and Lane Changer 123.
  • a single fully connected neural network was used to train the model with the DFS data, optimized on the following cost function known as log loss: L = −(1/k) · Σ_(i=1..k) [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ], where k ∈ ℤ+ is any arbitrary number of data samples, and y and ŷ are the ground truth and predicted output of the model as likelihood of the scenario being a Lane Change (and correspondingly the likelihood of Lane follow as 1 − ŷ).
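  • A minimal sketch of this log loss (binary cross-entropy between the ground-truth category and the predicted lane-change likelihood) is given below; the PyTorch usage is an assumption.

```python
# Binary cross-entropy / log loss for the Function Classifier: target 1 = "Lane
# Change", 0 = "Lane follow"; the prediction is the lane-change likelihood.
import torch

def log_loss(y_true, y_pred_likelihood, eps=1e-7):
    p = torch.clamp(y_pred_likelihood, eps, 1 - eps)   # avoid log(0)
    return -torch.mean(y_true * torch.log(p) + (1 - y_true) * torch.log(1 - p))

# equivalently: torch.nn.functional.binary_cross_entropy(y_pred_likelihood, y_true)
```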
  • Table 2 shows the confusion matrix for the classification of lane follow and lane change scenarios by the Function Classifier model compared to the ground-truth real traffic data.
  • the numbers shown in the table represent the respectively recorded instances (total instances: 148,501) in the ground truth data.
  • the inventive Function Classifier 121 model predicts in 96,578 instances correctly the situation category “Lane follow” and only in 8,592 instances incorrectly the situation category “Lane Change”.
  • the Function Classifier 121 model predicts in 35,778 instances correctly the situation category “Lane Change” and only in 7,553 instances incorrectly the situation category “Lane follow”.
  • the Function Classifier 121 model exhibits a precision value of 0.92, a recall value of 0.93 and therefore an appreciable F1-score of 0.925.
  • the final E2E decision maker module trained on real traffic data (DFS), was also evaluated for safety compliance in terms of collisions with surrounding traffic vehicles.
  • the E2E decision maker module was tested in the simulation environment and presently resulted only in 4 minor collisions (front car collision at low speeds) in a 30 minute drive, with dense traffic surroundings.
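Purely as an illustrative, non-authoritative sketch of the two objectives referenced in the list above (the mean absolute error of equation 3 used for the Lane Changer module and the log loss used for the Function Classifier), the following Python snippet shows one way to compute them with NumPy; the function and argument names are hypothetical and not taken from the disclosure.

```python
import numpy as np

def lane_change_loss(y_acc, y_acc_pred, y_bear, y_bear_pred):
    """Mean absolute error over Delta-acceleration and Delta-bearing (cf. equation 3)."""
    y_acc, y_acc_pred = np.asarray(y_acc), np.asarray(y_acc_pred)
    y_bear, y_bear_pred = np.asarray(y_bear), np.asarray(y_bear_pred)
    return np.mean(np.abs(y_acc - y_acc_pred) + np.abs(y_bear - y_bear_pred))

def function_classifier_log_loss(y_true, y_pred_lane_change, eps=1e-12):
    """Binary log loss; y_pred_lane_change is the predicted likelihood of 'Lane Change'."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred_lane_change, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Toy example with three data samples (values are invented for illustration)
print(lane_change_loss([0.1, -0.2, 0.0], [0.12, -0.25, 0.02],
                       [0.01, 0.02, -0.01], [0.015, 0.018, -0.012]))
print(function_classifier_log_loss([1, 0, 1], [0.9, 0.2, 0.7]))
```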

Abstract

The present invention relates to a computer-implemented training method for a traffic agent navigating a road vehicle in a simulation environment using end-to-end modeling as well as a respective training computing system, and a computing system for simulating a road driving environment for one or more vehicles comprising or consisting of one or more processors using the inventively trained traffic agent.

Description

COMPUTING SYSTEM AND METHOD USING END-TO-END MODELING FOR A SIMULATED TRAFFIC AGENT IN A SIMULATION ENVIRONMENT
TECHNICAL FIELD:
The present invention relates to a computer-implemented training method for a traffic agent navigating a road vehicle in a simulation environment using end-to-end modeling as well as a respective training computing system, and a computing system for simulating a road driving environment for one or more vehicles comprising or consisting of one or more processors using the inventively trained traffic agent.
PRIOR ART:
Before the driving characteristics of road vehicles are tested in reality, computer simulations of certain driving situations, such as braking, are carried out. As the prediction period is usually only up to 2 seconds, those models cannot predict complex driving situations, such as those required during overtaking.
The problem of devising a system that can control a car safely in a variety of traffic situations has been studied extensively and is of obvious interest for autonomous vehicle development. The focus for this area of research is on making safe and efficient decisions under real-time constraints. The simulated safe and efficient decisions, however, may not reflect human driving decisions in natural traffic.
Human driving decisions on a road can essentially be considered to comprise several abstract levels or phases forming a driving stack. Based on a particular road situation, a driver may decide to carry out a particular high-level maneuver, e.g. overtake, formulate a motion plan (also called “trajectory”) accordingly and apply control functions on actuators (throttle, brake, steer) to execute the decision.
Thus, it becomes more and more relevant to simulate human driving decisions in natural traffic. Human driving decisions in natural traffic are, moreover, influenced by many factors and can be considered at various levels. For example, depending on their mental environment, human drivers, being in the same situation, may take different decisions, such as overtake, follow a car in front or change the lane.
Many existing models employ a hierarchical structure in the sense that more abstract decisions (such as, which route to take) are computed first and then passed “down” to different layers that deal with an increasing level of details of the driving process based on that input. The driving stack is split into several phases which aim to reflect actual relevant components to the different approaches, e.g. in the context of simulation environments, rather than the driving stack of an autonomous vehicle.
Such phases may be considered as follows: Perception/Map generally relates to the input about the environment that is available to other components.
Traffic Rules generally relates to any component that provides legal restrictions to high-level decisions.
Mission Planning generally relates to a strategy on when to be where in the long- term (e.g. lane-level routing).
Traffic-Free Reference Line generally relates to planning an "optimal" reference-line ignoring other traffic participants.
Behavior Planning generally relates to planning a behavior plan, that is, when exactly to conduct actions, such as lane changes, incorporating other participants.
Decision Post-Processing generally relates to correcting the decisions of the previous components for conforming to basic safety rules, if necessary.
Motion/Trajectory Planning generally relates to planning the exact future trajectory for a short time (up to 2 seconds) horizon.
Command Conversion generally relates to computing the final commands to send to a (real or simulated) vehicle, such as steering instructions.
Vehicle Dynamics/Physics generally relates to simulating the car's behavior resulting from the generated commands.
Position Update generally relates to computing the resulting new position of the vehicle in the simulation.
Usage of these terms varies drastically in the literature.
It can be argued that these hierarchical models have certain limitations, such as not being able to make high-level decisions that can be acted upon, as "lower" components like a Motion Planner (the component that decides on e.g. timings of accelerations and lane-changes) might need to alter or even reject them (see Junqing Wei, Jarrod M. Snider, Tianyu Gu, John Dolan, and Bakhtiar Litkouhi. A behavioral planning framework for autonomous driving. Pages 458-464, 06/2014). Accordingly, the hierarchical models only offer limited realism in reflecting human driving behavior.
The power of end-to-end (synonym "e2e" or "E2E") learning using neural networks has been proven many times in various domains. In the autonomous driving industry, the e2e approach is popular in constructing robust models for various driving controls, e.g. steering, pedal control etc., in a way that maps sensory input (e.g. image pixels) directly to control output. This direct mapping relieves the need to use comprehensively labeled training data with annotated lane markings, road boundaries etc. and allows salient features to be extracted based on a goal-driven learning approach. Bojarski has shown that the decision-making processes of a human driver during lane following can be modelled in a deep neural network (see M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016). The authors attempt to map raw images from driving footage to steering commands of the car, thereby implicitly embedding the levels of the driving stack in the layers of a neural network, much like a human mind does. The model is taught to learn the lane-keeping behavior of human drivers, but lane changing was not modelled. The method uses only steering commands, but no information for controlling the car's longitudinal movement (i.e. acceleration/deceleration). The model is implemented in an autonomous driving car and not as a simulated traffic agent in a simulation environment. Muller presents the same approach used to train on remote-control car data and consequently automate its driving (see U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, "Off-road obstacle avoidance through end-to-end learning," in Advances in neural information processing systems, 2006, pp. 739-746).
Xu and Gao use end-to-end deep learning to map raw images from numerous on-road human driving footages to both high-level actions of "stop" and "go" as well as steering angle commands (see H. Xu, Y. Gao, F. Yu, and T. Darrell, "End-to-end learning of driving models from large-scale video datasets," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174-2182). The intended behavior of the model can be approximated to cover lane following, obstacle avoidance and lane change behavior of human drivers. The work provides only a distribution of the car controls, e.g. steering, and therefore does not claim to be accurate enough to drive a car in a simulation or real-world driving scenario. The method does not model acceleration/deceleration commands - only high-level decisions of stop and go. The model is not implemented in a simulated traffic agent in a simulation environment. Codevilla uses the same approach to learn the mapping from image to complete longitudinal and latitudinal control commands (steering and acceleration) of a car using the CARLA driving simulation data (see F. Codevilla, M. Müller, A. Lopez, V. Koltun, and A. Dosovitskiy, "End-to-end driving via conditional imitation learning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1-9). The model is taught to learn roughly all aspects of the driving behavior, i.e. lane following, adaptive cruise-control, obstacle avoidance and lane changing. The model was evaluated in a simulation environment as a traffic agent. Chen presents a similar solution in the TORCS racing-car simulation environment and also claims that the end-to-end model explicitly learns to focus on interpretable perception items, such as distance to lane and road boundary, distance to other cars around and angular deviation from the road, as part of a more interpretable solution to modelling driving behavior (see C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "Deepdriving: Learning affordance for direct perception in autonomous driving," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722-2730). Both of these solutions, however, are trained on simulation data of a computer-controlled driver, thereby not having the complete capacity to exhibit actual human-like behavior. In particular, the relevant prior art work mentioned above either limits the control of the car to only latitudinal steering commands or targets one specific function associated with human driving, e.g. lane follow, lane change etc. This limitation, however, does not allow a simulated traffic agent in a simulation environment to exhibit actual human-like driving behavior.
Furthermore, the prior art solutions are dependent on implicit learning of perception items such as lane-boundary positions, traffic car positions etc. from visual input (image), which leads to less accurate information of the vehicle environment.
In view of the shortcomings of the prior art, it is the aim of the present invention to provide a computing system and method for simulating a road driving environment in a driving situation for one or more vehicles, so that the decision of a traffic agent reflects a human like (naturalistic) behavior, i.e. controls the vehicle’s longitudinal and lateral position, preferably steering and acceleration, in a way that exhibits a naturalistic driving behavior in general and in particular for high-level decisions, such as lane changing behavior, e.g., in an overtake.
BRIEF DESCRIPTION OF THE INVENTION:
The aforementioned aim is solved at least in part by means of the claimed inventive subject matter. Advantages (preferred embodiments) are set out in the detailed description hereinafter and/or the accompanying figures as well as in the dependent claims.
Accordingly, a first aspect of the invention relates to a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment. The method comprises or consists of the following steps: a. Providing driving data at one or more time frames ti = [t1, t2, ... tn] for one or more road vehicles as ego vehicles respectively driven by a human in a realistic situation on a road and providing map data on the respective road at the given time frames ti, b. Processing at least part of the driving data and map data of step a) into one or more respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry, c. Processing at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles, d. Training a decision maker computer model of the traffic agent with the one or more perception frames Pi per given time frames ti of step b) as input to the model and with the one or more ground truth vehicle control frames Ci per given time frames ti of step c) as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉi = [ĉ1, ĉ2, ... ĉn] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
The processing steps b) and c) of the inventive method according to the first aspect can be conducted simultaneously or sequentially in any order.
A second aspect of the invention relates to a computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
A third aspect of the invention relates to a computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent was trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions to be applied to a simulated vehicle in the simulation environment per given time frame ti, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
The inventive aspects of the present invention as disclosed hereinbefore can comprise any possible (sub-)combination of the preferred inventive embodiments as set out in the dependent claims or as disclosed in the following detailed description and/or in the accompanying figures, provided the resulting combination of features is reasonable to a person skilled in the art.
BRIEF DESCRIPTION OF THE DRAWINGS:
Further characteristics and advantages of the present invention will ensue from the accompanying drawings, wherein Figs. 1a) to 1c) show schematic representations of (parts) of the E2E car control model of the inventive computer systems in training (Figs. 1a) and 1b)) and deployment (Fig. 1c)), respectively.
Fig. 2 shows a schematic representation of a six-vehicle-neighborhood information.
Fig. 3 shows a schematic representation of a semi-circular road geometry and applicable displacement vectors.
Fig. 4 shows a distribution graph of error/frame in Δbearing against DFS ground-truth validation data in a Lane Follow module according to the invention.
Fig. 5 shows a distribution graph of error/frame in Δacceleration against DFS ground-truth validation data in a Lane Follow module according to the invention.
Fig. 6 shows a distribution graph of lane-center deviation in the DFS real traffic data in the Lane Follow module.
Fig. 7 shows a distribution graph of lane-center deviation of the Lane Follow module when run within the simulation.
Figs. 8a) and 8b) respectively show distribution graphs of relative speed versus relative distance to the front car for the model test-run in simulation (Fig. 8a)) and the ground-truth DFS data (Fig. 8b)).
DETAILED DESCRIPTION OF THE INVENTION:
As set out in more detail hereinafter, the inventors of the different aspects of the present invention have found that the computer-implemented systems and methods according to the present invention enable a traffic agent navigating a road vehicle in a simulation environment to make simulated driving decisions at the high level (e.g., lane change or overtake driving situations) and at the low/operational level (trajectory and motion planning) which reflect human-like (naturalistic) behavior, i.e. control the vehicle's longitudinal and lateral position, preferably bearing and acceleration, in a way that exhibits a naturalistic driving behavior in any driving situation.
Thus, the present invention successfully exhibits the naturalistic decision making behavior from the source data in the simulation environment in terms of planning, safety- procedures and traffic rule compliance.
The respective naturalistic driving and map data is according to the present invention processed to form one or more perception frames per given time frames containing corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. In addition, the respective naturalistic driving and map data is according to the present invention processed to form one or more respective vehicle control frames per given time frames, wherein each vehicle control frame contains the longitudinal and latitudinal position of the respective ego vehicle. The application of the three categories of the perception frame is fundamental in order to provide an effective generalization of the inventive computer model.
According to the present invention, the decision maker computer model of the simulated traffic agent is trained with the respective one or more perception frames as input to the model and with the one or more ground truth vehicle control frames as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames. In other words, when matching the predicted vehicle control frames with the respective ground truth vehicle control frames the predicted vehicle control frames are approximated with the respective ground truth vehicle control frames. The inventive training procedure of the model is based on a data-driven approach, wherein the model is configured to implicitly learn from the ground truth naturalistic data.
In the context of the present invention, the expression "an additionally or alternatively preferred embodiment" or "an additionally or alternatively further preferred embodiment" or "an additional or alternative way of configuring this embodiment" means that the feature or feature combination disclosed in this preferred embodiment can be combined in addition to or alternatively to the features of the inventive subject matter including any preferred embodiment of each of the inventive aspects, provided the resulting feature combination is reasonable to a person skilled in the art.
Further, in the context of the present invention, the expressions "comprising" or "containing" shall be understood to have a broad meaning similar to the term "including" and will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. This definition also applies to variations on the term "comprising" such as "comprise" and "comprises" as well as variations on the term "containing" such as "contain" and "contains".
Moreover, in the context of the present invention, the expression "configured" shall be understood in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions, it means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.
To achieve the inventive subject matter, advantages and objects thereof, the present invention as disclosed in this disclosure is directed to systems and methods that make use of computer hardware and software to train a virtual traffic agent navigating through a simulation environment using reinforcement learning algorithms and techniques. A virtual traffic agent (in the context of the present invention also called "traffic agent") can for example be a car, truck, bus, bike or motorbike. Once a virtual traffic agent has been trained according to the present invention to replicate human driving behavior, in particular in complex driving situations of lane change, one or more trained virtual traffic agents may be injected into a simulation environment including complex driving situations. Such an embodiment is preferred, as the trained traffic agents may interact with, cooperate with and challenge an autonomous vehicle system controlling an autonomous vehicle under test. Another advantage is that such an embodiment is suitable to test the limits and weaknesses of the autonomous vehicle system, especially in complex driving situation scenarios that may be attributed to assertive or aggressive driving behaviors.
Thus, the inventive systems and methods furthermore have the technical effect and benefit of providing an improvement to autonomous vehicle computing technology, as the autonomous vehicle is trained in the inventive simulation environment reflecting human-like / naturalistic driving scenarios.
According to the first aspect of the present invention, a computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment is provided, characterized in that the method comprises or consists of the following steps: According to step a) the inventive training method provides driving data at one or more time frames ti = [t1, t2, ... tn] for one or more road vehicles as ego vehicles respectively driven by a human in a realistic situation on a road and provides map data on the respective road at the given time frames ti, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. The driving data generally represents trajectory data of the respective ego vehicles.
In an additional or alternative preferred embodiment, the driving data in step a) for each of the given road vehicles comprises or consists of one or more status features of the respective ego vehicles per given time frames ti, preferably comprising or consisting of longitudinal velocity, longitudinal acceleration, and position of the respective road vehicle in X, Y co-ordinates respectively per given time frames ti. In an additional or alternative preferred embodiment, the map data of step a) contains corresponding road information comprising or consisting of i) lane counts of the respective road and ii) lane position in X, Y co-ordinates, optionally X, Y, Z co-ordinates, respectively per given time frames ti.
According to step b), the inventive training method processes at least part of the driving data and map data of step a) into one or more respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In an additional or alternative preferred embodiment, the traffic situation in step b) comprises or consists of six-vehicle-neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) relative distance of respective vehicle to ego vehicle and ii) relative speed of respective vehicle to speed of ego vehicle. With respect to the six-vehicle-neighborhood, the vehicle roles are defined in accordance with the present invention as follows:
- The car in front of ego vehicle (in the same lane).
- The car following the ego vehicle in the back (in the same lane).
- The two cars in front of the ego vehicle’s center point translated to the two neighboring lanes.
- The two cars in the back of the ego vehicle’s center point translated to the two neighboring lanes.
Each of these might or might not exist for any given time/ego vehicle combination and are reflected in the model.
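As an illustration only, the relative-distance and relative-speed entries of one such neighborhood slot could be derived as in the following minimal Python sketch; the function name, the coordinate convention and the placeholder used for empty slots are assumptions and not taken from the disclosure.

```python
import math

def neighbor_features(ego_xy, ego_speed, other_xy=None, other_speed=None,
                      missing_value=-1.0):
    """Relative distance and relative speed of one neighborhood vehicle to the ego vehicle.

    Returns placeholder values when the slot is empty, since a neighbor may not exist
    for a given time frame / ego vehicle combination.
    """
    if other_xy is None or other_speed is None:
        return missing_value, missing_value
    rel_distance = math.hypot(other_xy[0] - ego_xy[0], other_xy[1] - ego_xy[1])
    rel_speed = other_speed - ego_speed
    return rel_distance, rel_speed

# Example: front car 25 m ahead, driving 2 m/s slower than the ego vehicle
print(neighbor_features((0.0, 0.0), 30.0, (25.0, 0.0), 28.0))   # -> (25.0, -2.0)
```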
In an additional or alternative preferred embodiment, the self-state information of the respective ego vehicles in step b) comprises or consists of longitudinal velocity, longitudinal acceleration, and its bearing with respect to the road direction (angular deviation Δd). The term "bearing" of an ego vehicle represents in the context of the present invention the orientation of the ego vehicle in relation to the global x-/y-axes. As an example, the angular deviation may be defined as

Δd_i = θ_i^road − θ_i^ego

where θ_i^road and θ_i^ego represent the bearing of the road and the ego vehicle, at any given time frame ti, respectively, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. In order to increase the accuracy, the bearing of the road may be substituted by the bearing of the lane, so that the angular deviation may be defined as

Δd_i = θ_i^lane − θ_i^ego

where θ_i^lane and θ_i^ego represent the bearing of the lane and the ego vehicle, at any given time frame ti.
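A minimal sketch of this angular-deviation computation follows, assuming bearings given in radians and a wrap of the difference into (−π, π]; the wrapping convention and the function name are assumptions, not taken from the disclosure.

```python
import math

def angular_deviation(bearing_reference, bearing_ego):
    """Delta-d: ego bearing relative to the road (or lane) direction, wrapped to (-pi, pi]."""
    diff = bearing_reference - bearing_ego
    # wrap into (-pi, pi] so that e.g. 350 deg vs 10 deg yields a small deviation
    return math.atan2(math.sin(diff), math.cos(diff))

# Example: road bearing 0.05 rad, ego bearing -0.02 rad -> deviation of 0.07 rad
print(round(angular_deviation(0.05, -0.02), 3))
```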
In an additional or alternative preferred embodiment, the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, preferably wherein the numerical representation is selected from a circular or a semi-circular geometry.
As an example, the circular or semi-circular numerical representation of the respective lane geometry having two lane boundaries is in the form of a vector of displacements Dj to each of the two lane boundaries, at any given time frame ti, with

D = [D1, D2, ... Dn]

wherein each entry Dj is part of a sequence of displacement points to the ego vehicle's position, divided on the basis of their relative bearing values to the ego position, with intervals of 1° or more around the circular or semi-circular region in front and/or back of the ego vehicle, and wherein the length n of the displacement vector D represents 1 to 360 for the circular geometry and 1 to 180 for the semi-circular geometry.
In case of semi-circular geometry, the front region covering 180° is represented, whereas the circular geometry represents both the front and the back regions covering 360°.
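The following Python sketch illustrates, under simplifying assumptions (sampled boundary points, 1° bins, a default value for empty bins, bearings in radians), how such a semi-circular displacement vector could be assembled; all names and defaults are hypothetical and not taken from the disclosure.

```python
import math

def semicircular_displacement_vector(ego_xy, ego_bearing, boundary_points,
                                     interval_deg=1, default=-1.0):
    """Displacement vector D over the 180 deg region in front of the ego vehicle.

    boundary_points: iterable of (x, y) points sampled along the two lane boundaries.
    Each bin stores the displacement (distance) of the closest boundary point whose
    relative bearing falls into that bin; empty bins keep the default value.
    ego_bearing is expected in radians.
    """
    n_bins = 180 // interval_deg
    vector = [default] * n_bins
    for x, y in boundary_points:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        rel_bearing = math.degrees(math.atan2(dy, dx) - ego_bearing)
        rel_bearing = (rel_bearing + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
        if -90.0 <= rel_bearing < 90.0:                       # front semi-circle only
            idx = int((rel_bearing + 90.0) // interval_deg)
            distance = math.hypot(dx, dy)
            if vector[idx] < 0 or distance < vector[idx]:
                vector[idx] = distance
    return vector
```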
According to step c), the inventive training method processes at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In an additional or alternative preferred embodiment, the longitudinal and latitudinal positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame ti. The use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined. The processing steps b) and c) can be executed simultaneously or sequentially in any order.
According to step d) the inventive method trains a decision maker computer model of the traffic agent with the one or more perception frames Pi per given time frames ti of step b) as input to the model and with the one or more ground truth vehicle control frames Ci per given time frames ti of step c) as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉi = [ĉ1, ĉ2, ... ĉn] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames with the respective ground truth vehicle control frames Ci, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In other words, the perception Pi of the naturalistic data of step b) is used as input to the computer model, and the ground truth naturalistic data of the vehicle control frames Ci in step c) is used as a label for training purposes. In contrast thereto, during deployment the inventively trained traffic agent in a computer system simulating a driving environment according to the third inventive aspect does not use the naturalistic vehicle control frames of step c) and substitutes the naturalistic perception frames of step b) by simulated perception frames. During deployment, the decision maker of the inventive simulation computer system according to the third inventive aspect predicts as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. As already discussed above with respect to the preferred embodiment of step c), the ground truth vehicle control frames Ci and the predicted vehicle control frames Ĉi may comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame ti.
In an additional or preferred embodiment, the decision maker computer model of the traffic agent in step d) uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
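By way of example only, a branched network of the kind described, i.e. a shared fully connected trunk with separate output heads for Δacceleration and Δbearing, could be sketched in PyTorch as follows; the layer sizes are hypothetical choices within the 1 to 512 neuron range mentioned above and do not reproduce the actual architecture of the disclosure.

```python
import torch
import torch.nn as nn

class BranchedControlNet(nn.Module):
    """Shared fully connected trunk with two branches predicting
    Delta-acceleration and Delta-bearing from a perception vector."""

    def __init__(self, perception_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(perception_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.acc_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
        self.bear_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, perception):
        features = self.trunk(perception)
        return self.acc_head(features), self.bear_head(features)

# Example forward pass on a batch of hypothetical 200-dimensional perception frames
model = BranchedControlNet(perception_dim=200)
d_acc, d_bear = model(torch.randn(32, 200))
print(d_acc.shape, d_bear.shape)   # torch.Size([32, 1]) torch.Size([32, 1])
```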
Accordingly, as one embodiment thereof, the inventive training method further comprises processing the driving data of step a) for the respective ego vehicles per given time frames ti into binary corresponding ground truth situation categories Si of "Lane Follow" or "Lane Change", and wherein the decision maker computer model of the traffic agent in step d) comprises i) a Lane Follow neural network, ii) a Lane Change neural network and iii) a Function Classifier neural network, wherein
- the one or more perception frames Pi are respectively used as input to the Lane Follow, the Lane Change and the Function Classifier neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow and the Lane Change neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respectively applied ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier neural network to predict a corresponding situation category Ŝi of "Lane Follow" or "Lane Change" by matching the predicted situation category Ŝi with the respective ground truth situation category Si,
wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
In other words, when matching the predicted situation categories Ŝi with the respective ground truth situation categories Si, the predicted situation categories Ŝi are approximated with the respective ground truth situation categories Si. In other words, for each time frame ti and ego vehicle the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
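As a hedged illustration of the training flow just described, the sketch below partitions the per-frame samples by their ground-truth situation category and trains the three networks independently; train_regressor and train_classifier stand in for whatever optimization routine is actually used and are purely hypothetical names.

```python
def train_decision_maker(frames, lane_follow_net, lane_change_net, classifier_net,
                         train_regressor, train_classifier):
    """frames: iterable of (perception P_i, control C_i, category S_i) tuples,
    with S_i == 1 for 'Lane Change' and S_i == 0 for 'Lane Follow' (assumed encoding)."""
    lf_samples = [(p, c) for p, c, s in frames if s == 0]    # Lane Follow situations
    lc_samples = [(p, c) for p, c, s in frames if s == 1]    # Lane Change situations
    cls_samples = [(p, s) for p, c, s in frames]             # all frames with category labels

    train_regressor(lane_follow_net, lf_samples)    # labels: ground-truth control frames C_i
    train_regressor(lane_change_net, lc_samples)    # labels: ground-truth control frames C_i
    train_classifier(classifier_net, cls_samples)   # labels: ground-truth categories S_i
```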
All features and embodiments disclosed with respect to the first aspect of the present invention are combinable alone or in (sub-)combination with the second aspect or third aspect of the present invention including each of the preferred embodiments thereof, provided the resulting combination of features is reasonable to a person skilled in the art.
According to the second aspect of the invention a computing system for training a traffic agent navigating a road vehicle in a simulation environment is provided comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.
According to an additional or alternative preferred embodiment, the training computing system of the second aspect can be configured in such a way that the traffic agent comprises separate modules so that the respective naturalistic driving data and map data can be processed in a suitable way. In particular, the traffic agent may comprise a module A for processing at least part of the naturalistic driving data and map data to generate the respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. Specific embodiments thereof are already discussed with respect to the first inventive aspect and also apply to this inventive training computing system of the second inventive aspect. In addition, the traffic agent may comprise a module B for processing at least part of the naturalistic driving data and map data to generate one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles. In an additional or alternative preferred embodiment, the longitudinal and latitudinal positions of the respective ego vehicles comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles at time frame ti. The use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
Furthermore, the traffic agent according to the second inventive aspect may comprise module C, also called E2E Decision Maker (E2EDM) computer model, comprising one or more neural networks with end-to-end modeling. The outputs of modules A and B are used as input information to train the one or more E2E neural networks of module C.
In an additional or preferred embodiment, the module C of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture. Preferably, the independent neural networks are independently trained.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
Accordingly, as one embodiment thereof, module C comprises i) a Lane Follow neural network (module C2), ii) a Lane Change neural network (module C3) and iii) a Function Classifier neural network (module C1). In this case, the training computing system may also comprise a module D, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames ti into binary corresponding ground truth situation categories Si of "Lane Follow" or "Lane Change". In other words, for each time frame ti and ego vehicle the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames.
With respect to the training procedure of the first inventive concept, the inventive training computing system of the second inventive aspect is configured in such a way that
- the one or more perception frames Pi are respectively used as input information to the Lane Follow (module C2), the Lane Change (module C3) and the Function Classifier (module C1) neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow (module C2) and the Lane Change (module C3) neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respective ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier (module C1) neural network to predict a corresponding situation category Ŝi of "Lane Follow" or "Lane Change" by matching the predicted situation category Ŝi with the respective ground truth situation category Si.
The inventive training computing system is furthermore configured in such a way that the output of the Function Classifier (module C1), i.e. the respective situation category of the ego vehicle at time frame ti, initiates either the Lane Follow (module C2) or the Lane Change (module C3) neural network respectively.
An advantage of the inventive computing system for training a traffic agent is that the traffic agent is trained to predict both longitudinal and lateral positions of a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior. All features and embodiments disclosed with respect to the second aspect of the present invention are combinable alone or in (sub-)combination with the first aspect or third aspect of the present invention including each of the preferred embodiments thereof, provided the resulting combination of features is reasonable to a person skilled in the art.
According to the third aspect of the invention a computing system for simulating a road driving environment in driving situations for one or more vehicles is provided comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent using one or more neural networks for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is trained according to the computer-implemented training method according to the first inventive aspect to predict as an action one or more vehicle control frames
Figure imgf000020_0001
containing longitudinal and latitudinal positions of a simulated vehicle in the simulation environment per given time frame ti, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. In other words, the traffic agent used in the inventive simulation computer system of the third aspect was trained according to the inventive training method prior to deployment in a simulation environment, wherein the driving environment (simulation) is expected to provide environment data for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed and control commands are generated by the inventively trained E2E decision maker and passed back to the environment for positional update. Such a computing system is also called an integrated system. As already mentioned above, the inventive simulation computer system does not use the naturalistic driving and map data, which is used for the training procedures as input information. Therefore, the inventive simulation computer system does not need to comprise a module B’ corresponding to module B of the training computer system. In contrast, the simulated driving data and map (environment) data of the simulated traffic agent, which may be provided by module ST and/or module S2’ to the perception building module A’, is used as input information in the inventive simulation computer system to generate the respective perception frames Pi per respective time frames ti in module A’. In other words, module A’ is configured to generate the respective perception frames Pi per respective time frames ti based on the simulation data provided by module S1’ and/or S2’. The perception frames Pi per respective time frames ti are used as input information for the inventive E2E decision maker computer model (module C’). Module C’ is configured to predict one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing per respective time frames tito be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i e [1,2, . .. n] and wherein n is the limit on driven frames.
In an additional or preferred embodiment, the decision maker computer model (module C’) of the traffic agent uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
Accordingly, as one embodiment thereof, the decision maker computer model (module C') of the traffic agent comprises i) a Lane Follow neural network (module C2'), ii) a Lane Change neural network (module C3') and iii) a Function Classifier neural network (module C1'), which are configured in such a way that
- one or more perception frames Pi of the simulated vehicles per given time frame ti are respectively used as input to the Lane Follow (module C2'), the Lane Change (module C3') and the Function Classifier (module C1') neural networks,
- the Function Classifier (module C1') is configured to classify the one or more perception frames Pi of the simulated vehicles per given time frame ti into the situation category "Lane Follow" or "Lane Change".
Dependent on the respective classification per given time frame ti, i.e. either class "Lane Follow" or class "Lane Change", the Function Classifier (module C1') initiates the neural network of either Lane Follow (module C2') or Lane Change (module C3') respectively. As an example, in case the Function Classifier (module C1') classifies a perception frame P1 at time frame t1 with the situation category "Lane Follow", then the Function Classifier (module C1') is configured to initiate the neural network "Lane Follow" to predict the vehicle control frame Ĉ1 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t1. Alternatively, in case the Function Classifier (module C1') classifies a perception frame P2 at time frame t2 with the situation category "Lane Change", then the Function Classifier (module C1') is configured to initiate the neural network "Lane Change" to predict the vehicle control frame Ĉ2 containing longitudinal and latitudinal positions to be applied to the simulated vehicle, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing at time frame t2. The output of module C' is provided to the simulated driving environment module (module S2') in order to be applied to the simulated traffic agent in the simulation environment. Module S2' is configured to provide module S1' with the respective changed simulated environment data comprising driving data and map data of the simulated traffic agent, so that module S1' provides module A' with a changed environment data set in order to generate the next perception frame.
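A minimal dispatch sketch of the moderation step described above: the Function Classifier selects which of the two control networks produces the vehicle control frame for the current perception frame. The 0.5 decision threshold and all function names are assumptions, not taken from the disclosure.

```python
def decide_control_frame(perception, function_classifier, lane_follow_net,
                         lane_change_net, threshold=0.5):
    """Return the predicted control frame (delta_acceleration, delta_bearing)
    for one perception frame P_i of the simulated ego vehicle."""
    p_lane_change = function_classifier(perception)   # assumed likelihood of 'Lane Change'
    if p_lane_change >= threshold:
        return lane_change_net(perception)            # corresponds to module C3'
    return lane_follow_net(perception)                # corresponds to module C2'
```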
The present invention is described in the following on the basis of exemplary embodiments, which merely serve as examples and which shall not limit the scope of the present protective right.
DETAILED DESCRIPTION OF FIGURES:
Further characteristics and advantages of the present invention will ensue from the following description of example embodiments of the inventive aspects with reference to the accompanying figures.
All of the features disclosed hereinafter with respect to the example embodiments and / or the accompanying figures can alone or in any sub-combination be combined with features of the two aspects of the present invention including features of preferred embodiments thereof, provided the resulting feature combination is reasonable to a person skilled in the art.
Figure 1a) shows a schematic representation of the traffic agent for decision making in simulated driving situations (also called "E2E car control model") 1, which is stored in the memory device and configured to comprise one or more neural networks with end-to-end modeling and to execute the inventive computer-implemented training method. The inventive computing system for training a traffic agent navigating a road vehicle in a simulation environment also comprises or consists of one or more processors and a memory device coupled to the one or more processors, which are not separately shown in Figure 1a).
According to Figure 1a), the naturalistic driving data and map data (not separately shown) are used as input information for module A (perception building) and module B (vehicle control building), which are shown in Figure 1a) as combined module 11. Modules A and B may alternatively be present as separate modules. The output information respectively generated by modules A and B in module 11 is used as input information to train the traffic agent decision maker 12 (also called "E2E decision maker" or module C) in accordance with the inventive training method described in detail hereinbefore.
As an example, the inventive traffic agent 1 comprises a combined module 11 comprising module A and module B. Module A is configured to process at least part of the naturalistic driving data and map data to generate the respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry. Specific example embodiments thereof are already discussed with respect to the first inventive aspect and also apply to this inventive training computing system of the second inventive aspect. Module B is configured to process at least part of the naturalistic driving data and map data to generate one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each ground truth vehicle control frame Ci contains longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration (Δacceleration) and bearing (Δbearing) values to be applied to the respective ego vehicles per given time frames ti. The use of changes in values of acceleration and bearing is preferred, as from these values the changes of steering, depth of gas pedal and depth of brake pedal can be directly determined.
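For illustration, ground-truth control labels of the kind produced by module B (changes of acceleration and bearing between consecutive time frames) could be derived from trajectory data as in this hypothetical sketch; the field names and the angle-wrapping convention are assumptions, not taken from the disclosure.

```python
import math

def control_labels(trajectory):
    """trajectory: list of per-frame dicts with 'acceleration' and 'bearing' (radians).
    Returns one (delta_acceleration, delta_bearing) label per transition t_i -> t_{i+1}."""
    labels = []
    for prev, curr in zip(trajectory[:-1], trajectory[1:]):
        d_acc = curr["acceleration"] - prev["acceleration"]
        # wrap the bearing change into (-pi, pi] to avoid jumps at the angle boundary
        d_bear = math.atan2(math.sin(curr["bearing"] - prev["bearing"]),
                            math.cos(curr["bearing"] - prev["bearing"]))
        labels.append((d_acc, d_bear))
    return labels
```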
In addition, the combined module 11 may also comprise an additional module D (not shown in Figure 1b)), which is configured to classify at least part of the perception frames Pi based on the naturalistic driving data and map data into a binary situation category of either “Lane Follow” or “Lane Change” per given time frames ti.
According to an alternative or an additional preferred embodiment, the module 12 (module C) of the traffic agent 1 uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture. Preferably, the independent neural networks are trained independently. One example of such a configuration, wherein the E2E decision maker 12 comprises three neural networks 121 (Function Classifier), 122 (Lane Follow) and 123 (Lane Change) combined in a branched architecture, is shown in Figure 1b). As input information, the output data of module 11 is used. In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
Figure 1b) shows, as one example thereof, that the E2E decision maker 12 comprises i) a Lane Follow neural network 122 (module C2), ii) a Lane Change neural network 123 (module C3) and iii) a Function Classifier neural network 121 (module C1). In this case, the traffic agent 1 comprises the module D (not shown in Figure 1b)) for classifying situation categories, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time frames ti into binary corresponding ground truth situation categories Si of "Lane Follow" or "Lane Change". In other words, for each time frame ti and ego vehicle the inventive training computing system is configured to determine whether the respective ego vehicle follows the lane or whether it changes the lane, wherein i is any arbitrary number such that i ∈ [1,2, ...n] and wherein n is the limit on driven frames. According to this example, the E2E decision maker 12 of the inventive traffic agent 1 is configured in such a way that
- the one or more perception frames Pi are respectively used as input information to the Lane Follow 122 (module C2), the Lane Change 123 (module C3) and the Function Classifier 121 (module C1) neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow 122 (module C2) and the Lane Change 123 (module C3) neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respective ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier 121 (module C1) neural network to predict a corresponding situation category Ŝi of "Lane Follow" or "Lane Change" by matching the predicted situation category Ŝi with the respective ground truth situation category Si.
The E2E decision maker 12 is furthermore configured in such a way that the output of the Function Classifier 121 (module C1), i.e. the respective situation category "Lane Follow" or "Lane Change" of the ego vehicle at time frame ti, initiates either the Lane Follow 122 (module C2) or the Lane Change 123 (module C3) neural network respectively.
An advantage of the inventive computing system for training is that the traffic agent 1 is trained to predict both longitudinal and latitudinal positions, preferably changes of longitudinal and latitudinal positions to be applied to a vehicle in a simulated environment, wherein the prediction reflects naturalistic driving behavior. According to one example, the changes of longitudinal and latitudinal positions may be in the form of changes of acceleration and changes of bearing to be applied to the simulated vehicle at a given time frame.
Figure 1c) shows a schematic representation of an inventive integrated simulation computer system 01' deploying an inventively trained traffic agent 1' comprising a module 11' (module A') for perception building based on the simulated environment data provided by module 21' (module S1') and an E2E decision maker model 12', as well as one or more processors and a memory device coupled to the one or more processors (not separately shown in Figure 1c)). The driving environment module S2' (simulation) is expected to provide environment data in module S1' for an ego vehicle containing (i) map-information, (ii) traffic-information and (iii) traffic rules. This data is then processed to build perceptions in module 11', and control commands are generated by the E2E decision maker 12' and passed back to the environment 22' for positional update.
Module 11' is configured to generate perception frames for the respective simulated vehicle per given time frame containing information on (i) traffic situation, (ii) self-state information of the simulated vehicle and (iii) road geometry and to provide the generated perception frames as input information to the E2E decision maker module 12' (module C'). The E2E decision maker module 12' was trained in accordance with the inventive training method. The E2E decision maker module 12' is, thus, configured to predict as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions, more preferably changes of longitudinal and latitudinal positions, e.g. changes of acceleration and bearing to be applied to the simulated vehicle in the simulation environment.
As already mentioned above, the inventive simulation computer system 01' deploying the inventive traffic agent 1' does not use the naturalistic driving and map data, which is used for the training procedure, as input information. Therefore, the inventive simulation computer system 01' does not need to comprise a module B' corresponding to module B of the training computer system. In contrast, the simulated driving data and map (environment) data of the simulated traffic agent 1', which are provided by module 21' (module S1') to module 11' (module A'), are used as input information in the inventive simulation computer system 01' to generate the respective perception frames Pi per respective time frames ti in module 11' (module A'). In other words, module 11' (module A') is configured to generate the respective perception frames Pi per respective time frames ti based on the simulation data provided by module 21' (module S1'). The perception frames Pi per respective time frames ti generated by module 11' are used as input information for the inventive E2E decision maker computer model 12' (module C'). Module 12' (module C') is configured to predict the longitudinal and latitudinal position, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated traffic agent per respective time frames ti. In an additional or preferred embodiment, the decision maker computer model 12' (module C') of the deployed traffic agent 1' uses three or more neural networks, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
In an additional or alternative preferred embodiment, at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
An example configuration of the E2E decision maker 12' in deployment corresponds to the configuration of the E2E decision maker 12 as shown in Figure 1b). Accordingly, the respective details and preferred embodiments as discussed hereinbefore also apply.
Thus, the decision maker computer model 12' (module C') of the deployed traffic agent 1' comprises i) a Lane Follow neural network 122' (module C2'), ii) a Lane Change neural network 123' (module C3') and iii) a Function Classifier neural network 121' (module C1'), which are configured in such a way that
- one or more perception frames Pi of the simulated vehicles per given time frame ti are respectively used as input to the Lane Follow 122' (module C2'), the Lane Change 123' (module C3') and the Function Classifier 121' (module C1') neural networks,
- the Function Classifier 121' (module C1') neural network is configured to classify the one or more perception frames Pi of the simulated vehicles per given time frame ti into the situation category "Lane Follow" or "Lane Change". Dependent on the respective classification per given time frame ti, i.e. either class "Lane Follow" or class "Lane Change", the Function Classifier 121' (module C1') initiates the neural network of either Lane Follow 122' (module C2') or Lane Change 123' (module C3'), respectively. As an example, in case the Function Classifier 121' (module C1') classifies a perception frame P1 at time frame t1 with the situation category "Lane Follow", then the Function Classifier 121' (module C1') is configured to initiate the neural network "Lane Follow" 122' to predict the vehicle control frame Ĉ1 containing longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the respective simulated vehicle at time frame t1. Alternatively, in case the Function Classifier 121' (module C1') classifies a perception frame P2 at time frame t2 with the situation category "Lane Change", then the Function Classifier 121' (module C1') is configured to initiate the neural network "Lane Change" 123' to predict the vehicle control frame Ĉ2 containing the longitudinal and latitudinal positions, preferably the changes in longitudinal and latitudinal position, more preferably the changes in acceleration and bearing to be applied to the simulated vehicle at time frame t2.
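A compact Python sketch of this gating logic is given below; the classifier and the two regression networks are passed in as callables, and the 0.5 decision threshold as well as the function names are assumptions made only for illustration.

```python
# Minimal sketch of the gating performed by the Function Classifier 121' (module C1'):
# the classifier output selects either the Lane Follow or the Lane Change network.
# The callables and the 0.5 threshold are assumptions for illustration only.
import numpy as np


def decide(perception_frame: np.ndarray,
           classify_situation,          # returns P(X = LaneChange) in [0, 1]
           predict_lane_follow,         # returns (d_acceleration, d_bearing)
           predict_lane_change,         # returns (d_acceleration, d_bearing)
           threshold: float = 0.5):
    """Return the control frame (Δacceleration, Δbearing) for one time frame ti."""
    p_lane_change = classify_situation(perception_frame)
    if p_lane_change >= threshold:                   # situation category "Lane Change"
        return predict_lane_change(perception_frame)
    return predict_lane_follow(perception_frame)     # situation category "Lane Follow"


# Example call with dummy stand-ins for the trained networks:
dummy = lambda p: (0.0, 0.0)
control = decide(np.zeros(16), classify_situation=lambda p: 0.2,
                 predict_lane_follow=dummy, predict_lane_change=dummy)
```

In deployment, the returned pair would be handed back to module S2' as the positional update for the next time frame.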
The output information of module 12' (module C') is provided to the simulated driving environment 22' (module S2') in order to be applied to the simulated traffic agent 1' in the simulation environment. Module 22' (module S2') is configured to provide module 21' (module S1') with the respectively changed simulated environment data comprising driving data and map data of the simulated traffic agent 1', so that module 21' (module S1') provides module 11' (module A') with the changed environment data in order to generate the next perception frame.
Experimental Part
For training purposes in accordance with the present invention, the inventors used commercial driving data DataFromSky (DFS, purchased from RCE systems s.r.o., Czech Republic) comprising driving data of vehicles driven by humans for a duration of six hours on a part (500 m) of the highway A9 in Germany. The DFS data set in particular comprised the following features: timestamp (in seconds, s), longitudinal velocity (in meters/second, m/s), longitudinal acceleration (in meters/second squared, m/s²), and global coordinates of the respective vehicles (traffic agents) (in x-, y-coordinates).
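For illustration, a single record of such driving data could be represented as in the small Python sketch below; the class name, field names and example values are placeholders and not the actual DataFromSky schema.

```python
# Small sketch of one record of the naturalistic driving data described above.
# The class name, field names and the example values are illustrative placeholders,
# not the actual DataFromSky schema.
from dataclasses import dataclass


@dataclass
class DrivingRecord:
    timestamp_s: float        # timestamp in seconds
    velocity_mps: float       # longitudinal velocity in m/s
    acceleration_mps2: float  # longitudinal acceleration in m/s^2
    x: float                  # global x coordinate of the vehicle (traffic agent)
    y: float                  # global y coordinate of the vehicle (traffic agent)


# Purely illustrative values:
sample = DrivingRecord(timestamp_s=0.04, velocity_mps=31.5,
                       acceleration_mps2=-0.2, x=1204.7, y=385.1)
```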
Furthermore, the OpenDRIVE digital map (downloaded from http://www.opendrive.org/) was used as map data in the simulation to generate lane points in reference to each ego position, which were used to construct road-geometry data for the model to be used as input. These lane points can be described as a set of lanes

L(ti) = {X_current, X_left, X_right}

for the current and the two adjacent lanes of a subject/ego vehicle at a time frame ti, where each lane is a set of coordinates X = {x_1, x_2, ..., x_n} such that x_n is the last point on the lane that is at a maximum distance of 400 m to the ego/subject vehicle position at ti.
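A short Python sketch of this lane-point construction is shown below; the list-of-(x, y)-tuples data layout and the function names are assumptions, while the 400 m cut-off follows the description above.

```python
# Sketch of the lane-point construction: for the current and the two adjacent lanes,
# keep only the points within 400 m of the ego position. The list-of-(x, y)-tuples
# layout and the function names are assumptions; the 400 m cut-off follows the text.
import math


def clip_lane_points(lane_points, ego_xy, max_distance=400.0):
    """Keep lane points (x, y) whose Euclidean distance to the ego position is at most 400 m."""
    ex, ey = ego_xy
    return [(x, y) for (x, y) in lane_points
            if math.hypot(x - ex, y - ey) <= max_distance]


def build_lane_sets(current_lane, left_lane, right_lane, ego_xy):
    """Lane-point sets for the current and the two adjacent lanes at one time frame ti."""
    return {
        "current": clip_lane_points(current_lane, ego_xy),
        "left": clip_lane_points(left_lane, ego_xy),
        "right": clip_lane_points(right_lane, ego_xy),
    }
```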
As already set out in the detailed description above, the perception frame used with respect to the present invention is divided into three categories:
1. Traffic Situation: the input (DFS and OpenDRIVE) data is processed to form the six-vehicle-neighborhood information with reference to each ego/subject vehicle, where each represented vehicle in the six positions offers two pieces of information: (i) relative distance d to the ego vehicle and (ii) relative speed vr to the ego vehicle speed ve. Figure 2 shows a schematic representation of a six-vehicle-neighborhood information at a specific time frame. As set out above, the vehicle roles are defined in a six-vehicle neighborhood according to the present invention as follows:
The car 311 in front of ego vehicle 3 (in the same lane).
The car 312 following the ego vehicle 3 in the back (in the same lane).
The two cars 321 , 322 in front of the ego vehicle’s 3 center point translated to the two neighboring lanes.
The two cars 331 , 332 in the back of the ego vehicle’s 3 center point translated to the two neighboring lanes.
According to Figure 2, all of the six positions exist at the represented time frame. All of the six vehicles 311, 312, 321, 322, 331 and 332 in the neighborhood of the ego vehicle 3 have the same distance d to the ego vehicle 3. The relative speed vr is the respective speed vn of any of the six neighborhood vehicles 311, 312, 321, 322, 331 and 332 minus the speed ve of the ego vehicle 3 (assuming the vehicles are moving in the same direction).

2. Ego-state information: includes longitudinal velocity, longitudinal acceleration, angular deviation and bearing of the ego vehicle with respect to the lane direction. As mentioned above, the angular deviation (Ad) is defined as

Ad(ti) = θ_lane(ti) − θ_ego(ti)

where θ_lane(ti) and θ_ego(ti) are the global bearing/orientation of the lane and the ego vehicle, at any given time instance ti, respectively.
3. Road Geometry: According to the present example experiment, the DFS driving data and OpenDRIVE map data are processed to yield a semi-circular numerical representation of the road geometry in the form of a vector of displacements Dj to each of the two lane boundaries LB1 and LB2, at any time instance ti, such that

D = [D1, D2, ..., Dn]

where each entry Dj is part of a sequence of displacement points to the ego position, divided on the basis of their relative bearing values to the ego position, with intervals of 5° around the semi-circular region in front of the ego vehicle. Therefore, the length n of the displacement vector D has, in the experimentation scope of this work, been set to n = 180/5 = 36.
Such a semi-circular road geometry is schematically represented in Figure 3, wherein - for the sake of clarity - only part of the 36 displacement vectors Dj of the ego vehicle 3 at this instance are illustrated.
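The following Python sketch illustrates, under stated assumptions, how the three perception-frame categories described above (traffic situation, ego-state information and road geometry) could be computed for a single time frame ti. The dictionary-based data layout, the zero placeholder for unoccupied neighborhood slots, the bearing wrap-around and the nearest-boundary-point-per-bin rule are assumptions for illustration; only the relative-distance/relative-speed features, the angular deviation and the 36-entry semi-circular displacement vector follow the description above.

```python
# Sketch of the three perception-frame categories for one time frame ti.
# Dictionary-based records, the zero placeholder for unoccupied neighbourhood slots,
# the bearing wrap-around and the nearest-point-per-bin rule are assumptions.
import math


def neighbour_features(neighbour, ego):
    """(i) Traffic situation: relative distance d and relative speed vr = vn - ve per slot."""
    if neighbour is None:                    # slot not occupied at this time frame (assumed)
        return 0.0, 0.0
    d = math.hypot(neighbour["x"] - ego["x"], neighbour["y"] - ego["y"])
    vr = neighbour["v"] - ego["v"]
    return d, vr


def angular_deviation(theta_lane_deg, theta_ego_deg):
    """(ii) Ego-state: angular deviation Ad between lane bearing and ego bearing, wrapped."""
    ad = theta_lane_deg - theta_ego_deg
    return (ad + 180.0) % 360.0 - 180.0


def displacement_vector(boundary_points, ego, ego_bearing_deg,
                        bin_size_deg=5.0, fov_deg=180.0):
    """(iii) Road geometry: semi-circular displacement vector with n = 180/5 = 36 entries."""
    n = int(fov_deg / bin_size_deg)
    D = [math.inf] * n
    for (x, y) in boundary_points:
        rel = math.degrees(math.atan2(y - ego["y"], x - ego["x"])) - ego_bearing_deg
        rel = (rel + 180.0) % 360.0 - 180.0  # wrap to (-180, 180]
        if -90.0 <= rel < 90.0:              # keep only the semi-circle in front of the ego
            j = int((rel + 90.0) // bin_size_deg)
            D[j] = min(D[j], math.hypot(x - ego["x"], y - ego["y"]))
    return [d if math.isfinite(d) else 0.0 for d in D]   # empty bins set to 0.0 (assumed)
```

Stacking these values in a fixed order per time frame would yield the perception frame Pi that is fed to the decision maker.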
The present inventors investigated two possible approaches to model the inventive E2E decision maker with neural networks in the simulation environment:

• Single network model, in which the decision making process is learned by a single neural network with a sequence of n layers, where n was empirically deduced during experimentation.

• Functionally branched networks, in which the decision making process is divided on the basis of fundamental driving functions, e.g. lane following and lane changing. Three neural networks are used to model this approach, each of which comprises a sequence of n layers, where n was empirically deduced during experimentation. These are as follows:
- Lane follower module 122, which is used to control the vehicle during general lane follow scenarios,

- Lane changer module 123, which is used to control the vehicle during lane change scenarios,

- Function classifier module 121, which is used to classify, in binary, whether a situation is that of a lane follow or a lane change, and thus triggers one of the two corresponding modules 122 or 123.

Each of these sub-modules was trained independently; an illustrative sketch of such a branched architecture is given after this list.
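The sketch below shows one way the functionally branched setup could be laid out in PyTorch. The layer counts and widths (128, 64) and the sigmoid classifier head are assumptions; the description only states that each network consists of a sequence of layers whose number and size (up to 512 neurons per layer) were deduced empirically.

```python
# Illustrative PyTorch sketch of the functionally branched setup: one regression
# network class reused for the Lane Follow and Lane Change modules (with separate
# Δacceleration and Δbearing branches) plus a binary function classifier.
# Layer counts and widths are placeholders, not the empirically deduced values.
import torch
import torch.nn as nn


def regression_branch(input_dim: int) -> nn.Sequential:
    """One output head, used for either Δacceleration or Δbearing."""
    return nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                         nn.Linear(128, 64), nn.ReLU(),
                         nn.Linear(64, 1))


class DrivingFunctionNet(nn.Module):
    """Architecture shared by the Lane Follow (122) and Lane Change (123) modules."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.acc_branch = regression_branch(input_dim)    # predicts Δacceleration
        self.bear_branch = regression_branch(input_dim)   # predicts Δbearing

    def forward(self, perception):
        return self.acc_branch(perception), self.bear_branch(perception)


class FunctionClassifier(nn.Module):
    """Predicts P(X = LaneChange) from a subset of the perception frame (module 121)."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, perception_subset):
        return self.net(perception_subset)
```

Keeping the Δacceleration and Δbearing heads as fully separate branches mirrors the split architecture described for the Lane Follow module in the following paragraphs.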
In the present experimentation, the Lane Following module 122 targets two abstract sets of scenarios:

• Adaptive cruise control (ACC): controlling the vehicle's throttle/acceleration with reference to the front car.

• Traffic-free steer control: controlling the vehicle's steering to keep the lane.
A branched neural network architecture, split into two completely separate networks with no common set of layers for Δacceleration and Δbearing, was trained with DFS driving data and OpenDRIVE map data. The network was optimized using the following loss function:

L_MSE = (1/k) Σ_{i=1..k} [ (y_acc,i − ŷ_acc,i)² + (y_bear,i − ŷ_bear,i)² ]   (2)

where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing. A specific mechanism was needed to cater for the problem of cascading error during test-runs of the model in the simulation: minute errors in each frame added up to yield states which were rarely seen in the lane following training data, resulting in the model failing to control the steering well enough to keep the lane and eventually leading the vehicle out of the lane. The corrective mechanism involved filtering the training data to increase the involvement, within each training iteration, of those situations where the vehicle was displaced on either side of the lane center and where Δbearing was such that the distance to the lane center was being reduced.
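One possible reading of this corrective mechanism is sketched below as a simple data-rebalancing step. The 0.2 m offset threshold, the sign convention linking lane-center offset and Δbearing, and the duplication factor are assumptions chosen purely for illustration.

```python
# One possible reading of the corrective mechanism as a data-rebalancing step.
# The 0.2 m offset threshold, the sign convention between lane-centre offset and
# Δbearing, and the duplication factor are assumptions chosen only for illustration.
def is_corrective_frame(lane_center_offset: float, d_bearing: float,
                        offset_threshold: float = 0.2) -> bool:
    """True if the vehicle is displaced from the lane centre and Δbearing steers it back."""
    displaced = abs(lane_center_offset) > offset_threshold
    steering_back = lane_center_offset * d_bearing < 0.0   # assumed sign convention
    return displaced and steering_back


def rebalance(frames, repeat_factor: int = 3):
    """Repeat corrective frames so they appear more often within each training iteration."""
    rebalanced = []
    for frame in frames:
        rebalanced.append(frame)
        if is_corrective_frame(frame["lane_center_offset"], frame["d_bearing"]):
            rebalanced.extend([frame] * (repeat_factor - 1))
    return rebalanced
```

Repeating such frames more often per training iteration is one straightforward way to expose the model to the off-center, self-correcting states it otherwise rarely sees.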
The experimentation results for the Lane Follow module 122 independently trained on the real traffic data DFS are shown in Figures 4 to 7. The respective graphs in Figures 4 and 5 show the error per frame of the Lane Follow module 122 with respect to the ground-truth dataset for the Δbearing and Δacceleration values, respectively.
The graph in Figure 6 shows the distribution of the average lane-center deviation of vehicles in the ground-truth dataset, while the graph in Figure 7 shows the same distribution for the Lane Follow model when run in the simulation environment. The distribution of lane-center deviation appears to be higher in the DFS data, potentially as a result of the positional errors in the recording of the data, which have been declared to be up to ≈0.5 m. The Lane Follow module 122, however, shows relatively less deviation owing to the corrective mechanism during training.
The Lane Changer module 123 targeted specific situations where the vehicle was expected to transition into either of the two adjacent lanes. The model was expected to learn to predict Δbearing and Δacceleration values in such a way that the higher-level decision on the direction of a lane change is implicitly taken at each frame, wrapped into the lower-level output of Δbearing and Δacceleration values. The long-term effect of this is the smooth transition to the implicitly decided lane. This process is catered for within the neural network itself, and hence the term "implicit" decision is used.
The same perception vector is used as input to the model, except that the road-geometry displacement vector is calculated with reference to the center points of the adjacent lanes, not the current lane. The network in this case is the same branched architecture described for the Lane Follow model, and was optimized using the following loss function:
L_MAE = (1/k) Σ_{i=1..k} [ |y_acc,i − ŷ_acc,i| + |y_bear,i − ŷ_bear,i| ]   (3)

where k ∈ ℤ+ is any arbitrary number of data samples, y_acc and ŷ_acc are the ground truth label and predicted values of Δacceleration, and y_bear and ŷ_bear are the ground truth label and predicted values of Δbearing. It was found through experiment that the mean squared error as in equation 2 was not able to converge well owing to the small values of the ground truth Δbearing in the data related to lane change scenarios, and therefore the mean absolute error was preferred, as shown in equation 3.
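Written out as code, the two reconstructed loss variants look as follows. The unweighted sum over the Δacceleration and Δbearing terms is an assumption; the description only states that equation 2 is a mean squared error and equation 3 a mean absolute error over the two outputs.

```python
# The two reconstructed loss variants written out as code: a mean squared error for
# the Lane Follow module (equation 2) and a mean absolute error for the Lane Change
# module (equation 3). The unweighted sum of the Δacceleration and Δbearing terms
# is an assumption.
import torch


def lane_follow_loss(y_acc, y_acc_hat, y_bear, y_bear_hat):
    """Equation 2 (as reconstructed): mean squared error over the k samples."""
    return torch.mean((y_acc - y_acc_hat) ** 2 + (y_bear - y_bear_hat) ** 2)


def lane_change_loss(y_acc, y_acc_hat, y_bear, y_bear_hat):
    """Equation 3 (as reconstructed): mean absolute error over the k samples."""
    return torch.mean(torch.abs(y_acc - y_acc_hat) + torch.abs(y_bear - y_bear_hat))
```

Switching from the squared to the absolute error keeps the gradient magnitude from shrinking with the very small Δbearing targets that dominate the lane change data, which is consistent with the convergence issue described above.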
Table 1 below shows the distribution of lane change directions in the ground-truth data and that shown by the Lane Change model 123 during a test run in the simulation. It can be seen that the percentages of the corresponding lane change directions are very similar for both sources, which can be taken as a rough approximation of the similarity in behavior between the model and the real human drivers in the data.
Table 1: Percentage of lane change directions selected by the lane change module and real traffic data.
The Function Classifier 121 is targeted to act as a moderator for the two main modules: Lane Follow 122 and Lane Changer 123. A subset of the perception is used as input to this model in the form of the (i) traffic situation and (ii) ego-state information to predict the single-value likelihood of the scenario being a lane change, P(X = LaneChange), or the scenario being a lane follow, 1 − P(X = LaneChange), enhanced by the direction of lane change. A single fully connected neural network was used to train the model with the DFS data, optimized on the following cost function known as log loss:

L_log = −(1/k) Σ_{i=1..k} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]

where k ∈ ℤ+ is any arbitrary number of data samples, and y and ŷ are the ground truth and predicted output of the model as the likelihood of the scenario being a Lane Change (and correspondingly the likelihood of a Lane Follow as 1 − ŷ).
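For clarity, the log loss can be written out as the standard binary cross-entropy below; the epsilon clipping is an implementation detail added here for numerical stability and is an assumption, not part of the description.

```python
# The "log loss" cost function written out as the standard binary cross-entropy.
# The epsilon clipping is added here only for numerical stability and is an
# implementation assumption, not part of the description.
import torch


def log_loss(y: torch.Tensor, y_hat: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """y: ground truth (1 = Lane Change, 0 = Lane Follow); y_hat: predicted P(X = LaneChange)."""
    y_hat = torch.clamp(y_hat, eps, 1.0 - eps)
    return -torch.mean(y * torch.log(y_hat) + (1.0 - y) * torch.log(1.0 - y_hat))
```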
Table 2 below shows the confusion matrix for the classification of lane follow and lane change scenarios by the Function Classifier model compared to the ground-truth real traffic data. The numbers shown in the table represent the respectively recorded instances (total instances: 148,501) in the ground truth data. For the ground truth label "Lane Follow", the inventive Function Classifier 121 model predicts in 96,578 instances correctly the situation category "Lane Follow" and only in 8,592 instances incorrectly the situation category "Lane Change". For the ground truth label "Lane Change", the Function Classifier 121 model predicts in 35,778 instances correctly the situation category "Lane Change" and only in 7,553 instances incorrectly the situation category "Lane Follow". Thus, the Function Classifier 121 model exhibits a precision value of 0.92, a recall value of 0.93 and therefore an appreciable F1-score of 0.925.
                               Predicted "Lane Follow"    Predicted "Lane Change"
Ground truth "Lane Follow"              96,578                      8,592
Ground truth "Lane Change"               7,553                     35,778
Table 2: Confusion Matrix for function classification compared to ground-truth data
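The reported metrics can be checked directly against the confusion-matrix counts; the short script below does so, treating "Lane Follow" as the positive class, which is an assumption but yields values in line with the reported precision, recall and F1-score up to rounding and class convention.

```python
# Recomputing the classification metrics from the confusion-matrix counts quoted
# above (148,501 frames in total). Treating "Lane Follow" as the positive class is
# an assumption; with it, the values come out close to the reported 0.92/0.93/0.925
# up to rounding and class convention.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


tp = 96_578   # ground truth "Lane Follow" predicted as "Lane Follow"
fn = 8_592    # ground truth "Lane Follow" predicted as "Lane Change"
fp = 7_553    # ground truth "Lane Change" predicted as "Lane Follow"
tn = 35_778   # ground truth "Lane Change" predicted as "Lane Change" (not needed here)

p, r, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")   # ≈ 0.927, 0.918, 0.923
```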
It was found that the general behavior of the E2E decision maker model, when run in the simulation environment, exhibited similarity with that of the human drivers found in the real traffic data of DFS. The graphs in Figures 8a) and 8b) show the behavioral trend in maintaining the speed and distance to the front traffic vehicle during lane following scenarios for both the model test-run in simulation and ground-truth data respectively. The behavioral trend of the data provided by the traffic agent trained according to the present invention is similar to the behavioral trend of the naturalistic DFS data.
The final E2E decision maker module, trained on the real traffic data (DFS), was also evaluated for safety compliance in terms of collisions with surrounding traffic vehicles. When tested in the simulation environment, the E2E decision maker module resulted in only 4 minor collisions (front-car collisions at low speed) during a 30-minute drive with dense surrounding traffic.

Claims
1. A computer-implemented method for training a traffic agent for navigating a road vehicle in a simulation environment, characterized in that the method comprises or consists of the following steps:
a. Providing driving data at one or more time frames ti = [t1, t2, ... tn] for one or more road vehicles as ego vehicles respectively driven by a human in a realistic situation on a road and providing map data on the respective road at the given time frames ti,
b. Processing at least part of the driving data and map data of step a) into one or more respective perception frames Pi = [p1, p2, ... pn] per given time frames ti, wherein each perception frame Pi contains corresponding perception information for (i) traffic situation, (ii) self-state information of the ego vehicle and (iii) road geometry,
c. Processing at least part of the driving data and map data of step a) into one or more respective ground truth vehicle control frames Ci = [c1, c2, ... cn] per given time frames ti, wherein each vehicle control frame Ci contains longitudinal and latitudinal positions of the respective ego vehicles,
d. Training a decision maker computer model of the traffic agent with the one or more perception frames Pi per given time frames ti of step b) as input to the model and with the one or more ground truth vehicle control frames Ci per given time frames ti of step c) as labels for the training of the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉi = [ĉ1, ĉ2, ... ĉn] containing longitudinal and latitudinal positions of the respective ego vehicle by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
2. Training method according to claim 1, further comprising processing the driving data of step a) for the respective ego vehicles per given time frames ti to binary corresponding ground truth situation categories of "Lane Follow" or "Lane Change" Si, and wherein the decision maker computer model of the traffic agent in step d) comprises i) a Lane Follow neural network, ii) a Lane Change neural network and iii) a Function Classifier neural network, wherein
- the one or more perception frames Pi are respectively used as input to the Lane Follow, the Lane Change and the Function Classifier neural networks,
- the one or more ground truth vehicle control frames Ci are respectively used as labels for independently training the Lane Follow and the Lane Change neural networks by matching the predicted vehicle control frames Ĉi with the respective ground truth vehicle control frames Ci, and
- the respectively applied ground truth situation categories Si per given time frames ti are used as labels to independently train the Function Classifier neural network to predict a corresponding situation category of "Lane Follow" or "Lane Change" Ŝi by matching the predicted situation category Ŝi with the respective ground truth situation category Si, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
3. Training method according to claim 1 or 2, wherein the driving data in step a) for each of the given road vehicles comprises or consists of one or more status features of the respective ego vehicles per given time frames ti, preferably comprising or consisting of longitudinal velocity, longitudinal acceleration, and position of the respective road vehicle in X, Y co-ordinates respectively per given time frames ti.
4. Training method according to any one of claims 1 to 3, wherein the map data of step a) contains corresponding road information comprising or consisting of i) lane counts of the respective road and ii) lane position in X, Y co-ordinates respectively per given time frames ti.
5. Training method according to any one of claims 1 to 4, wherein the traffic situation in step b) comprises or consists of six-vehicle-neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) relative distance of respective vehicle to ego vehicle and ii) relative speed of respective vehicle to speed of ego vehicle.
6. Training method according to any one of claims 1 to 5, wherein the self-state information of the respective ego vehicles in step b) comprises or consists of longitudinal velocity, longitudinal acceleration, and its bearing with respect to the road direction (angular deviation Ad).
7. Training method according to claim 6, wherein the angular deviation Ad is defined as

Ad(ti) = θ_road(ti) − θ_ego(ti)

where θ_road(ti) and θ_ego(ti) represent the bearing of the road and the ego vehicle, at any given time frame ti, respectively, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
8. Training method according to any one of claims 1 to 7, wherein the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, preferably wherein the numerical representation is selected from a circular or a semi-circular geometry.
9. Training method according to claim 8, wherein the circular or semi-circular numerical representation of the respective lane geometry having two lane boundaries is in the form of a vector of displacements Dj to each of the two lane boundaries, at any given time frame ti, with

D = [D1, D2, ... Dn]

wherein each entry Dj is part of a sequence of displacement points to the ego vehicle position divided on the basis of their relative bearing values to the ego position, with intervals of 1° or more around the circular or semi-circular region in front and/or back of the ego vehicle, and wherein the length n of the displacement vector D represents 1 to 360 for the circular geometry and 1 to 180 for the semi-circular geometry.
10. Training method according to any one of claims 1 to 9, wherein the longitudinal and latitudinal positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably comprise or consist of changes of acceleration and bearing values to be applied to the respective ego vehicles at time frame ti.
11. Computing system for training a traffic agent navigating a road vehicle in a simulation environment comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors characterized in that the traffic agent is configured to execute the computer-implemented training method according to any one of claims 1 to 10.
12. Computing system for simulating a road driving environment in driving situations for one or more vehicles comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision making in simulated driving situations using one or more neural networks with end-to-end modeling stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent was trained according to the computer-implemented training method according to any one of claims 1 to 10 to predict as an action one or more vehicle control frames Ĉi containing longitudinal and latitudinal positions to be applied to a simulated vehicle in the simulation environment, wherein i is any arbitrary number such that i ∈ [1, 2, ... n] and wherein n is the limit on driven frames.
13. Computing system for training a traffic agent according to claim 10 or 11 or for simulating a road driving environment in a driving situation according to claim 12, wherein three or more neural networks are used, preferably, wherein at least part or all of the neural networks are combined in a branched architecture.
14. Computing system according to claim 13, wherein at least part or all of the neural networks are deep neural networks, preferably, wherein at least part or all deep neural networks independently from each other comprise or consist of one, two or more layers, wherein each layer exhibits independently from each other a number of neurons in the range of 1 to 512, more preferably wherein the number of neurons differs per layer in the deep neural network.
PCT/EP2020/053817 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment WO2021160273A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE112020006532.4T DE112020006532T5 (en) 2020-02-13 2020-02-13 COMPUTER SYSTEM AND METHOD WITH END-TO-END MODELING FOR A SIMULATED TRAFFIC AGENT IN A SIMULATION ENVIRONMENT
PCT/EP2020/053817 WO2021160273A1 (en) 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/053817 WO2021160273A1 (en) 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Publications (1)

Publication Number Publication Date
WO2021160273A1 true WO2021160273A1 (en) 2021-08-19

Family

ID=69591645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/053817 WO2021160273A1 (en) 2020-02-13 2020-02-13 Computing system and method using end-to-end modeling for a simulated traffic agent in a simulation environment

Country Status (2)

Country Link
DE (1) DE112020006532T5 (en)
WO (1) WO2021160273A1 (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018084324A1 (en) * 2016-11-03 2018-05-11 Mitsubishi Electric Corporation Method and system for controlling vehicle
US20200004255A1 (en) * 2018-06-29 2020-01-02 Zenuity Ab Method and arrangement for generating control commands for an autonomous road vehicle

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
C. Chen, A. Seff, A. Kornhauser, J. Xiao: "Deepdriving: Learning affordance for direct perception in autonomous driving", Proceedings of the IEEE International Conference on Computer Vision, 2015, pages 2722-2730, XP032866617, DOI: 10.1109/ICCV.2015.312
F. Codevilla, M. Müller, A. Lopez, V. Koltun, A. Dosovitskiy: "End-to-end driving via conditional imitation learning", 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pages 1-9
H. Xu, Y. Gao, F. Yu, T. Darrell: "End-to-end learning of driving models from large-scale video datasets", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pages 2174-2182
Junqing Wei, Jarrod M. Snider, Tianyu Gu, John Dolan, Bakhtiar Litkouhi: "A behavioral planning framework for autonomous driving", June 2014, pages 458-464
Lu Chi et al.: "Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues", arXiv.org, Cornell University Library, 12 August 2017, XP080952610 *
M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang: "End to end learning for self-driving cars", arXiv preprint arXiv:1604.07316, 2016
U. Muller, J. Ben, E. Cosatto, B. Flepp, Y. L. Cun: "Off-road obstacle avoidance through end-to-end learning", Advances in Neural Information Processing Systems, 2006, pages 739-746

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049167A (en) * 2022-08-16 2022-09-13 北京市城市规划设计研究院 Traffic situation prediction method, device, equipment and storage medium
CN115049167B (en) * 2022-08-16 2022-11-08 北京市城市规划设计研究院 Traffic situation prediction method, device, equipment and storage medium

Also Published As

Publication number Publication date
DE112020006532T5 (en) 2022-11-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20705351

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20705351

Country of ref document: EP

Kind code of ref document: A1