WO2021083785A1

WO2021083785A1 - Method for training at least one algorithm for a control device of a motor vehicle, computer program product, and motor vehicle

Info

Publication number: WO2021083785A1
Application number: PCT/EP2020/079764
Authority: WO
Inventors: Ulrich Eberle; Christoph THIEM
Original assignee: Psa Automobiles Sa
Priority date: 2019-10-31
Filing date: 2020-10-22
Publication date: 2021-05-06
Also published as: EP4052178A1; CN114667545A; DE102019216836A1

Abstract

The invention relates to a method for training at least one algorithm for a control device of a motor vehicle, said algorithm being trained by means of a self-learning neural network. The method has the following steps: a) providing a computer program product module for an automated or autonomous driving function, b) providing a simulation environment with simulation parameters, wherein the simulation environment contains map data of an actual existing area of operation, the motor vehicle, and at least one additional simulated traffic participant, and the behavior of the motor vehicle and the at least one additional traffic participant is determined by a rule set with behavior parameters that define permissible limits, c) providing a mission for the motor vehicle, d) modifying at least one behavior parameter of the motor vehicle so that the at least one behavior parameter lies beyond the permissible limits, and e) carrying out a simulation of the mission.

Description

METHOD OF TRAINING AT LEAST ONE ALGORITHM FOR A CONTROL UNIT OF A MOTOR VEHICLE, COMPUTER PROGRAM PRODUCT, AND MOTOR VEHICLE

A method for training at least one algorithm for a control unit of a motor vehicle, a computer program product and a motor vehicle are described here.

Methods, computer program products and motor vehicles of the type mentioned at the outset are known in the prior art. The first partially automated vehicles (corresponding to SAE Level 2 in accordance with SAE J3016) have reached series production readiness in recent years. Automated (corresponding to SAE Level >= 3 in accordance with SAE J3016) or autonomous (corresponding to SAE Level 4/5 in accordance with SAE J3016) motor vehicles must be able to react independently and with maximum safety to unfamiliar traffic situations on the basis of a variety of specifications, for example the destination and compliance with current traffic rules. Since the reality of traffic is highly complex due to the unpredictability of the behavior of other road users, especially human road users, it is almost impossible to program corresponding control units of motor vehicles with conventional methods and on the basis of man-made rules.

In order to cope with complex problems by means of computers, it is also known to develop algorithms with methods of machine learning or artificial intelligence, or to have them developed by self-learning neural networks. On the one hand, such algorithms can react more appropriately to complex traffic situations than traditional algorithms. On the other hand, with the help of artificial intelligence it is in principle possible to further develop and continuously improve the algorithms during the development process and in everyday use through constant learning. Alternatively, the state of the algorithm can be frozen after the end of a training phase in the development process and a validation by the manufacturer.

DE 10 2017 007 136 A1 discloses a method for training self-learning algorithms for a vehicle that can be driven in an automated manner with a predetermined automation module by generating learning situations, the learning situations being generated as follows: - carrying out a traffic simulation in which a virtual ego vehicle with the automation module of the real vehicle is placed in a virtual scenario, the scenario comprising a route structure with a specified route and further virtual moving objects, generated in an automated manner, with individually specifiable object properties and behavior models, the objects interacting adaptively, both independently and with each other, in the course of the ongoing simulation on the basis of the respective object properties and behavior models, - carrying out a driving dynamics simulation, in which reactions of the ego vehicle are generated, on the basis of the automation module and of virtual sensor signals relating to the moving objects from a virtual sensor system assigned to the ego vehicle, which corresponds to a sensor system of the real vehicle, - identifying a relevant learning situation using selection criteria that are determined on the basis of predeterminable metrics.

The disadvantage here is that the simulation is carried out with road users who conform to the rules. In reality, however, road users often do not comply with the rules, e.g. they drive too fast, cross lane markings for no apparent reason, are inattentive, overtake on the right, etc. An algorithm that is only trained with other road users who comply with the rules is therefore less well prepared for human driving behavior. This leads to unnatural driving behavior of a motor vehicle equipped with an algorithm trained in this way, since the motor vehicle reacts less flexibly.

Furthermore, there are situations in which the flow of traffic can be improved and possibly even the risk of an accident can be reduced if the ego vehicle does not behave one hundred percent according to the rules, e.g. if it drives over a solid line to avoid an obstacle, as long as this can be done safely, e.g. if there is no oncoming traffic. Braking instead could result in unprepared human drivers following behind having a rear-end collision due to the sudden interruption of the flow of traffic.

Thus, the object arises of developing methods, computer program products and motor vehicles of the type mentioned at the outset such that a trained algorithm can be better adapted to real traffic situations. The object is achieved by a method for training at least one algorithm for a control unit of a motor vehicle according to claim 1, a computer program product according to independent claim 12 and a motor vehicle according to independent claim 13. Further refinements and developments are the subject of the dependent claims.

A method for training at least one algorithm for a control unit of a motor vehicle is described below, the control unit being provided for implementing an automated or autonomous driving function with intervention in units of the motor vehicle on the basis of input data using the at least one algorithm, the algorithm being trained by a self-learning neural network, comprising the following steps: a) providing a computer program product module for the automated or autonomous driving function, the computer program product module containing the algorithm to be trained and the self-learning neural network, b) providing a simulation environment with simulation parameters, wherein the simulation environment contains map data of a real existing area of use, the motor vehicle and at least one other simulated road user, wherein a behavior of the motor vehicle and of the at least one other road user is determined by a rule set, the rule set containing behavior parameters that determine permissible limits, c) providing a mission for the motor vehicle, d) modifying at least one behavior parameter of the motor vehicle so that the at least one behavior parameter lies beyond the permissible limits, e) performing a simulation of the mission.

Corresponding behavior parameters can be, for example, the permissible speed, a distance to be observed, thresholds for violating a prohibition (e.g. a period within which a traffic light that has changed to red may still be passed, risk parameters at which a solid line may be crossed, and/or conditions under which driving may continue despite not having the right of way), a permitted variance of the position of the motor vehicle in the lane, a permitted overtaking side (in right-hand traffic only on the left, or on both sides) and the like.
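Purely as an illustration, and not as part of the disclosed method, a rule set of this kind could be represented as a collection of named behavior parameters, each with a permissible range. The class name, field names and numeric values in the following Python sketch are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorParameter:
    """One behavior parameter with its permissible range (hypothetical structure)."""
    name: str
    value: float
    lower_limit: float
    upper_limit: float

    def is_permissible(self) -> bool:
        # A parameter conforms to the rules if it lies within its limits.
        return self.lower_limit <= self.value <= self.upper_limit


# Illustrative rule set; all names and numbers are assumed, not taken from the source.
rule_set = {
    "max_speed_kmh": BehaviorParameter("max_speed_kmh", 50.0, 0.0, 50.0),
    "min_gap_m": BehaviorParameter("min_gap_m", 25.0, 25.0, float("inf")),
    "lane_offset_m": BehaviorParameter("lane_offset_m", 0.0, -0.3, 0.3),
}

# True as long as every parameter stays within its permissible limits.
all_permissible = all(p.is_permissible() for p in rule_set.values())
```

In such a representation, step d) of the method would correspond to replacing a parameter with a value for which `is_permissible()` returns `False`.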

A corresponding mission can be, for example, to get from a starting point in the shortest possible time or in an energy-efficient manner to a specified destination.

It was found that an algorithm trained in this way exhibits a different driving behavior than a conventionally trained algorithm, even within narrower parameter limits such as would then be applied for use in a real motor vehicle. The driving behavior of such an algorithm is more natural, i.e. it corresponds more closely to the driving behavior of a person, which on the one hand benefits the occupants and on the other hand appears more natural to other road users. An example of this is passing a double-parked delivery vehicle, which requires crossing a solid line. An algorithm that was absolutely compliant with the rules would stop the motor vehicle and wait for the delivery vehicle to drive on. An algorithm trained in accordance with the method described here, which may exceed applicable rules within narrow limits, e.g. if this is possible without risk due to the lack of oncoming traffic, continues driving at such a point.

It can thus be achieved that a motor vehicle equipped with a corresponding algorithm can react more flexibly to traffic situations than a motor vehicle equipped with an algorithm trained with conventional methods.

In a first further refinement, it can be provided that the neural network learns through reinforcement learning methods (also known as RFL algorithms; RFL stands for "reinforcement learning"), with at least one of the time to fulfill the mission and/or the number of accidents in which the motor vehicle is involved during the mission serving as the reward metric, the simulation being repeated until a minimum metric is reached.

In particular, it can be provided that in order to successfully complete a mission it is necessary not to cause an accident. A further metric can be that the motor vehicle is not an indirect trigger of accidents involving other road users, e.g. through sudden, unexpected hard braking. By using reinforcement learning methods, the neural network learns better and better strategies for completing a given mission in successive simulations.
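The repetition of the simulation until a minimum metric is reached can be sketched as follows. This is a minimal sketch under stated assumptions: the reward function, the toy simulator `run_simulation` and the single scalar strategy `aggressiveness` are hypothetical, and a simple random hill-climbing search stands in for an actual reinforcement learning method.

```python
import random


def reward(mission_time_s: float, accidents: int) -> float:
    """Hypothetical reward: shorter missions are better, accidents are heavily penalized."""
    return -mission_time_s - 1000.0 * accidents


def run_simulation(aggressiveness: float, rng: random.Random) -> tuple[float, int]:
    """Toy stand-in for the simulator (assumed, not the real simulation environment)."""
    mission_time_s = 120.0 - 60.0 * aggressiveness + rng.uniform(0.0, 5.0)
    accidents = 1 if aggressiveness > 0.8 else 0
    return mission_time_s, accidents


def train(min_reward: float, max_iterations: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Repeat the simulation until the minimum reward is reached or the budget is spent."""
    rng = random.Random(seed)
    best_policy, best_reward = 0.0, float("-inf")
    for _ in range(max_iterations):
        # Vary the strategy slightly around the best one found so far.
        candidate = min(1.0, max(0.0, best_policy + rng.uniform(-0.1, 0.1)))
        r = reward(*run_simulation(candidate, rng))
        if r > best_reward:
            best_policy, best_reward = candidate, r
        if best_reward >= min_reward:
            break  # minimum metric reached
    return best_policy, best_reward
```

Calling `train(min_reward=-85.0)` returns the best strategy found and its reward; because accidents dominate the reward, strategies that cause an accident are never retained.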

In a further refinement, it can be provided that the at least one abnormal road user is a motor vehicle, motorcyclist or pedestrian.

The vehicles are driven by people, some of whom do not comply with the rules. A simulation with such abnormally behaving road users is therefore particularly realistic.

In a further refinement, it can be provided that the computer program product module has an algorithm that has already been trained with road users that conform to the rules.

In this way, the already learned behaviors are refined and the training becomes more efficient and faster.

In a further refinement, it can be provided that the at least one behavior parameter is exceeded or fallen short of by a predetermined percentage.

This can be particularly useful for behavior parameters that can be expressed in numbers, for example a speed, a distance, a deviation from a specified driving line, etc.
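As a small worked example, exceeding or falling short of a numeric limit by a predetermined percentage can be written as a single scaling step. The function name and the numeric values are assumptions for illustration only.

```python
def exceed_limit(limit: float, percent: float) -> float:
    """Return a value that lies beyond `limit` by `percent` percent.

    A positive percentage exceeds the limit, a negative one falls short of it.
    """
    return limit * (1.0 + percent / 100.0)


speed_limit_kmh = 50.0  # assumed permissible speed
min_gap_m = 25.0        # assumed minimum distance to be observed

training_speed_kmh = exceed_limit(speed_limit_kmh, 10.0)  # 10 % above the speed limit
training_gap_m = exceed_limit(min_gap_m, -20.0)           # 20 % below the minimum distance
```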

In a further refinement, it can be provided that the simulation is repeated several times, with at least one simulation parameter being changed in each case.

Such a simulation parameter can be a behavior parameter, for example. By varying the corresponding parameters, an over-specialization of the algorithm for a certain situation can be avoided.

In a further refinement, it can be provided that the simulation environment is varied.

This can also prevent the algorithm from being trained too much on the existing simulation environment. The variation can take place, for example, through modifications within the same traffic area (e.g. by changing road widths, right of way, traffic lights, road blocks, etc.) or by changing the traffic area as a whole.

In a further refinement, it can be provided that at least one behavior parameter is varied.

Such behavior parameters cover the usual driving behavior of different types of drivers, for example drivers who tend to drive too fast, drivers whose driving precision is less, etc.

In a further refinement, provision can be made for the number, positioning and/or missions of the other road users to be varied.

This creates new situations that can be used to further train the algorithm.
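The variations described in the preceding refinements (simulation environment, behavior parameters, number of other road users) can be sketched as random sampling of scenario parameters before each simulation run. The `Scenario` fields and the value ranges below are hypothetical and serve only to illustrate the idea.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    road_width_m: float        # variation of the simulation environment
    num_other_road_users: int  # variation of the number of other road users
    speeding_factor: float     # behavior parameter of the other road users


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario; all ranges are assumed for illustration."""
    return Scenario(
        road_width_m=rng.uniform(5.5, 7.5),
        num_other_road_users=rng.randint(1, 6),
        speeding_factor=rng.uniform(0.9, 1.3),
    )


rng = random.Random(42)  # fixed seed so that the runs are reproducible
scenarios = [sample_scenario(rng) for _ in range(5)]
```

Training on a fresh scenario in each repetition is one way to avoid the over-specialization mentioned above.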

In a further refinement, it can be provided that the algorithm is further trained by a self-learning neural network, comprising the following steps: a) providing the computer program product module for the automated or autonomous driving function, the computer program product module containing the algorithm to be trained and the self-learning neural network, b) providing a simulation environment with simulation parameters, wherein the simulation environment contains map data of a real existing area of use, the motor vehicle and at least one other simulated road user, the other simulated road user being simulated by an algorithm that was trained according to the method described above, c) providing a mission for the motor vehicle, d) performing a simulation of the mission.

In this way, not only the ego motor vehicle but also other motor vehicles can be made to deviate from rule-conforming behavior, which increases the robustness of the algorithm.

The algorithm can also be applied to other road users, for example pedestrians or cyclists, in which case no computer program product module for an automated or autonomous driving function, but a computer program product module for a movement behavior simulation is used. As a result, these agents can be designed more realistically and then used in future training missions of the type described above, which increases the quality of the simulation.

In a further refinement, provision can be made for the computer program product module to be integrated in a control unit of a motor vehicle and for the algorithm to be tested and/or trained in the real-world area of use.

In this way, influences of the real motor vehicle, which possibly cannot be fully simulated, can be taken into account. A motor vehicle that is actually moving can react differently than in the simulation.

A first independent subject matter relates to a device for training at least one algorithm for a control unit of a motor vehicle, the control unit being provided for implementing an automated or autonomous driving function by intervening in units of the motor vehicle on the basis of input data using the at least one algorithm, the algorithm being trained by a self-learning neural network, the following being provided: a) means for providing a computer program product module for the automated or autonomous driving function, the computer program product module containing the algorithm to be trained and the self-learning neural network, b) means for providing a simulation environment with simulation parameters, the simulation environment containing map data of a real existing area of use, the motor vehicle and at least one other simulated road user, wherein a behavior of the motor vehicle and of the at least one other road user is determined by a rule set, the rule set containing behavior parameters that define permissible limits, c) means for providing a mission for the motor vehicle, d) means for modifying at least one behavior parameter of the motor vehicle so that the at least one behavior parameter lies beyond the permissible limits, e) means for performing a simulation of the mission.

In a first further refinement, it can be provided that the neural network has means for learning through reinforcement learning methods, with at least one of the time to complete the mission and/or the number of accidents in which the motor vehicle is involved during the mission serving as the reward metric, and that means are provided for repeating the simulation until a minimum metric is reached.

In a further refinement, it can be provided that the at least one abnormal road user is a motor vehicle, motorcycle or pedestrian.

In a further refinement, provision can be made for means to be provided for exceeding or falling below the at least one behavior parameter by a predetermined percentage.

In a further refinement, provision can be made for means to be provided for repeating the simulation several times, in which at least one simulation parameter is changed in each case.

In a further refinement, provision can be made for means for varying the simulation environment to be provided.

In a further refinement, it can be provided that means for varying at least one behavior parameter are provided.

In a further refinement, provision can be made for means for varying the number, positioning and/or missions of the other road users to be provided.

In a further refinement, it can be provided that means are provided for training the algorithm by a self-learning neural network, the following being provided: a) means for providing the computer program product module for the automated or autonomous driving function, the computer program product module containing the algorithm to be trained and the self-learning neural network, b) means for providing a simulation environment with simulation parameters, wherein the simulation environment contains map data of a real existing area of use, the motor vehicle and at least one other simulated road user, the other simulated road user being simulated by an algorithm which has been trained according to the method described above, c) means for providing a mission for the motor vehicle, d) means for performing a simulation of the mission.

In a further refinement, provision can be made for the computer program product module to be integrated in a control unit of a motor vehicle and for means for testing and/or training the algorithm in a real area of use to be provided.

Another independent subject matter relates to a computer program product with a computer-readable storage medium on which instructions are embedded which, when executed by at least one processing unit, have the effect that the at least one processing unit is set up to execute the method of the type described above.

The method can be carried out in a distributed manner on one or more processing units, so that certain method steps are carried out on one processing unit and other method steps are carried out on at least one further processing unit, wherein calculated data can be transmitted between the processing units if necessary.

Another independent subject matter relates to a motor vehicle with a computer program product of the type described above.

Further features and details emerge from the following description in which, where applicable with reference to the drawing, at least one exemplary embodiment is described in detail. Described and/or graphically represented features form the subject matter individually or in any meaningful combination, possibly also independently of the claims, and in particular can also be the subject matter of one or more separate applications. Identical, similar and/or functionally identical parts are provided with the same reference numerals. They show schematically:

FIG. 1 shows a motor vehicle which is set up for automated or autonomous driving;

FIG. 2 shows a computer program product for the motor vehicle from FIG. 1;

FIG. 3 shows a simulation environment for the motor vehicle from FIG. 1, and

FIG. 4 shows a flow chart of the method.

Fig. 1 shows a motor vehicle 2, which is set up for automated or autonomous driving.

The motor vehicle 2 has a control unit 4 with a computing unit 6 and a memory 8. A computer program product is stored in the memory 8 and is described in more detail below in connection with FIGS.

The control unit 4 is connected, on the one hand, to a number of environmental sensors which allow the current position of the motor vehicle 2 and the respective traffic situation to be recorded. These include environmental sensors 10, 11 at the front of the motor vehicle 2, environmental sensors 12, 13 at the rear of the motor vehicle 2, a camera 14 and a GPS module 15. The environmental sensors 10 to 13 can be, for example, radar, lidar and/or ultrasonic sensors.

Furthermore, sensors for detecting the state of the motor vehicle 2 are provided, including wheel speed sensors 16, acceleration sensors 18 and pedal sensors 20, which are connected to the control unit 4. With the aid of this motor vehicle sensor system, the current state of motor vehicle 2 can be reliably detected.

During the operation of the motor vehicle 2, the computing unit 6 has loaded the computer program product stored in the memory 8 and executes it. On the basis of an algorithm and the input signals, the computing unit 6 decides on the control of the motor vehicle 2, which it can implement by intervening in the steering 22, engine control 24 and brakes 26, which are each connected to the control unit 4.

Data from sensors 10 to 20 are continuously buffered in the memory 8 and discarded after a predetermined period of time, so that these data are available for further evaluation in the meantime.
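The temporary storage with discarding after a predetermined period can be illustrated with a simple time-based buffer. This is only a sketch; the class name `SensorBuffer`, the retention period and the sample format are assumptions and are not specified in the description.

```python
from collections import deque


class SensorBuffer:
    """Keeps sensor samples only for a fixed retention period (illustrative sketch)."""

    def __init__(self, retention_s: float) -> None:
        self.retention_s = retention_s
        self._samples = deque()  # holds (timestamp_s, sample) pairs, oldest first

    def add(self, timestamp_s: float, sample: dict) -> None:
        self._samples.append((timestamp_s, sample))
        # Discard everything that is older than the retention period.
        while self._samples and timestamp_s - self._samples[0][0] > self.retention_s:
            self._samples.popleft()

    def snapshot(self) -> list:
        """Return the currently buffered samples for further evaluation."""
        return list(self._samples)


buffer = SensorBuffer(retention_s=2.0)
for t in (0.0, 1.0, 2.0, 3.5):
    buffer.add(t, {"wheel_speed": 10.0 + t})
timestamps = [t for t, _ in buffer.snapshot()]
```

After the last call, only the samples recorded within the preceding two seconds remain in the buffer.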

The algorithm was trained according to the method described below.

FIG. 2 shows a computer program product 28 with a computer program product module 30.

The computer program product module 30 has a self-learning neural network 32 that trains an algorithm 34. The self-learning neural network 32 learns according to methods of reinforcement learning, i.e. by varying the algorithm 34, the neural network 32 tries to obtain rewards for improved behavior in accordance with one or more metrics or standards, that is to say for improvements to the algorithm 34. Alternatively, known learning methods of supervised and unsupervised learning as well as combinations of these learning methods can be used.

The algorithm 34 can essentially consist of a complex filter with a matrix of values, usually called weights by those skilled in the art, that define a filter function. This filter function determines the behavior of the algorithm 34 as a function of input variables, which in the present case are recorded by the sensors 10 to 20, and generates control signals for controlling the motor vehicle 2. The computer program product module 30 can be used both in the motor vehicle 2 and outside the motor vehicle 2. It is thus possible to train the computer program product module 30 both in a real environment and in a simulation environment. In particular, according to the teaching described here, the training begins in a simulation environment, since this is safer than training in a real environment.
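A filter function defined by a matrix of weights can be illustrated with a single weighted layer that maps input variables to control signals. The input variables, the weight values and the choice of `tanh` as squashing function are assumptions made for this sketch; the description does not specify the structure of the algorithm 34 in this detail.

```python
import math


def control_signals(weights, inputs):
    """Map input variables to control signals with one weighted layer (illustrative)."""
    # Each row of the weight matrix produces one control signal in the range (-1, 1).
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


# Assumed inputs: normalized distance to an obstacle, normalized own speed, lane offset.
inputs = [0.2, 0.8, -0.1]

# Assumed weights; the rows correspond to steering, engine control and brake commands.
weights = [
    [0.0, 0.0, -1.0],
    [1.0, -0.5, 0.0],
    [-1.0, 0.5, 0.0],
]

steering, throttle, brake = control_signals(weights, inputs)
```

Training then amounts to adjusting the entries of `weights` so that the resulting control signals improve the chosen metrics.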

The computer program product module 30 is set up to define a metric that is to be improved. Such a metric can be, for example, the time taken to complete a predetermined mission, for example reaching a destination. If the metric has passed a certain threshold, e.g. a time shorter than a limit time, the metric can be considered fulfilled and the algorithm can be frozen in this regard. It can then either be optimized with regard to another metric and trained further, or the algorithm can be tested in a real environment.

FIG. 3 shows a simulation environment 36 for motor vehicle 2 from FIG. 1.

A road intersection 38 is provided in the simulation environment 36, at which a road 40 and a road 42 intersect. The intersection 38 is based on real existing map data, so that the behavior of the algorithm 34 at this intersection 38 is simulated specifically.

A motor vehicle 44 is parked at the edge of the road 40 in such a way that it is not possible to drive past without crossing a solid line 46. At the same time, according to the simulation, a motorcyclist 48 would like to turn from road 42 into road 40. In addition, a pedestrian 50 moves without paying attention to the traffic at high speed in the direction of movement 52 towards the street 40 in order to apparently cross it.

In the situation in question, the algorithm 34 has to make a large number of complex decisions. The first decision that has to be made is whether the solid line 46 may be crossed at all. Since it is not possible to pass the parked motor vehicle 44 without crossing the solid line 46, the algorithm 34 will have to answer this question with yes, but the question then arises as to the driving parameters with which this is done. For this purpose, the algorithm 34 must make a prediction of how the motorcyclist 48 will behave, who on his normal trajectory might come relatively close to the motor vehicle 2. In everyday life, however, it is often the case that motorcyclists in such a situation can easily evade or drive further to the right in their lane due to the small width of the motorcycle and the low speed in the intersection area.

Furthermore, the speed of the motor vehicle 2 must be taken into account. If the motor vehicle 2 accelerates slightly in order to overtake the parked motor vehicle 44, the probability that the motor vehicle 2 will impair the planned trajectory of the motorcyclist 48 is reduced. However, this could lead to the motor vehicle 2 crossing the trajectory of the possibly inattentive pedestrian 50 who is just about to cross the street 40, which could result in an accident.

In successive iterations, the algorithm 34 could therefore first try to pass the motor vehicle 44 without stopping. For this purpose, the motor vehicle 2 could first of all increase its speed above the permitted maximum speed in order to pass the motor vehicle 44. However, this could lead to a minimum distance between the pedestrian 50 and the motor vehicle 2 being undershot.

In a subsequent iteration, the algorithm 34 could move the motor vehicle 2 more slowly, which, however, could pose a danger to the motorcyclist 48.

The algorithm 34 could then initially accelerate the motor vehicle 2 to pass the parked motor vehicle 44 and then brake it again. This solution is to be preferred because on the one hand it enables the parked motor vehicle 44 to be passed and the present mission to be completed quickly and, on the other hand, it optimizes the metrics of the endangerment of other road users 48, 50.

Optimization with regard to other criteria and metrics can then take place in order to further increase the degree of maturity of the algorithm 34.

FIG. 4 shows a flow chart of the method.

First, after the start, the computer program product module is made available. The computer program product module contains the algorithm to be trained and a self-learning neural network. A simulation environment is then made available on the basis of real map data. In addition to roads and certain rules, the simulation environment can also contain other road users and their missions.

On the basis of a basic algorithm, a set of rules for the ego vehicle can be varied, which contains rules of behavior, for example maintaining speeds, driving over solid lines, position in the lane, etc.

The simulation can then be carried out, with reinforcement learning methods being used in an attempt to reach individual metrics. As long as a metric has not been reached, the strategy or the algorithm is varied and the simulation is carried out again until that metric is reached. This process is repeated for all metrics.

As soon as all metrics have been reached, the rule set of the ego vehicle is varied and the process is repeated until the algorithm has matured sufficiently. The algorithm can then be frozen.
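The sequence of the flow chart can be summarized as a nested loop over rule-set variants, metrics and repeated simulations. The following sketch uses a toy simulator and a trivial strategy update; the functions `simulate` and `vary_strategy`, the metric names, the thresholds and the convention that lower metric values are better are all assumptions made for illustration.

```python
def train_until_mature(rule_set_variants, metrics, simulate, vary_strategy, strategy, max_runs=100):
    """Nested loop of the flow chart: rule-set variants, then metrics, then repeated runs."""
    for rule_set in rule_set_variants:
        for name, threshold in metrics.items():
            for _ in range(max_runs):
                result = simulate(strategy, rule_set)
                if result[name] <= threshold:
                    break  # this metric has been reached
                strategy = vary_strategy(strategy)
            else:
                raise RuntimeError(f"metric {name!r} not reached within {max_runs} runs")
    return strategy  # the algorithm can now be frozen


def simulate(strategy, rule_set):
    """Toy simulator (assumed): the strategy is a speed factor capped by the rule set."""
    speed = min(strategy, rule_set["max_speed_factor"])
    return {"mission_time_s": 100.0 / speed, "accidents": 0 if speed <= 1.2 else 1}


def vary_strategy(strategy):
    """Toy strategy update (assumed): drive slightly faster in the next run."""
    return strategy + 0.05


rule_set_variants = [{"max_speed_factor": 1.1}, {"max_speed_factor": 1.2}]
metrics = {"mission_time_s": 95.0, "accidents": 0}
frozen_strategy = train_until_mature(rule_set_variants, metrics, simulate, vary_strategy, 1.0)
```

In this toy setting the strategy is varied twice before the mission-time metric is reached; the remaining metrics and the second rule-set variant are then already satisfied.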

The algorithm can be used, for example, in traffic simulations for simulated vehicles other than the motor vehicle to be trained. The method can also be applied to other road users.

The training can be continued in a fully real or mixed-reality environment.

Although the subject matter was illustrated and explained in more detail by means of exemplary embodiments, the invention is not restricted by the examples disclosed, and other variations can be derived therefrom by the person skilled in the art. It is therefore clear that there is a multitude of possible variations. It is also clear that the exemplary embodiments mentioned are only examples that are not to be understood in any way as a limitation, for example, of the scope of protection, the possible applications or the configuration of the invention. Rather, the preceding description and the description of the figures enable the person skilled in the art to implement the exemplary embodiments in concrete terms, wherein the person skilled in the art, with knowledge of the disclosed inventive concept, can make various changes, for example with regard to the function or the arrangement of individual elements mentioned in an exemplary embodiment, without departing from the scope of protection which is defined by the claims and their legal equivalents, such as further explanations in the description.

List of reference symbols

2 motor vehicle

4 control unit

6 computing unit

8 memory

10 environmental sensor

11 environmental sensor

12 environmental sensor

13 environmental sensor

14 camera

15 GPS module

16 wheel speed sensor

18 acceleration sensor

20 pedal sensor

22 steering

24 engine control

26 brakes

28 computer program product

30 computer program product module

32 neural network

34 algorithm

36 simulation environment

38 intersection

40, 42 road

44 parked motor vehicle

46 solid line

48 motorcyclist

50 pedestrian

52 direction of movement of the pedestrian 50

54 planned trajectory

Claims

1. A method for training at least one algorithm (34) for a control unit (4) of a motor vehicle (2), the control unit (4) being provided for implementing an automated or autonomous driving function with intervention in units (22, 24, 26) of the motor vehicle (2) on the basis of input data using the at least one algorithm (34), the algorithm (34) being trained by a self-learning neural network (32), comprising the following steps: a) providing a computer program product module (30) for the automated or autonomous driving function, the computer program product module (30) containing the algorithm (34) to be trained and the self-learning neural network (32), b) providing a simulation environment (36) with simulation parameters, the simulation environment (36) containing map data (38) of a real existing operational area, the motor vehicle (2) and at least one other simulated road user (48, 50), wherein a behavior of the motor vehicle (2) and of the at least one other road user (48, 50) is determined by a rule set, the rule set containing behavior parameters that determine permissible limits, c) providing a mission for the motor vehicle (2), d) modifying at least one behavior parameter of the motor vehicle (2) so that the at least one behavior parameter lies beyond the permissible limits, e) performing a simulation of the mission.

2. The method according to claim 1, wherein the neural network (32) learns by reinforcement learning methods, with at least one of the time to fulfill the mission and/or the number of accidents in which the motor vehicle (2) is involved during the mission serving as a reward metric, the simulation being repeated until a minimum metric is reached.

3. The method of claim 1 or 2, wherein the at least one abnormal traffic participant is a motor vehicle (2), motorcyclist (48) or pedestrian (50).

4. The method according to any one of the preceding claims, wherein the computer program product module (30) has an algorithm (34) which has already been pre-trained with rule-compliant road users (48, 50).

5. The method according to any one of the preceding claims, wherein the at least one behavior parameter is exceeded or undercut by a predetermined percentage.

6. The method according to any one of the preceding claims, wherein the simulation is repeated several times, with at least one simulation parameter being changed in each case.

7. The method of claim 6, wherein the simulation environment (36) is varied.

8. The method according to claim 6 or 7, wherein at least one behavior parameter is varied.

9. The method according to any one of claims 6 to 8, wherein the number, positioning and/or missions of the other road users (48, 50) are varied.

10. The method according to any one of the preceding claims, wherein the algorithm (34) is trained by a self-learning neural network (32), comprising the following steps: a) providing the computer program product module (30) for the automated or autonomous driving function, the computer program product module (30) containing the algorithm (34) to be trained and the self-learning neural network (32), b) providing a simulation environment (36) with simulation parameters, the simulation environment (36) containing map data of a real existing application area (38), the motor vehicle (2) and at least one further simulated road user (48, 50), the further simulated road user (48, 50) being simulated by an algorithm which was trained according to one of claims 1 to 9, c) providing a mission for the motor vehicle (2), d) performing a simulation of the mission.

11. The method according to any one of the preceding claims, wherein the computer program product module (30) is integrated in a control unit (4) of a motor vehicle (2) and wherein the algorithm (34) is tested and/or trained in the real existing application area (38).

12. Computer program product with a computer-readable storage medium (8) on which instructions are embedded which, when executed by at least one processing unit (6), cause the at least one processing unit (6) to be set up to carry out the method according to one of the preceding claims.

13. Motor vehicle with a computer program product according to claim 12.