WO2022263079A1

WO2022263079A1 - Method for predicting a behaviour of road users

Info

Publication number: WO2022263079A1
Application number: PCT/EP2022/063249
Authority: WO
Inventors: Steven Peters; Christian Drescher
Original assignee: Mercedes-Benz Group AG
Priority date: 2021-06-18
Filing date: 2022-05-17
Publication date: 2022-12-22
Also published as: DE102021003159A1

Abstract

The invention relates to a method for predicting a behaviour of road users in a vehicle environment of a vehicle (2), wherein: video sequences (VS) of the vehicle environment are captured by means of a sensor system; the behaviour of road users that are identified in the video sequences (VS) is predicted into the future by means of a neural network (NN2) on the basis of the video sequences; the neural network (NN2) is updated at each time step with data from a further neural network (NN1); the further neural network (NN1) is trained at each time step using video sequences (VS_t-x-y...t-x) of the vehicle environment that have been captured in previous time steps and have been temporarily stored in a circular buffer (5).

Description

Method for predicting the behavior of road users

The invention relates to a method for predicting the behavior of road users in a vehicle environment.

It is described, among others in "S. Oprea, et al.: A Review on Deep Learning Techniques for Video Prediction; In IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. no. 01, pp. 1-1, 5555; doi: 10.1109/TPAMI.2020.3045007", that video frames can be predicted using machine learning.

The invention is based on the object of specifying a novel method for predicting the behavior of road users in a vehicle environment.

The object is achieved according to the invention by a method which has the features specified in claim 1.

Advantageous configurations of the invention are the subject matter of the dependent claims.

In a method according to the invention for predicting the behavior of road users in the vehicle environment of a vehicle, video sequences of the vehicle environment are recorded using a sensor system, the behavior of road users identified in the video sequences being predicted using a neural network based on the video sequences. According to the invention, the neural network is updated in each time step with the data of a further neural network, the further neural network being trained in each time step with video sequences of the vehicle surroundings that were recorded in previous time steps and temporarily stored in a ring memory. According to the invention, the behavior of the road users is predicted by means of a neural network based on environmental data recorded by sensors (for example video frames).

If a dangerous situation is identified on the basis of the predicted behavior, a driver warning is preferably issued and/or a driving maneuver is carried out to minimize the dangerous situation.

Exemplary embodiments of the invention are explained in more detail below with reference to drawings.

In the drawings:

FIG. 1 shows a schematic block diagram of a device for predicting the behavior of road users in a vehicle environment,

FIG. 2 shows a schematic detail of a driver's field of vision with an augmented display of information,

FIG. 3 shows a schematic of the section of the driver's field of vision according to FIG. 2 with a further augmented representation of information, and

FIG. 4 shows a schematic of the section of the driver's field of vision according to FIG. 2 with a further augmented representation of information.

Corresponding parts are provided with the same reference symbols in all figures.

In a method according to the invention for predicting the behavior of road users in a vehicle environment, video sequences of the vehicle environment are recorded in a vehicle using at least one camera and/or at least one lidar sensor and/or at least one radar sensor. A neural network that has already been pre-trained is trained further at each point in time with video sequences from previous time segments or time units. The trained network can then be used to make predictions about the near future. These predictions take the form of frames, i.e. images, and are evaluated. If a risk of collision is recognized in one of the frames, a warning is issued to the driver of the vehicle.

For example, the neural network can be trained with a video sequence that shows the swaying of a trailer of a vehicle to be overtaken, for example on a freeway. For the next few frames, the neural network then predicts that the trailer will swing into the vehicle's own lane, and the driver is warned accordingly.

Other typical examples follow, which are particularly suitable for pre-training:

A vehicle pulls from an acceleration lane to the far left across two lanes.

A child follows a rolling ball.

A cyclist stretches out an arm and turns.

The driver is supported and relieved by the described method and thus an increase in safety is achieved.

FIG. 1 shows a schematic block diagram of a device 1 for predicting the behavior of road users in a vehicle environment, comprising a vehicle 2 and an OEM backend 3. The vehicle 2 comprises at least one camera 4, a ring memory 5 for the video sequences VS of the vehicle environment captured by the camera 4, a first control unit SG1, a second control unit SG2, a third control unit SG3 and a fourth control unit SG4.

The first control unit SG1 receives a time-delayed video sequence VS_{t-x-y .. t-x} from the ring memory 5, trains a first neural network NN1 with it at each point in time t, and overwrites a second neural network NN2 in the second control unit SG2 with the result after each time step. The following applies to the training: the input vector is the video sequence VS_{t-x-y .. t-x} of images from the camera 4 from the ring memory 5 over a period of time from t-x-y to t-x. The output vector (the so-called label) is a video sequence VS_{t-x .. t} of images from the camera 4 over a period of time from t-x to t.
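The way a training pair could be cut out of the ring memory 5 can be sketched as follows. This is a minimal illustration, not the implementation used in the vehicle: the class name `RingMemory`, the method names, and the interpretation of x and y as frame counts with half-open windows are assumptions made for the example.

```python
from collections import deque


class RingMemory:
    """Fixed-capacity frame store; once full, the oldest frame is dropped first."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, frame):
        # Appending to a full deque silently discards the oldest entry.
        self.frames.append(frame)

    def training_pair(self, x, y):
        """Return (input, label) for one training step of NN1.

        input  ~ VS_{t-x-y .. t-x}: the y frames preceding the label window
        label  ~ VS_{t-x .. t}:     the x most recent frames
        x and y are treated as frame counts (an assumption of this sketch).
        """
        if len(self.frames) < x + y:
            return None  # not enough history buffered yet
        recent = list(self.frames)[-(x + y):]
        return recent[:y], recent[y:]


memory = RingMemory(capacity=8)
for frame_id in range(10):  # integers stand in for camera images
    memory.push(frame_id)
pair = memory.training_pair(x=3, y=4)
```

With a capacity of 8 and ten pushed frames, the two oldest frames have already been overwritten, so the pair is built only from what the buffer still holds.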

The second control unit SG2 uses the second neural network NN2 to predict a video sequence VS_{t .. t+x} for a future time period t to t+x based on a video sequence VS_{t-y .. t} of images from the camera 4 of the time period t-y to t.
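The interplay of the two control units, training NN1 on past data and then overwriting NN2 before it predicts, might look roughly like the sketch below. `ToyPredictor` is a hypothetical stand-in with a single weight and scalar "frames" (for example a lateral offset); it is not a video prediction network and only illustrates the train, copy, predict sequence.

```python
import copy


class ToyPredictor:
    """Hypothetical stand-in for NN1/NN2: learns the mean change per frame."""

    def __init__(self):
        self.step = 0.0  # the single trainable weight

    def train(self, inputs, labels, lr=0.5):
        # Target: average per-frame change from the last input frame
        # through the label frames.
        seq = [inputs[-1]] + list(labels)
        target = (seq[-1] - seq[0]) / (len(seq) - 1)
        self.step += lr * (target - self.step)

    def predict(self, recent, horizon):
        # Extrapolate from the newest observed frame over `horizon` frames.
        last = recent[-1]
        return [last + self.step * (k + 1) for k in range(horizon)]


def time_step(nn1, inputs, labels):
    """Train NN1 once and return the copy that overwrites NN2."""
    nn1.train(inputs, labels)
    return copy.deepcopy(nn1)


nn1 = ToyPredictor()
nn2 = time_step(nn1, inputs=[0.0, 1.0], labels=[2.0, 3.0])
future = nn2.predict(recent=[3.0], horizon=2)
```

Returning a deep copy keeps NN2 independent of NN1, so continued training in SG1 cannot change a prediction that SG2 is currently computing.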

The third control unit SG3 uses a third neural network NN3 to carry out an object classification on the frames of the video sequence VS_{t .. t+x} from the second control unit SG2 for the future time period t to t+x and checks whether a relevant object (for example a truck, a car, a pedestrian, ...) collides with a current and/or planned travel trajectory in at least one frame. If this is the case, the fourth control unit SG4 is prompted to issue a warning.
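The per-frame collision check can be illustrated with a simple geometric sketch. It assumes that the object classification has already produced labeled bounding boxes for every predicted frame and that the travel trajectory is given as one footprint box per frame; the names `Box`, `RELEVANT` and `first_collision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in a common ground-plane coordinate system."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


RELEVANT = {"truck", "car", "pedestrian"}  # illustrative object classes


def overlaps(a, b):
    # Two boxes intersect only if they overlap on both axes.
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def first_collision(predicted_frames, ego_path):
    """Return (frame index, label) of the first relevant overlap, else None.

    predicted_frames: one list of (label, Box) tuples per predicted frame
    ego_path:         one Box per frame for the planned trajectory
    """
    for idx, (objects, ego) in enumerate(zip(predicted_frames, ego_path)):
        for label, box in objects:
            if label in RELEVANT and overlaps(box, ego):
                return idx, label
    return None


ego = Box(0.0, 0.0, 2.0, 4.0)
frames = [
    [("car", Box(3.0, 0.0, 5.0, 4.0)), ("bird", Box(0.5, 0.5, 1.0, 1.0))],
    [("car", Box(1.5, 1.0, 3.5, 5.0))],
]
hit = first_collision(frames, [ego, ego])
```

The returned frame index corresponds to the predicted frame that could be passed on to the fourth control unit SG4.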

In one embodiment, the first and second control units SG1 and SG2 use informer and/or transformer neural networks to generate the video sequence VS_{t .. t+x}.

In one embodiment, the first neural network NN1 is reset regularly, for example at the start of the journey and/or after a specified time interval, back to an initial status, which corresponds either to a neural network NNO with a pre-trained status from the development of the vehicle or to a neural network NN* continuously trained in the OEM backend 3 from field data, for example video sequences collected from customer vehicles.
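A reset of NN1 at the start of the journey and/or after a specified time interval could be organized as in the following sketch. Network weights are represented as plain dictionaries, and the rule of preferring NN* over NNO whenever a backend-trained network is available is an assumption of this example; the text leaves the choice open.

```python
import copy


class ResetPolicy:
    """Decides when NN1 returns to its initial status (hypothetical helper)."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.last_reset = None  # time of the most recent reset in seconds

    def due(self, now_s, journey_start=False):
        return (journey_start
                or self.last_reset is None
                or now_s - self.last_reset >= self.interval_s)


def initial_weights(nn0, nn_star=None):
    # Assumption: use the backend-trained NN* when present, otherwise NNO.
    source = nn_star if nn_star is not None else nn0
    return copy.deepcopy(source)


def maybe_reset(nn1, policy, now_s, nn0, nn_star=None, journey_start=False):
    """Return fresh initial weights when a reset is due, else NN1 unchanged."""
    if policy.due(now_s, journey_start):
        policy.last_reset = now_s
        return initial_weights(nn0, nn_star)
    return nn1


policy = ResetPolicy(interval_s=600)
nn0 = {"w": 0.0}
at_start = maybe_reset({"w": 5.0}, policy, 0, nn0, journey_start=True)
trained = {"w": 1.0}
mid_trip = maybe_reset(trained, policy, 100, nn0)
after_interval = maybe_reset(trained, policy, 600, nn0, nn_star={"w": 0.2})
```

Copying the initial weights rather than sharing them ensures that further on-board training never modifies the stored initial status.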

In one embodiment, the third control unit SG3 applies semantic segmentation to the video sequence VS_{t .. t+x} generated by the second control unit SG2 for the future time period t to t+x, using neural networks for object classification. In this way, road users are recognized as objects and each is assigned to a predefined object class. Furthermore, the third control unit SG3 can validate the video sequence VS_{t .. t+x} generated by the second control unit SG2 for the future time period t to t+x by comparing the movements of the objects with predefined, possible movements per object class. For example, a truck cannot suddenly reverse.
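The plausibility check against predefined, possible movements per object class can be sketched as below. The limits in `LIMITS` (maximum displacement per frame and whether a sudden change of direction is allowed) are invented placeholder values, and object motion is reduced to one position per frame along the direction of travel.

```python
# Hypothetical per-class limits: (max displacement per frame, may reverse abruptly)
LIMITS = {
    "truck": (2.0, False),
    "pedestrian": (0.5, True),
}


def plausible(label, positions):
    """Check a predicted track against the movement limits of its class.

    positions: one longitudinal position per predicted frame
    """
    max_step, may_flip = LIMITS[label]
    steps = [b - a for a, b in zip(positions, positions[1:])]
    # Reject movements that are too large for this class within one frame.
    if any(abs(s) > max_step for s in steps):
        return False
    # Reject a forward-to-backward flip between consecutive frames
    # for classes that cannot reverse suddenly.
    if not may_flip:
        for previous, current in zip(steps, steps[1:]):
            if previous > 0 and current < 0:
                return False
    return True


truck_ok = plausible("truck", [0.0, 1.0, 2.0, 3.0])
truck_reversing = plausible("truck", [0.0, 1.0, 2.0, 1.0])
```

A predicted sequence that fails such a check could be discarded before the collision check, so that an implausible prediction does not trigger a warning.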

In one embodiment, the fourth control unit SG4 can also receive the relevant object and the predicted frame with the collision and its time stamp from the third control unit SG3. This would make it possible, for example, to use an augmented reality head-up display in vehicle 2 to visually highlight and/or mark the object for the driver and, if necessary, to display the predicted frame with the risk of collision as a transparent overlay, so that the driver can identify a potential risk. Provision can also be made to illuminate the object with which there is a risk of collision with high-resolution headlight systems such as digital light, particularly in the dark. In addition, a driver observation camera can be used to check whether the driver already has the object in view, so that a warning is not necessary. Provision can also be made to plan an evasive maneuver and/or a braking maneuver if the driver does not react in good time.

In an alternative embodiment, instead of a realistic image, a semantic segmentation of the future frames can be directly predicted to reduce the training effort, the resolution of which can be lower. For this purpose, the algorithm for the semantic segmentation would have to be carried out as preparation before the use of the neural networks NN1 and NN2 and the neural networks NNO, NN1, NN2 would have to have been trained with semantically segmented images and/or video sequences. Only the collision detection would then take place in the third control unit SG3.
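The decision logic of the fourth control unit SG4 described above (warning, suppression of the warning when the driver already has the object in view, and a maneuver if the driver does not react in good time) could be combined as in this sketch. The function name, the time-to-collision input and the threshold `reaction_margin_s` are assumptions, as is the priority of the maneuver over the gaze check.

```python
def sg4_action(collision, driver_sees_object, time_to_collision_s,
               reaction_margin_s=1.5):
    """Choose the reaction of SG4 to the result of the collision check.

    collision:           None, or (frame index, object label) from SG3
    driver_sees_object:  result of the driver observation camera check
    time_to_collision_s: time until the predicted collision frame
    reaction_margin_s:   hypothetical threshold below which the driver
                         can no longer react in good time
    """
    if collision is None:
        return "none"
    if time_to_collision_s <= reaction_margin_s:
        # Assumption: too late for a warning, plan evasion and/or braking.
        return "maneuver"
    if driver_sees_object:
        # The driver already has the object in view, so no warning is given.
        return "none"
    return "warn"


decision = sg4_action((1, "car"), driver_sees_object=False,
                      time_to_collision_s=5.0)
```

A "warn" result would then be rendered through the augmented reality head-up display or the digital light, depending on the equipment of the vehicle.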

FIG. 2 is a schematic illustration of a section of a driver's field of vision with an augmented representation of information, with a predicted direction of movement R of a vehicle V driving ahead being displayed.

FIG. 3 is a schematic illustration of the section of the driver's field of vision according to FIG. 2 with a further augmented representation of information, the vehicle V driving ahead being illuminated with digital light DL.

FIG. 4 is a schematic view of the section of the driver's field of vision according to FIG. 2 with a further augmented representation of information, wherein a frame F predicted for the future with a predicted position of the vehicle driving ahead V is displayed by overlay.

Claims

1. Method for predicting the behavior of road users in a vehicle environment of a vehicle (2), video sequences (VS) of the vehicle environment being recorded using a sensor system, and the behavior of road users identified in the video sequences (VS) being predicted using a neural network (NN2) on the basis of the video sequences, characterized in that the neural network (NN2) is updated with the data of a further neural network (NN1) in each time step, the further neural network (NN1) being trained in each time step with video sequences (VS_{t-x-y .. t-x}) of the vehicle environment which were recorded in previous time steps and temporarily stored in a ring memory (5).

2. The method as claimed in claim 1, characterized in that if a dangerous situation is identified on the basis of the predicted behavior, a warning is issued in the vehicle (2) and/or a driving maneuver is carried out to minimize the dangerous situation.

3. The method as claimed in one of claims 1 or 2, characterized in that an informer and/or transformer neural network is used as the neural network (NN2) and/or as the further neural network (NN1, NNO, NN*).

4. The method according to any one of the preceding claims, characterized in that the further neural network (NN1) is reset to an initial status, which corresponds either to a neural network (NNO) with a pre-trained status from the development of the vehicle (2) or to a neural network (NN*) continuously trained from field data in an OEM backend (3).

5. The method as claimed in claim 4, characterized in that the reset of the further neural network (NN1) to the initial status takes place regularly.

6. The method as claimed in claim 4 or 5, characterized in that the further neural network (NN1) is reset to the initial status at the start of the journey and/or after a specified time interval.

7. The method as claimed in one of the preceding claims, characterized in that the road users in the video sequences (VS) are recognized by means of a neural network (NN3) for object classification, with semantic segmentation being used.

8. The method as claimed in claim 7, characterized in that the predicted behavior of the road users is checked for plausibility by the movements of the road users being compared with predefined, possible movements of a road user of the recognized object class.

9. The method according to any one of the preceding claims, characterized in that, in an augmented reality head-up display of the vehicle (2), a direction of movement (R) of the identified road user and optionally a highlighting of the identified road user and/or a transparent overlay of the identified road user at a predicted position are displayed.

10. The method according to any one of claims 2 to 9, characterized in that it is checked with a driver observation camera whether the driver already has the object in view, and that in this case no warning is given.