CN116052102A - Vehicle track prediction method, automatic driving system, vehicle and storage medium - Google Patents

Vehicle track prediction method, automatic driving system, vehicle and storage medium

Info

Publication number
CN116052102A
CN116052102A
Authority
CN
China
Prior art keywords
vehicle
bird's-eye view
prediction method
trajectory prediction
Prior art date
Legal status
Pending
Application number
CN202310001941.6A
Other languages
Chinese (zh)
Inventor
严旭
Current Assignee
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202310001941.6A
Publication of CN116052102A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/467 Encoded features or binary features, e.g. local binary patterns [LBP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention relates to the technical field of vehicle automatic driving and provides a vehicle trajectory prediction method, an automatic driving system, a vehicle and a storage medium. Image data of the vehicle are first acquired, and a bird's-eye view of the vehicle is generated from the image data; trajectory prediction is then performed on the vehicle using a stochastic temporal residual update model algorithm. To generate the bird's-eye-view feature map, an encoder first processes each camera image at time t to obtain image features and depth probabilities; the image features are combined with the depth probabilities to form three-dimensional features; finally, the three-dimensional features are projected along the vertical dimension onto a plane of fixed area to form a bird's-eye-view feature map of the vehicle's surroundings. The invention fuses the individual camera images so that they are more closely related; it does not depend on an expensive high-precision map and therefore has lower cost.

Description

Vehicle track prediction method, automatic driving system, vehicle and storage medium
Technical Field
The invention relates to the technical field of automatic driving of vehicles, in particular to a vehicle track prediction method, an automatic driving system, a vehicle and a storage medium.
Background
With the vigorous development of artificial-intelligence technology, automatic driving has attracted intense attention, and vehicles with automatic-driving functions are increasingly favored. Lane changes can occur at any time during automatic driving, both by the ego vehicle and by surrounding vehicles. If the motion trajectories of surrounding vehicles can be predicted in advance, accurate control signals can be provided for the ego vehicle's own lane changes, and countermeasures can be prepared in advance for lane changes of surrounding vehicles, especially for another vehicle cutting abruptly into the ego lane. Accurate vehicle trajectory prediction is therefore one of the key steps in achieving fully automatic driving.
In the prior art, the patent with publication number CN114021080A discloses a trajectory-prediction-model training and trajectory prediction method, apparatus, device and medium. The trajectory prediction model processes historical trajectory data of a sample vehicle and of its surrounding vehicles to obtain predicted trajectory data and predicted driving-behavior data of the sample vehicle; a loss function is determined from the predicted trajectory data, the real trajectory data and the predicted driving-behavior data of the sample vehicle; and the trajectory prediction model is trained according to the loss function.
Although the model trained in this way is more accurate than earlier trajectory prediction models, enabling more accurate trajectory prediction for automatic driving, the scheme has two drawbacks. First, the inputs to the prediction model are the images captured by each camera, which are not fused, so the camera images are only loosely related and the method relies on an expensive high-precision map. Second, the scheme encodes the input data with a conventional LSTM, performs intermediate processing with pooling layers and convolutional neural networks, and finally decodes the target's predicted trajectory with a decoder network; because of the many uncertainties in real scenes, the accuracy of this trajectory prediction degrades over time.
Disclosure of Invention
In view of the foregoing, an object of the embodiments of the present application is to provide a vehicle trajectory prediction method, an automatic driving system, a vehicle and a storage medium that process the camera images in a more integrated way, so that the images are more closely related, and that do not depend on an expensive high-precision map, thereby lowering cost.
In order to achieve the technical purpose, the technical scheme adopted by the application is as follows:
In a first aspect, the present application provides a vehicle trajectory prediction method: image data of the vehicle are first acquired, and a bird's-eye view of the vehicle is generated from the image data; trajectory prediction is then performed on the vehicle using a stochastic temporal residual update model algorithm.
Further, the image data of the vehicle is acquired by a plurality of cameras arranged on the vehicle body.
Further, the number of cameras is six, and the six cameras face the front, front-left, front-right, rear, rear-left and rear-right of the vehicle, respectively.
Further, the bird's-eye view of the vehicle is generated by using a standard convolutional encoder E to obtain a set of features and a set of discrete depth probabilities from the 6 different images captured at the same time, specifically comprising the following steps:
A1, define $\{I_t^k\}_{k=1}^{6}$ as the images of the six different cameras at time t;
A2, use an encoder $E: \mathbb{R}^{3\times H\times W} \to \mathbb{R}^{(C+D)\times H_e\times W_e}$ to encode every image $I_t^k$; where C is the number of channels per image, D is the number of discrete depth values per image, and $(H_e, W_e)$ is the spatial resolution of the features; the D depth values divide the range from $D_{min}$ to $D_{max}$ into equal intervals of $D_{size} = 1.0\,\mathrm{m}$, where $D_{min}$ and $D_{max}$ are the minimum and maximum encoded depth values, respectively;
A3, split the encoding $E(I_t^k)$ into two parts, image features $e_t^k \in \mathbb{R}^{C\times H_e\times W_e}$ and depth probabilities $a_t^k \in \mathbb{R}^{D\times H_e\times W_e}$, and take the outer product of the features and the depth probabilities to obtain a tensor
$u_t^k = e_t^k \otimes a_t^k \in \mathbb{R}^{C\times D\times H_e\times W_e}$ (1)
thereby obtaining 3D features;
in this step the depth probability acts as a form of self-attention that adjusts the feature plane according to the predicted depth, and the known camera intrinsic and extrinsic parameters are used to lift each camera tensor $u_t^k$ into three-dimensional space (3D) in a common reference frame.
A4, obtain the bird's-eye-view features: taking the vehicle as the center, grid the surrounding 100 m x 100 m space at 0.5 m x 0.5 m intervals to establish a bird's-eye view, and then project the 3D features obtained in step A3 along the vertical dimension into the bird's-eye view by weighted averaging, forming a bird's-eye-view feature map $x_t \in \mathbb{R}^{C\times H\times W}$ with $(H, W) = (200, 200)$, i.e. the final bird's-eye view.
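The combination of image features and depth probabilities in step A3 is an outer product at every feature pixel. Below is a minimal NumPy sketch of this step; the tensor sizes and variable names are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

# Illustrative sketch of step A3: combine per-pixel image features with a
# discrete depth distribution via an outer product (sizes are assumptions).
C, D, He, We = 64, 48, 28, 60          # channels, depth bins, feature resolution

e = np.random.rand(C, He, We)          # image features for one camera
a = np.random.rand(D, He, We)          # unnormalised depth scores
a = a / a.sum(axis=0, keepdims=True)   # depth probabilities over the D bins

# Outer product at every pixel: u[c, d, h, w] = e[c, h, w] * a[d, h, w]
u = np.einsum('chw,dhw->cdhw', e, a)

print(u.shape)  # (64, 48, 28, 60): a 3D feature tensor for this camera
```

Because the depth probabilities sum to one at each pixel, summing the lifted tensor over the depth axis recovers the original features, which is one way to see the depth distribution as a soft attention over depth planes.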
Further, the method for performing trajectory prediction on the vehicle using the stochastic temporal residual update model algorithm comprises the following steps:
B1, an encoder processes the high-resolution BEV (bird's-eye view) state to obtain a low-resolution feature space of 50 x 50 resolution;
B2, a convolutional neural network infers a first latent variable $y_1$ from the first three encoded states;
B3, a recurrent neural network formed by combining ConvGRU and convolution blocks infers a stochastic latent variable $z_t$ from the corresponding encoded state in each time step;
B4, based on the state dynamics $y_t$ and the stochastic latent variable $z_{t+1}$, the function $f_\theta$ predicts the residual change in the dynamics and adds it to $y_t$ to obtain $y_{t+1}$;
B5, from each $y_t$, the corresponding state $\hat{s}_t$ is predicted at the original resolution by $g_\theta$;
B6, the predicted trajectory $\hat{o}_t$ is decoded from the predicted states $\hat{s}_t$.
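Steps B1-B6 can be followed at the level of tensor shapes. The sketch below stands in for step B1 only, replacing the learned encoder with a simple average-pooling downsampler; the channel count and the pooling choice are assumptions for illustration:

```python
import numpy as np

# Shape-level sketch of step B1: a 200 x 200 BEV state is reduced to a
# 50 x 50 feature space. Average pooling stands in for the learned encoder.
C, H, W = 8, 200, 200

def encode(s):
    # 4 x 4 average pooling: (C, 200, 200) -> (C, 50, 50)
    return s.reshape(C, 50, 4, 50, 4).mean(axis=(2, 4))

s_t = np.random.rand(C, H, W)   # one high-resolution BEV state
feat = encode(s_t)

print(feat.shape)  # (8, 50, 50): low-resolution feature space
```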
Further, obtaining said $y_{t+1}$ comprises the following steps:
C1, randomness is introduced by the stochastic latent variable $z_{t+1}$, which is sampled from a normal distribution learned from the latent variable of the previous state:
$z_{t+1} \sim \mathcal{N}(\mu_\theta(y_t),\ \sigma_\theta(y_t) I)$ (2)
C2, given $z_{t+1}$, the dependency between the latent variables $y_t$ and $y_{t+1}$ is determined by a residual update:
$y_{t+1} = y_t + f_\theta(y_t, z_{t+1})$ (3)
where $f_\theta$ is a small CNN (convolutional neural network) that learns the residual update of $y_t$. The distribution of future trajectories is learned from the corresponding latent variables as a normal distribution with constant diagonal variance:
$s_t \mid y_t \sim \mathcal{N}(g_\theta(y_t),\ \lambda I)$
The initial latent variable is inferred by assuming a standard Gaussian prior: $y_1 \sim \mathcal{N}(0, I)$.
Further, when decoding the predicted trajectory $\hat{o}_t$, the joint probability of the bird's-eye-view states $s_{1:T}$, output modalities $o_{1:T}$, and latent variables $z_{1:T}$ and $y_{1:T}$ is:
$p(s_{1:T}, o_{1:T}, z_{1:T}, y_{1:T}) = p(y_1) \left[\prod_{t=2}^{T} p(z_t, y_t \mid y_{t-1})\right] \prod_{t=1}^{T} p(s_t \mid y_t)\, p(o_t \mid s_t)$ (4)
$p(z_t, y_t \mid y_{t-1}) = p(y_t \mid y_{t-1}, z_t)\, p(z_t \mid y_{t-1})$ (5)
where the dependence between $y_t$ and $y_{t-1}$ in $p(y_t \mid y_{t-1}, z_t)$ is determined by the stochastic latent residual of equation (3), and the output modality $o_t$ in $p(o_t \mid s_t)$ of equation (4) is learned from $s_t$ in a supervised manner by a deterministic decoder;
to maximize, from $p(s_{1:T}, o_{1:T})$, the bird's-eye-view state probability under the corresponding output modality, a deep variational inference model q, parameterized by $\phi$, is learned and factorized as follows:
$q(z_{1:T}, y_{1:T} \mid s_{1:T}) = q_\phi(y_1 \mid s_{1:k}) \prod_{t=2}^{T} q(y_t \mid y_{t-1}, z_t)\, q_\phi(z_t \mid s_{1:t})$ (6)
where k is the number of conditioning frames; the residual updates make $q(y_t \mid y_{t-1}, z_t)$ and $p(y_t \mid y_{t-1}, z_t)$ equal, and two versions of the model are obtained by keeping or removing the dependence of $z_t$ on $o_{1:t}$.
In a second aspect, the invention also discloses an automatic driving system, which uses the vehicle track prediction method.
In a third aspect, the invention also discloses a vehicle, which comprises a vehicle body and the automatic driving system, wherein the automatic driving system is mounted on the vehicle body.
In a fourth aspect, the present invention also discloses a computer readable storage medium, in which a computer program is stored, which when run on a computer causes the computer to perform the above-mentioned method.
The invention adopting the technical scheme has the following advantages:
1. When generating the bird's-eye-view feature map, an encoder first processes each camera image at time t to obtain image features and depth probabilities; the image features are combined with the depth probabilities to form three-dimensional features; finally, the three-dimensional features are projected along the vertical dimension onto a plane of fixed area to form a bird's-eye-view feature map of the vehicle's surroundings. By processing the camera images around the vehicle into a single bird's-eye-view feature map, the camera images become more closely related, and the accuracy of vehicle trajectory prediction is higher.
2. The invention uses a stochastic temporal residual update model to handle the uncertainty of real scenes, thereby improving trajectory prediction accuracy. The model learns temporal dynamics in a latent space by performing a stochastic residual update at each time step and sampling from the learned distribution, so more accurate trajectory predictions can be obtained over a wider spatial area and a longer time horizon. By decoupling the learning of the dynamics from the generation of the trajectory prediction, the model performs inference quickly; it does not depend on an expensive high-precision map, so the cost of vehicle trajectory prediction is lower.
Drawings
The present application may be further illustrated by the non-limiting examples given in the accompanying drawings. It is to be understood that the following drawings illustrate only certain embodiments of the present application and are therefore not to be considered limiting of its scope, since a person of ordinary skill in the art may derive other relevant drawings from them without inventive effort.
Fig. 1 is an illustration of an original camera image of the present invention.
Fig. 2 is an example of a bird's eye view generated by the present invention.
Fig. 3 is a flowchart of the steps for generating a bird's eye view of a vehicle in accordance with the present invention.
FIG. 4 is a block flow diagram of a model of a trajectory prediction algorithm in accordance with the present invention.
Detailed Description
The present application will be described in detail below with reference to the drawings and the specific embodiments, and it should be noted that in the drawings or the description of the specification, similar or identical parts use the same reference numerals, and implementations not shown or described in the drawings are in forms known to those of ordinary skill in the art. In the description of the present application, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
EXAMPLE 1,
This embodiment is a vehicle trajectory prediction method based on a stochastic temporal residual update model. It comprises two stages: generating a bird's-eye-view feature map, and performing trajectory prediction with the stochastic residual update model. The specific steps are as follows:
1. Generating the bird's-eye-view feature map
First, 6 cameras are installed around the vehicle, facing the front, front-left, front-right, rear, rear-left and rear-right. After the video images of the 6 cameras are acquired, the features of each camera image are extracted for each past time step and fused into a bird's-eye view (BEV) to form the bird's-eye-view feature map.
The specific generation procedure is as follows: a standard convolutional encoder E is used to obtain a set of features and a set of discrete depth probabilities from the 6 different images captured at the same time.
1. Define $\{I_t^k\}_{k=1}^{6}$ as the images of the 6 different cameras at time t;
2. Use an encoder $E: \mathbb{R}^{3\times H\times W} \to \mathbb{R}^{(C+D)\times H_e\times W_e}$ to encode every image $I_t^k$. Here C is the number of channels per image, D is the number of discrete depth values per image, and $(H_e, W_e)$ is the spatial resolution of the features. The D depth values divide the range from $D_{min}$ to $D_{max}$ into equal intervals of $D_{size} = 1.0\,\mathrm{m}$, where $D_{min}$ and $D_{max}$ are the minimum and maximum encoded depth values, respectively.
3. Split the encoding $E(I_t^k)$ into two parts, image features $e_t^k \in \mathbb{R}^{C\times H_e\times W_e}$ and depth probabilities $a_t^k \in \mathbb{R}^{D\times H_e\times W_e}$, and take their outer product to obtain the tensor
$u_t^k = e_t^k \otimes a_t^k \in \mathbb{R}^{C\times D\times H_e\times W_e}$ (1)
The depth probability acts as a form of self-attention, weighting the features according to the predicted depth. Using the known camera intrinsic and extrinsic parameters, the camera tensors $u_t^k$ are lifted into three-dimensional space (3D) in a common reference frame.
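The depth discretisation in step 2 is a uniform grid of candidate depth values. A small sketch follows; the bounds D_min and D_max are chosen arbitrarily here, since the text fixes only the 1.0 m spacing:

```python
import numpy as np

# Step 2 discretises depth into D values from D_min to D_max at intervals of
# D_size = 1.0 m. The range below is an illustrative assumption.
D_min, D_max, D_size = 2.0, 50.0, 1.0
depth_bins = np.arange(D_min, D_max, D_size)   # candidate depth per feature pixel
D = len(depth_bins)

print(D)  # 48 discrete depth values
```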
4. Obtain the bird's-eye-view features: taking the vehicle as the center, grid the surrounding 100 m x 100 m space at 0.5 m x 0.5 m intervals to establish a bird's-eye view, and then project the 3D features obtained above along the vertical dimension into the bird's-eye view by weighted averaging, forming a bird's-eye-view feature map $x_t \in \mathbb{R}^{C\times H\times W}$ with $(H, W) = (200, 200)$, i.e. the final bird's-eye view.
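Step 4's projection can be sketched as a weighted scatter-add of lifted 3D points into the 200 x 200 grid, followed by normalisation to obtain a weighted average per cell. Everything below (the random points, weights, and channel count) is an illustrative assumption:

```python
import numpy as np

# Sketch of step 4: pool lifted 3D features into a 200 x 200 BEV grid
# (0.5 m cells over 100 m x 100 m, ego vehicle at the origin).
rng = np.random.default_rng(0)
C, N = 8, 1000
points = rng.uniform(-50.0, 50.0, size=(N, 2))   # (x, y) positions in metres
feats = rng.standard_normal((N, C))              # C-dim feature per 3D point
weights = rng.uniform(0.1, 1.0, size=N)          # e.g. depth probabilities

H = W = 200
ix = ((points[:, 0] + 50.0) / 0.5).astype(int).clip(0, H - 1)
iy = ((points[:, 1] + 50.0) / 0.5).astype(int).clip(0, W - 1)

bev = np.zeros((C, H, W))
wsum = np.zeros((H, W))
for c in range(C):
    np.add.at(bev[c], (ix, iy), weights * feats[:, c])  # unbuffered scatter-add
np.add.at(wsum, (ix, iy), weights)
bev = bev / np.maximum(wsum, 1e-6)               # weighted average per cell

print(bev.shape)  # (8, 200, 200): bird's-eye-view feature map
```

Cells receiving no points stay zero; `np.add.at` is used because plain fancy-index assignment would drop contributions from points landing in the same cell.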
2. Trajectory prediction with the stochastic residual update model
The overall trajectory prediction comprises the following steps:
1. The BEV state $s_t \in \mathbb{R}^{C\times H\times W}$ with $(H, W) = (200, 200)$ is still high-resolution; therefore, an encoder first processes the high-resolution BEV state to obtain a low-resolution feature space of 50 x 50 resolution;
2. A convolutional neural network infers the first latent variable $y_1$ from the first three encoded states;
3. A recurrent neural network formed by combining ConvGRU and convolution blocks infers a stochastic latent variable $z_t$ from the corresponding encoded state in each time step;
4. Based on the previous state dynamics $y_t$ and the stochastic latent variable $z_{t+1}$, the function $f_\theta$ predicts the residual change in the dynamics and adds it to $y_t$ to obtain $y_{t+1}$;
5. From each $y_t$, the corresponding state $\hat{s}_t$ is predicted at the original resolution by $g_\theta$;
6. Finally, the predicted trajectory $\hat{o}_t$ is decoded from the predicted states $\hat{s}_t$.
For the above steps, this embodiment conditions on k = 3 frames and predicts the trajectory 4 to 12 steps into the future.
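The k = 3 conditioning scheme can be sketched as follows: the first three encoded states fix the initial latent, after which steps 4 to 12 are rolled out and decoded. All networks below are placeholder functions, not the models of this embodiment:

```python
import numpy as np

# Sketch of the k = 3 conditioning and rollout scheme.
rng = np.random.default_rng(1)
dim = 16
encoded = [rng.standard_normal(dim) for _ in range(3)]   # encodings of s_1..s_3

y = np.mean(encoded, axis=0)            # stand-in for the CNN inferring y_1
preds = []
for t in range(4, 13):                  # predict trajectory steps 4..12
    z = rng.standard_normal(dim)        # placeholder for the learned z_t sample
    y = y + 0.1 * np.tanh(y + z)        # residual dynamics update
    preds.append(0.5 * y[:2])           # g_theta-style decode to an (x, y) point

print(len(preds))  # 9 predicted trajectory points
```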
First, $s_{1:T}$ denotes a series of BEV (bird's-eye view) feature maps representing the state of the vehicle and its environment over a period T. In trajectory prediction, the objective is to predict the future trajectory $\hat{o}_{k+1:T}$ from the vehicle and environment state information of the first k time steps. The BEV states $s_t$ input to the stochastic prediction framework are intermediate representations in a high-dimensional space rather than video frames in pixel space; the framework predicts future states $\hat{s}_t$ in the same high-dimensional space, and $\hat{o}_t$ denotes the output modality decoded from them as the trajectory prediction.
Changes of the state over time are learned through stochastic residual updates of a series of latent variables: each time step t has a corresponding latent variable $y_t$, from which the state can be generated independently of the previous states (as shown in Fig. 4). Each $y_{t+1}$ depends only on the previous $y_t$ and a stochastic variable $z_{t+1}$. The randomness is introduced by the stochastic latent variable $z_{t+1}$, sampled from a normal distribution learned from the latent variable of the previous state, as shown in equation (2):
$z_{t+1} \sim \mathcal{N}(\mu_\theta(y_t),\ \sigma_\theta(y_t) I)$ (2)
Given $z_{t+1}$, the dependency between the latent variables $y_t$ and $y_{t+1}$ is determined by a residual update, as shown in equation (3):
$y_{t+1} = y_t + f_\theta(y_t, z_{t+1})$ (3)
where $f_\theta$ is a small CNN (convolutional neural network) that learns the residual update of $y_t$. The distribution of future trajectories is learned from the corresponding latent variables as a normal distribution with constant diagonal variance:
$s_t \mid y_t \sim \mathcal{N}(g_\theta(y_t),\ \lambda I)$
The initial latent variable is inferred by assuming a standard Gaussian prior: $y_1 \sim \mathcal{N}(0, I)$.
The joint probability of the BEV states $s_{1:T}$, output modalities $o_{1:T}$, and latent variables $z_{1:T}$ and $y_{1:T}$ is as follows:
$p(s_{1:T}, o_{1:T}, z_{1:T}, y_{1:T}) = p(y_1) \left[\prod_{t=2}^{T} p(z_t, y_t \mid y_{t-1})\right] \prod_{t=1}^{T} p(s_t \mid y_t)\, p(o_t \mid s_t)$ (4)
$p(z_t, y_t \mid y_{t-1}) = p(y_t \mid y_{t-1}, z_t)\, p(z_t \mid y_{t-1})$ (5)
In equation (5), the dependence between $y_t$ and $y_{t-1}$ in $p(y_t \mid y_{t-1}, z_t)$ is determined by the stochastic latent residual of equation (3); the output modality $o_t$ in $p(o_t \mid s_t)$ of equation (4) is learned from $s_t$ in a supervised manner by a deterministic decoder.
Our goal is to maximize, from $p(s_{1:T}, o_{1:T})$, the BEV state probability under the corresponding output modality. We learn a deep variational inference model q, parameterized by $\phi$, which factorizes as follows:
$q(z_{1:T}, y_{1:T} \mid s_{1:T}) = q_\phi(y_1 \mid s_{1:k}) \prod_{t=2}^{T} q(y_t \mid y_{t-1}, z_t)\, q_\phi(z_t \mid s_{1:t})$ (6)
where k = 3 is the number of conditioning frames; the residual updates make $q(y_t \mid y_{t-1}, z_t)$ and $p(y_t \mid y_{t-1}, z_t)$ equal, and two versions of the model can be obtained by keeping or removing the dependence of $z_t$ on $o_{1:t}$.
EXAMPLE 2,
This embodiment is an automatic driving system that uses the vehicle trajectory prediction method of Embodiment 1. The system adopts a stochastic temporal residual update model to handle the uncertainty of real scenes, thereby improving trajectory prediction accuracy. The model learns temporal dynamics in a latent space by performing a stochastic residual update at each time step and sampling from the learned distribution, so more accurate trajectory predictions can be obtained over a wider spatial area and a longer time horizon. By decoupling the learning of the dynamics from the generation of the trajectory prediction, the model performs inference quickly; it does not depend on an expensive high-precision map, so the cost of vehicle trajectory prediction is lower.
EXAMPLE 3,
This embodiment is a vehicle comprising a vehicle body and the automatic driving system of Embodiment 2, the automatic driving system being mounted on the vehicle body.
EXAMPLE 4,
This embodiment is a computer-readable storage medium in which a computer program is stored; when run on a computer, the program causes the computer to perform the above method. From the foregoing description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented in hardware, or by means of software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product stored in a non-volatile storage medium (a CD-ROM, a USB flash drive, a removable hard disk, etc.) and including several instructions for causing a computer device (a personal computer, a server, a network device, etc.) to perform the methods described in the various implementation scenarios of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, system, and method may be implemented in other manners as well. The above-described apparatus, systems, and method embodiments are merely illustrative, for example, flow charts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (10)

1. A vehicle trajectory prediction method, characterized in that: image data of the vehicle are first acquired, and a bird's-eye view of the vehicle is generated from the image data; trajectory prediction is then performed on the vehicle using a stochastic temporal residual update model algorithm.
2. The vehicle trajectory prediction method according to claim 1, characterized in that: the image data of the vehicle is acquired by a plurality of cameras arranged on the vehicle body.
3. The vehicle trajectory prediction method according to claim 2, characterized in that: the number of the cameras is six, and the six cameras face the front, front-left, front-right, rear, rear-left and rear-right of the vehicle, respectively.
4. The vehicle trajectory prediction method according to claim 3, characterized in that: the bird's-eye view of the vehicle is generated by using a standard convolutional encoder E to obtain a set of features and a set of discrete depth probabilities from the 6 different images captured at the same time, specifically comprising the following steps:
A1, define $\{I_t^k\}_{k=1}^{6}$ as the images of the six different cameras at time t;
A2, use an encoder $E: \mathbb{R}^{3\times H\times W} \to \mathbb{R}^{(C+D)\times H_e\times W_e}$ to encode every image $I_t^k$; wherein C is the number of channels per image, D is the number of discrete depth values per image, and $(H_e, W_e)$ is the spatial resolution of the features; the D depth values divide the range from $D_{min}$ to $D_{max}$ into equal intervals of $D_{size} = 1.0\,\mathrm{m}$, where $D_{min}$ and $D_{max}$ are the minimum and maximum encoded depth values, respectively;
A3, split the encoding $E(I_t^k)$ into two parts, image features $e_t^k \in \mathbb{R}^{C\times H_e\times W_e}$ and depth probabilities $a_t^k \in \mathbb{R}^{D\times H_e\times W_e}$, and take the outer product of the features and the depth probabilities to obtain a tensor $u_t^k = e_t^k \otimes a_t^k \in \mathbb{R}^{C\times D\times H_e\times W_e}$ (1), thereby obtaining 3D features;
A4, obtain the bird's-eye-view features: taking the vehicle as the center, grid the surrounding 100 m x 100 m space at 0.5 m x 0.5 m intervals to establish a bird's-eye view, and then project the 3D features obtained in step A3 along the vertical dimension into the bird's-eye view by weighted averaging, forming a bird's-eye-view feature map $x_t \in \mathbb{R}^{C\times H\times W}$ with $(H, W) = (200, 200)$, i.e. the final bird's-eye view.
5. The vehicle trajectory prediction method according to claim 4, characterized in that: the method for performing trajectory prediction on the vehicle using the stochastic temporal residual update model algorithm comprises the following steps:
B1, an encoder processes the high-resolution bird's-eye-view state to obtain a low-resolution feature space of 50 x 50 resolution;
B2, a convolutional neural network infers a first latent variable $y_1$ from the first three encoded states;
B3, a recurrent neural network formed by combining ConvGRU and convolution blocks infers a stochastic latent variable $z_t$ from the corresponding encoded state in each time step;
B4, based on the state dynamics $y_t$ and the stochastic latent variable $z_{t+1}$, the function $f_\theta$ predicts the residual change in the dynamics and adds it to $y_t$ to obtain $y_{t+1}$;
B5, from each $y_t$, the corresponding state $\hat{s}_t$ is predicted at the original resolution by $g_\theta$;
B6, the predicted trajectory $\hat{o}_t$ is decoded from the predicted states $\hat{s}_t$.
6. The vehicle trajectory prediction method according to claim 5, characterized in that: obtaining said $y_{t+1}$ comprises the following steps:
C1, randomness is introduced by the stochastic latent variable $z_{t+1}$, which is sampled from a normal distribution learned from the latent variable of the previous state:
$z_{t+1} \sim \mathcal{N}(\mu_\theta(y_t),\ \sigma_\theta(y_t) I)$ (2)
C2, given $z_{t+1}$, the dependency between the latent variables $y_t$ and $y_{t+1}$ is determined by a residual update:
$y_{t+1} = y_t + f_\theta(y_t, z_{t+1})$ (3)
wherein $f_\theta$ is a small CNN (convolutional neural network) that learns the residual update of $y_t$; the distribution of future trajectories is learned from the corresponding latent variables as a normal distribution with constant diagonal variance:
$s_t \mid y_t \sim \mathcal{N}(g_\theta(y_t),\ \lambda I)$
the initial latent variable is inferred by assuming a standard Gaussian prior: $y_1 \sim \mathcal{N}(0, I)$.
7. The vehicle trajectory prediction method according to claim 5, characterized in that: when decoding the predicted trajectories, the joint probability of the bird's-eye-view states s_{1:T}, the output modalities o_{1:T}, and the latent variables z_{1:T} and y_{1:T} is:

p(s_{1:T}, o_{1:T}, z_{1:T}, y_{1:T}) = p(y_1) ∏_{t=2}^{T} p(z_t, y_t | y_{t-1}) ∏_{t=1}^{T} p(s_t | y_t) p(o_t | s_t)   (4)

p(z_t, y_t | y_{t-1}) = p(y_t | y_{t-1}, z_t) · p(z_t | y_{t-1})   (5)
wherein the dependency between y_t and y_{t-1} in p(y_t | y_{t-1}, z_t) is determined by the stochastic latent residual update of equation (3), and the output modalities o_t in p(o_t | s_t) of equation (4) are learned from s_t in a supervised manner by deterministic decoders;
a deep variational inference model q is learned by maximizing the probability p(s_{1:T}, o_{1:T}) of the bird's-eye-view states with the corresponding output modalities; the inference model q is parameterized by φ and decomposed as follows:

q(z_{2:T}, y_{1:T} | s_{1:T}) = q(y_1 | s_{1:k}) ∏_{t=2}^{T} q(z_t | s_{1:t}) · q(y_t | y_{t-1}, z_t)   (6)
where k is the number of conditioning frames for the prior; since the updates are residual, q(y_t | y_{t-1}, z_t) and p(y_t | y_{t-1}, z_t) are equal, and two versions of the model are obtained by keeping or removing the dependence of z_t on o_{1:t}.
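The factorization of equation (5) can be sanity-checked numerically with toy discrete distributions over two values of z_t and y_t; the probability tables below are arbitrary assumptions chosen only to exercise the identity.

```python
import numpy as np

# Numerical check of equation (5): p(z_t, y_t | y_{t-1}) =
#   p(y_t | y_{t-1}, z_t) * p(z_t | y_{t-1}), with assumed toy tables.
p_z_given_yprev = np.array([0.3, 0.7])        # p(z_t | y_{t-1})
p_y_given = np.array([[0.9, 0.1],             # p(y_t | y_{t-1}, z_t = 0)
                      [0.2, 0.8]])            # p(y_t | y_{t-1}, z_t = 1)

# Joint distribution built from the factorization.
joint = p_y_given * p_z_given_yprev[:, None]

# A valid joint must sum to 1, and marginalizing over y_t must recover
# the conditional prior p(z_t | y_{t-1}).
total = joint.sum()
marg_z = joint.sum(axis=1)
```

This is exactly the chain rule of probability applied to (z_t, y_t); in the patented model the same structure holds with continuous densities, where p(y_t | y_{t-1}, z_t) is degenerate (a delta at the residual update) so sampling z_t fully determines y_t.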
8. An automatic driving system, characterized in that: the automatic driving system uses the vehicle trajectory prediction method of claim 1.
9. A vehicle, characterized in that: the vehicle comprises a vehicle body and the automatic driving system of claim 8, the automatic driving system being mounted on the vehicle body.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to execute the vehicle trajectory prediction method of claim 1.
CN202310001941.6A 2023-01-03 2023-01-03 Vehicle track prediction method, automatic driving system, vehicle and storage medium Pending CN116052102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310001941.6A CN116052102A (en) 2023-01-03 2023-01-03 Vehicle track prediction method, automatic driving system, vehicle and storage medium

Publications (1)

Publication Number Publication Date
CN116052102A true CN116052102A (en) 2023-05-02

Family

ID=86125061

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination