CN116304969A - Vehicle track multi-mode prediction method considering road information based on LSTM-GNN - Google Patents

Info

Publication number
CN116304969A
CN116304969A (application CN202310076393.3A)
Authority
CN
China
Prior art keywords
vehicle
track
lstm
information
gnn
Prior art date
Legal status
Pending
Application number
CN202310076393.3A
Other languages
Chinese (zh)
Inventor
孟志伟
张素民
何睿
支永帅
杨志
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority to CN202310076393.3A
Publication of CN116304969A (legal status: pending)


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 — Road transport of goods or passengers
    • Y02T10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T10/40 — Engine management systems

Abstract

The invention provides a vehicle track multi-mode prediction method based on LSTM-GNN that takes road information into account. The input module receives the historical tracks of the target vehicle and the surrounding traffic vehicles, the encoder encodes the input historical tracks, the interaction feature extraction module extracts the interactive influence between vehicles, and the road information feature extraction module extracts road structure information. The proposed prediction method considers the influence of road structure information on the predicted vehicle track, adds Gaussian noise to the fused feature vector, and introduces a diversity loss function. The method effectively improves the prediction accuracy of the future track and improves the social acceptability and rationality of the predicted track.

Description

Vehicle track multi-mode prediction method considering road information based on LSTM-GNN
Technical Field
The invention belongs to the technical field of intelligent driving, and particularly relates to a vehicle track multi-mode prediction method considering road information based on LSTM-GNN.
Background
Intelligent driving is a strategic direction of the global automotive industry, and vehicle intelligence has become a development trend of the industry.
The prediction module is the link between the upstream and downstream modules of an intelligent driving system: the upstream perception module feeds its fused perception results into the prediction module, and the prediction module's output in turn provides input for the downstream decision and planning module. Predicting the trajectories of traffic vehicles around the host vehicle improves the driving safety of the intelligent vehicle and helps improve the rationality of the host vehicle's decision and planning results.
Depending on the inputs and intermediate processing steps of the prediction model, current vehicle trajectory prediction methods in the intelligent driving field can be roughly divided into three types: physics-based, behavior-based, and learning-based methods. Existing methods rarely consider the influence of road information on a vehicle's future motion and make insufficient use of this prior knowledge; vehicle motion is strongly constrained by the road structure, and ignoring this constraint easily yields predicted future trajectories of low accuracy. In addition, because vehicle motion carries inherent uncertainty, the motion trajectory is multi-modal, yet prior studies seldom account for this; without considering the multi-modality of the predicted trajectories, the future trajectories generated by the prediction model may not conform to driving logic and rules.
Disclosure of Invention
The embodiments of the invention aim to provide a vehicle track multi-mode prediction method based on LSTM-GNN that considers road information, so as to make full use of the prior knowledge contained in road information and the multi-modal nature of trajectories, thereby improving prediction accuracy.
To solve the above technical problems, the invention adopts the following technical scheme: the vehicle track multi-mode prediction method considering road information based on LSTM-GNN comprises the following steps:
s1: acquiring vehicle track information, extracting vehicle information from a vehicle track data set in time-sequence order, wherein the target vehicle needs to have a 3 s historical track and a 5 s future track, the track length of the target vehicle is at least 1000 meters, and, in addition, surrounding traffic vehicles need to have a 3 s historical track;
s2: preprocessing the data obtained in the step S1:
s2.1, data cleaning: filtering the acquired data to remove abnormal records, then cleaning the acquired vehicle tracks with a smoothing filter to remove incomplete data and fill in missing data;
s2.2, data set division: sampling the cleaned data with a sliding window, each sample containing 80 frames of vehicle track information, of which the first 30 frames are the historical track and the last 50 frames are the future track, then dividing the data proportionally into a training set, a validation set, and a test set;
and S2.3, map matching of the tracks: if any track does not conform to normal vehicle motion, the abnormal track points are corrected.
S3: the encoder encodes the historical track information; the encoder is composed of n+1 LSTM networks whose inputs are the historical tracks of the target vehicle and the surrounding traffic vehicles and whose outputs are the encoded vehicle dynamics feature vectors; the specific steps for constructing the encoder are as follows:

S3.1: the predicted vehicle $v_p$ has historical track $X_p = \{c_p^{t_1}\}$, where $c_p^{t_1} = (x_p^{t_1}, y_p^{t_1})$ are the coordinates of $v_p$ at time $t_1$, $t_1 \in \{1,2,3\}$; each surrounding traffic vehicle $v_i$ has historical track $X_i = \{c_i^{t_1}\}$, where $c_i^{t_1} = (x_i^{t_1}, y_i^{t_1})$ are the coordinates of $v_i$ at time $t_1$, and $v_i \in \{v_1, \dots, v_n\}$, with $n$ the number of traffic vehicles around the predicted vehicle $v_p$; the future predicted trajectory of $v_p$ is $\hat{Y}_p = \{\hat{c}_p^{t_2}\}$, where $\hat{c}_p^{t_2} = (\hat{x}_p^{t_2}, \hat{y}_p^{t_2})$ are the predicted coordinates of $v_p$ at time $t_2$, $t_2 \in \{4,5,6,7,8\}$; the ground-truth future trajectory of $v_p$ is $Y_p = \{c_p^{t_2}\}$, where $c_p^{t_2} = (x_p^{t_2}, y_p^{t_2})$ are the real coordinates of $v_p$ at time $t_2$;

S3.2: embed each vehicle $v_j$'s coordinates using a multi-layer perceptron MLP to obtain the fixed-length vector $e_j^{t_1}$, where $v_j \in \{v_p, v_1, \dots, v_n\}$:

$$e_j^{t_1} = \phi\big(x_j^{t_1}, y_j^{t_1}; W_{ee}\big)$$

where $\phi(\cdot)$ is an embedding function with a ReLU nonlinear activation function and $W_{ee}$ is the embedding weight;

S3.3: the vehicle $v_j$'s historical track information $(x_j^{t_1}, y_j^{t_1})$ and the fixed-length vector $e_j^{t_1}$ are input into the encoder LSTM to obtain the dynamics feature vector $h_j^{t_1}$; the encoding process is shown in the following formula:

$$h_j^{t_1} = \mathrm{LSTM}\big(h_j^{t_1-1},\, e_j^{t_1};\, W_{encoder}\big)$$

where $W_{encoder}$ is the weight of the LSTM.
S4: extracting a road information feature vector LF based on CNN-LSTM, and realizing the encoding of road structure information by using a 1D-CNN and LSTM model; the process is as follows:
s4.1: defining a candidate lane according to the current position of a target vehicle, firstly searching a lane segment within 10 meters from the search radius of the centroid of the target vehicle, and then expanding the lane segment forwards and backwards until the length of a lane line reaches the required length;
s4.2: determining the lane in which the surrounding vehicles are located, and determining the lane in which the surrounding traffic vehicles are located
Figure BDA0004066356090000031
The observation information of (2) is sequentially input into 1D-CNN and LSTM for encoding, and the following formula is shown:
Figure BDA0004066356090000032
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA0004066356090000033
is lane +.1D-CNN and LSTM encoded>
Figure BDA0004066356090000034
Information feature vector->
Figure BDA0004066356090000035
Representing a surrounding traffic vehicle v i The lane in which the vehicle is located;
s4.3: generating a lane information feature vector LF, using the attention weight ω i Feature vector of coding information of lane where surrounding traffic vehicle is located
Figure BDA0004066356090000036
Fusion treatment was performed as shown in the following formula:
Figure BDA0004066356090000037
s5: the encoded vehicle dynamics feature vectors $h^{t_1}$ are concatenated with the inter-vehicle interaction feature vector IF and then fused with the road information feature vector LF; random noise $z$, drawn from a Gaussian mixture distribution, is then added to the fused $h^{t_1}$-IF-LF feature vector; the inter-vehicle interaction feature vector IF is calculated as follows:

S5.1: the graph structure can be represented by $G = (V, E)$, with nodes defined as $V = \{v_p, v_1, \dots, v_i, \dots, v_n\}$ and edges $E \subseteq V \times V$; since the graph is directed, the edge between node $v_p$ and node $v_i$ differs from the edge between node $v_i$ and node $v_p$, and the edge set $E$ can be expressed as:

$$E = \{\varepsilon_{pi}, \varepsilon_{ip} \mid i = 1, \dots, n\}$$

where $\varepsilon_{pi}$ is the directed edge from node $v_p$ to node $v_i$ (node $v_i$ is adjacent to node $v_p$, and the behavior of node $v_i$ affects the behavior of node $v_p$), and $\varepsilon_{ip}$ is the directed edge from node $v_i$ to node $v_p$;

S5.2: the interaction between vehicles is modeled using the graph neural network GNN, expressed as:

$$IF = \mathrm{GNN}_{inter}\big(h^{t_1}, E^{t_1}\big)$$

where IF is the inter-vehicle interaction feature vector, $\mathrm{GNN}_{inter}$ is an interaction feature encoder composed of two GNN layers, $h^{t_1}$ is the vehicle dynamics feature vector, and $E^{t_1}$ denotes the edges of the graph structure at time $t_1$.
S6: decoding and outputting a multi-modal future track by utilizing an LSTM network, and inputting the fusion characteristic added with random noise z into a decoder of the LSTM network to generate the multi-modal future track;
s7: a multi-modal prediction model of the vehicle track based on LSTM-GNN considering road structure information is trained using a diversity loss function; k possible predicted tracks are generated by arbitrarily sampling random noise from the distribution, and the "optimal" predicted track is selected according to the L2 Euclidean distance.
Firstly, the historical track information of the target vehicle and the surrounding traffic vehicles is input into the encoder for encoding; the interaction feature extraction module extracts the inter-vehicle interaction feature vectors, and the road information feature extraction module extracts the road structure feature vectors from the high-precision map. Then, the historical track feature vector of the target vehicle, the vehicle interaction feature vector, and the road information feature vector are concatenated, and random noise is added to the concatenated feature vector. Finally, the resulting fused feature vector is input into the decoder, which decodes it into a multi-modal output track.
The encoder consists of a long short-term memory (LSTM) network and encodes the vehicle dynamics features: after the vehicle's historical track is input into the encoder, the LSTM encodes the input track. The decoder also consists of an LSTM network and decodes the fused feature vector to generate the multi-modal predicted track. The interaction feature extraction module is composed of a GNN network and accounts for the interactive influence of other vehicles on the target vehicle, extracting the inter-vehicle interaction features. The road information feature extraction module is composed of a CNN-LSTM network and extracts road structure features.
The vehicle track multi-mode prediction method considering road information based on LSTM-GNN has the advantage that the CNN-LSTM-based road information feature extraction module can extract the structural information of the lane the vehicle occupies, further improving track prediction accuracy. Random Gaussian noise is added to the fused feature vector and a diversity loss function is introduced during model training, so that multi-modal sample tracks are generated and the social acceptability and rationality of the predicted tracks are improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a method for predicting a vehicle multi-modal trajectory based on LSTM-GNN considering road information;
fig. 2 is a model structure diagram of a vehicle multi-modal trajectory prediction method considering road information based on LSTM-GNN.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a vehicle track multi-mode prediction method based on LSTM-GNN considering road information, wherein a model flow chart is shown in figure 1, and specifically comprises the following steps:
s1: vehicle track information is acquired. And extracting the information of the vehicles in the vehicle track data set according to the time sequence order from the public data set. The track of the target vehicle has a history track of 3s and a future track of 5s, and the track length of the target vehicle is at least 1000 meters. In addition, the surrounding traffic vehicles have a history of 3 s.
S2: the data preprocessing comprises the following specific steps:
s2.1: data cleaning; and filtering the acquired data information to remove abnormal data. And then cleaning the acquired vehicle track by using a smoothing filter to remove incomplete data and supplement missing data.
S2.2: dividing the data set; the cleaned data are sampled by utilizing a sliding window, 80 frames of vehicle track information are needed in each sample, the first 30 frames are historical track information, the last 50 frames are future track information, and then the data are proportionally divided into a training set, a verification set and a test set.
S2.3: map matching of tracks; if the track which does not accord with the normal form of the vehicle exists, the abnormal track point is further corrected.
S3: the encoder encodes the historical track information; the encoder consists of n+1 LSTM networks, and is input into a target vehicle history track and surrounding vehicle history tracks and output into an encoded vehicle dynamics feature vector
Figure BDA0004066356090000051
The specific implementation process is as follows:
s3.1: predicted vehicle v p Is the historical track of (a)
Figure BDA0004066356090000052
Wherein->
Figure BDA0004066356090000053
For the predicted vehicle v p At t 1 Coordinates of time of day, use->
Figure BDA0004066356090000054
Representation, t 1 E {1,2,3}. Surrounding traffic vehicle v i Is the historical track of (a)
Figure BDA0004066356090000055
Wherein->
Figure BDA0004066356090000056
For surrounding traffic v i At t 1 Coordinates of time of day, use->
Figure BDA0004066356090000057
A representation; surrounding traffic vehicle v i ∈{v 1 ,...,v n -where n is the predicted vehicle v i Number of surrounding traffic vehicles. Predicted vehicle v p Future predicted trajectory of +.>
Figure BDA0004066356090000058
Wherein->
Figure BDA0004066356090000059
For the predicted vehicle v p At t 2 Predicted coordinates of time of day, using
Figure BDA00040663560900000510
Representation, t 2 E {4,5,6,7,8}. Predicted vehicle v p Is defined as +.>
Figure BDA00040663560900000511
Figure BDA00040663560900000512
Wherein->
Figure BDA00040663560900000513
For the predicted vehicle v p At t 2 Real coordinates of time of day, use ∈>
Figure BDA00040663560900000514
And (3) representing.
S3.2: embedding a vehicle v using a multi-layer perceptron j To obtain fixed length vector
Figure BDA00040663560900000515
Wherein v is j ∈{v p ,v 1 ,…,v n }。
Figure BDA00040663560900000516
Wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA00040663560900000517
is an embedded function with a ReLU nonlinear activation function, W ee Is an embedded weight;
s3.3: vehicle v j Historical track information
Figure BDA0004066356090000061
And a fixed length vector->
Figure BDA0004066356090000062
Is input into an encoder LSTM to obtain a coding vector +.>
Figure BDA0004066356090000063
The encoding process is shown as follows:
Figure BDA0004066356090000064
wherein W is encoder Is the weight of LSTM.
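A minimal PyTorch sketch of the encoder described in S3.1 to S3.3; the embedding and hidden dimensions (32 and 64) are illustrative assumptions, and `TrajectoryEncoder` is a hypothetical name rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Embed each (x, y) point with an MLP (the phi / W_ee step), then run an
    LSTM over the 30-frame history to get the per-vehicle dynamics feature."""
    def __init__(self, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(2, embed_dim), nn.ReLU())
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, history):          # history: (batch, 30, 2)
        e = self.embed(history)          # (batch, 30, embed_dim)
        _, (h_n, _) = self.lstm(e)       # final hidden state: (1, batch, hidden)
        return h_n.squeeze(0)            # dynamics feature: (batch, hidden)

enc = TrajectoryEncoder()
h = enc(torch.randn(4, 30, 2))           # 4 vehicles, 30 history frames each
print(h.shape)  # torch.Size([4, 64])
```

In the full model one such encoder would process the target vehicle and each of the n surrounding vehicles.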
S3: interaction effects between vehicles are modeled based on GNNs. Modeling interactions using directed graphs, node representations with kinetic encoding features
Figure BDA0004066356090000065
Is a vehicle of (a).
S3.1: the graph structure can be represented by g= (V, E), with nodes defined as v= { V p ,v 1 ,...,v i ,...,v n Edge is defined as E.epsilon.V.times.V. Since the graph is a directed graph, node v p And node v i Edge and node v between i And nodev p The edges between are different, edge E can be expressed as:
Figure BDA0004066356090000066
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA0004066356090000067
representing node v p To node v i Directed edge of (v), node v i Immediately adjacent node v p And node v i The behavior of (a) affects node v p Is a behavior of (1).
S3.2: the interaction between vehicles is modeled using the graph neural network GNN, expressed by:
Figure BDA0004066356090000068
wherein IF represents a vehicle-to-vehicle interaction feature vector, GNN inter Is an interactive encoder consisting of two layers of GNN networks,
Figure BDA0004066356090000069
for the dynamics feature vector of the vehicle, < > is>
Figure BDA00040663560900000610
Indicated at t 1 Edges of the time graph structure.
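The two-layer interaction encoder GNN_inter can be sketched as simple message passing over the directed adjacency matrix. The mean-aggregation update below is one common GNN variant, chosen as an assumption since the patent does not specify the layer type:

```python
import torch
import torch.nn as nn

class TwoLayerGNN(nn.Module):
    """Minimal two-layer message-passing sketch: node features are the
    per-vehicle dynamics vectors h; adj[i, j] = 1 means vehicle j's
    behaviour influences vehicle i (a directed edge j -> i)."""
    def __init__(self, dim=64):
        super().__init__()
        self.layer1 = nn.Linear(dim, dim)
        self.layer2 = nn.Linear(dim, dim)

    def forward(self, h, adj):                       # h: (N, dim), adj: (N, N)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)  # in-degree normaliser
        h = torch.relu(self.layer1(adj @ h / deg))    # aggregate neighbours
        h = torch.relu(self.layer2(adj @ h / deg))
        return h                                      # interaction features IF

n = 5                                   # target vehicle + 4 neighbours
h = torch.randn(n, 64)                  # dynamics features from the encoder
adj = torch.ones(n, n) - torch.eye(n)   # fully connected directed graph
IF = TwoLayerGNN()(h, adj)
print(IF.shape)  # torch.Size([5, 64])
```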
S4: the road information feature vector LF is extracted based on CNN-LSTM. And coding the road structure information by using the 1D-CNN and LSTM models.
S4.1: the candidate lane is defined according to the current position of the target vehicle. First, a lane segment is searched within a search radius (10 meters) from the centroid of the target vehicle. The lane segments are then expanded forward and backward until the length of the lane lines reaches the desired length.
S4.2: determining the lane where the surrounding vehicles are located, and determining the lane L where the surrounding vehicles are located i Is sequentially input into the 1D-CNN and the LSTM for encoding,the following formula is shown:
Figure BDA00040663560900000611
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA00040663560900000612
is lane +.1D-CNN and LSTM encoded>
Figure BDA00040663560900000613
Information feature vector->
Figure BDA00040663560900000614
Representing a surrounding traffic vehicle v i The lane in which the vehicle is located.
S43: the lane information feature vector LF is generated. Using attention weighting ω i Coded information for lanes of surrounding traffic vehicles, information features
Figure BDA00040663560900000615
Fusion treatment was performed as shown in the following formula:
Figure BDA0004066356090000071
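A sketch of the road information feature extraction in S4.2 and S4.3, combining a 1D-CNN, an LSTM, and attention-weighted fusion; the kernel size, channel counts, and the softmax form of the attention weights are illustrative assumptions, not specified by the patent:

```python
import torch
import torch.nn as nn

class LaneEncoder(nn.Module):
    """Encode each lane polyline with a 1D-CNN then an LSTM, and fuse the
    per-lane codes l_i into one road feature LF with attention weights."""
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Conv1d(2, 16, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)

    def forward(self, lanes):                            # lanes: (n, points, 2)
        x = torch.relu(self.cnn(lanes.transpose(1, 2)))  # (n, 16, points)
        _, (l_i, _) = self.lstm(x.transpose(1, 2))       # (1, n, hidden)
        l_i = l_i.squeeze(0)                             # per-lane codes
        omega = torch.softmax(self.attn(l_i), dim=0)     # (n, 1) attention
        return (omega * l_i).sum(dim=0)                  # LF: (hidden,)

LF = LaneEncoder()(torch.randn(4, 20, 2))  # 4 candidate lanes, 20 points each
print(LF.shape)  # torch.Size([64])
```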
s5: and (5) feature fusion. First, the encoded vehicle dynamics feature vector
Figure BDA0004066356090000072
And the interactive feature vector IF between vehicles is spliced and then is fused with the road information feature vector LF. Then at->
Figure BDA0004066356090000073
Random noise z is added on the basis of the IF and LF fused feature vectors, where Gaussian-distributed mixed noise is used.
S6: and decoding and outputting the multi-mode future track by utilizing the LSTM network. The fused features with random noise added are input into a decoder of the LSTM network to generate a multi-modal future track.
S7: using diversity loss functions
Figure BDA0004066356090000074
Training a multimodal prediction model of a vehicle trajectory taking road structure information into consideration based on LSTM-GNN by +.>
Figure BDA0004066356090000075
A random noise is sampled arbitrarily in the distribution to generate k possible predicted trajectories, and an "optimal" predicted trajectory is selected based on the L2 Euclidean distance.
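The diversity ("best-of-k") loss can be reconstructed as follows: sample k noise vectors, decode k trajectories, and penalize only the candidate closest to the ground truth in L2 distance. This mirrors the variety loss popularized by Social GAN and is an assumption about the patent's exact form; `variety_loss` is a hypothetical name:

```python
import torch

def variety_loss(pred_k, gt):
    """pred_k: (k, T, 2) candidate trajectories; gt: (T, 2) ground truth.
    Returns the mean L2 error of the best candidate and its index."""
    dists = ((pred_k - gt.unsqueeze(0)) ** 2).sum(-1).sqrt().mean(-1)  # (k,)
    best = dists.argmin()
    return dists[best], best

gt = torch.zeros(50, 2)
preds = torch.randn(6, 50, 2)   # six modes from six noise samples
preds[3] = gt.clone()           # make mode 3 match the ground truth exactly
loss, best = variety_loss(preds, gt)
print(best.item(), loss.item())  # 3 0.0
```

Back-propagating only through the best mode keeps the other modes free to cover alternative, still-plausible maneuvers.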
S7.1: the vehicle multi-mode prediction model based on LSTM-GNN considering road information is trained on the preprocessed training set, and is input into the history track information of the predicted vehicle and the surrounding vehicles and the road structure information contained in the high-precision map, and is output into the multi-mode future track of the vehicle.
S7.2: and calculating the accuracy of the vehicle track prediction model by using the verification set in the training process, and preventing the model from being fitted by combining the change of the loss function in the training process. Specifically, the weight parameters and the bias parameters of the model provided by the invention are trained on an i7-10700F/NVIDIA GeForce RTX 3070/PCle/SSE2 server by using a Pytorch deep learning framework, the processed data are transmitted into the model by using a DataLoader class in the Pytorch, an optimizer selects Adam, the learning rate is 0.001, training rounds (Epoch) are set to 200 rounds, and the model parameters are saved after each round is finished.
S7.3: the predicted future trajectory is compared with the real future trajectory and the value of the loss function is calculated, thereby updating the network parameters. After model training is completed, corresponding weight parameters and bias parameters are saved.
S8: and (5) experimental verification and analysis. And testing a vehicle track multi-mode prediction model considering road structure information based on LSTM-GNN and carrying out correlation analysis. The pre-processed test set is used to test the trained model, predict the likely trajectory of the target vehicle, and test the accuracy of the model predictions.
S8.1: and (5) evaluating indexes. Two general evaluation indexes are selected to evaluate the model, namely Average Displacement Error (ADE) and Final Displacement Error (FDE).
The average displacement error ADE represents the average L2 distance between the ground-truth and predicted positions, calculated as:

$$ADE = \frac{1}{5} \sum_{t=4}^{8} \sqrt{\big(\hat{x}_p^{t} - x_p^{t}\big)^2 + \big(\hat{y}_p^{t} - y_p^{t}\big)^2}$$

where $(\hat{x}_p^{t}, \hat{y}_p^{t})$ are the predicted coordinates of vehicle $v_p$ at time $t$, $(x_p^{t}, y_p^{t})$ are its true coordinates at time $t$, and the sum runs over the five predicted time steps $t \in \{4, \dots, 8\}$.
The final displacement error FDE represents the distance between the predicted final position and the true final position, calculated as:

$$FDE = \sqrt{\big(\hat{x}_p^{8} - x_p^{8}\big)^2 + \big(\hat{y}_p^{8} - y_p^{8}\big)^2}$$

where $t = 8$ is the last predicted time step.
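Both metrics follow directly from the per-frame L2 distances; `ade_fde` is an illustrative helper:

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE: mean L2 distance over all predicted frames.
    FDE: L2 distance at the final predicted frame."""
    d = np.linalg.norm(pred - gt, axis=-1)  # per-frame distances
    return d.mean(), d[-1]

pred = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gt   = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # 1.0 2.0
```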
s8.2: ablation study. The effect of each module in the prediction model provided by the invention is verified through an ablation experiment, and the effect of each comparison model is evaluated by using the average displacement error ADE and the final displacement error FDE. Four comparative models, model a, model B, model C, and model D, were constructed, as shown in table 1, with model a as the reference model.
Table 1 Ablation study comparison models

Module                                    A    B    C    D
Historical track encoder                  ✓    ✓    ✓    ✓
Interaction feature extraction module          ✓    ✓    ✓
Road feature extraction module                      ✓    ✓
Diversity loss function                                  ✓
Model A predicts the future track from the historical track alone: the historical track is input to the encoder, and the encoded dynamics feature vector $h^{t_1}$ is input to the decoder to decode the future track. Model B adds the interaction feature extraction module to model A: the dynamics feature vector $h^{t_1}$ fused with the interaction feature vector IF is input to the decoder to decode the future track. Model C adds both the interaction feature module and the road feature extraction module to model A: the dynamics feature $h^{t_1}$, the interaction feature vector IF, and the road information feature vector LF are fused and then input to the decoder to decode the future track. Model D is the proposed LSTM-GNN-based vehicle track multi-mode prediction model considering road information, shown in FIG. 2: it adds the interaction feature module and the road feature extraction module to model A and generates multi-modal predicted tracks with the diversity loss function. The experimental results show that the proposed model performs best, demonstrating the necessity of the interaction feature extraction module, the road information feature extraction module, and the multi-modal output.
S8.3: and (5) comparing the reference models. Comparing the prediction model provided by the invention with common prediction models such as LSTM and DESIRE, and evaluating the model by using average displacement error ADE and final displacement error FDE, wherein the comparison result is shown in table 2, and the result shows that the prediction model provided by the invention is superior to the other two models.
Table 2 model comparison experiment results
(The ADE/FDE comparison values of Table 2 appear only as an image in the original publication.)
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (6)

1. The vehicle track multi-mode prediction method taking road information into consideration based on LSTM-GNN is characterized by comprising the following steps of:
s1: acquiring vehicle track information, extracting vehicle information from a vehicle track data set in time-sequence order, wherein the target vehicle needs to have a 3 s historical track and a 5 s future track, the track length of the target vehicle is at least 1000 meters, and, in addition, surrounding traffic vehicles need to have a 3 s historical track;
s2: preprocessing the data obtained in the step S1:
s3: the encoder encodes the historical track information, and the encoder consists of n+1 LSTM networks, the inputs being the historical tracks of the target vehicle and the surrounding vehicles and the output being the encoded vehicle dynamics feature vector $h^{t_1}$;
S4: extracting a road information feature vector LF based on CNN-LSTM, and realizing the encoding of road structure information by using a 1D-CNN and LSTM model;
s5: the encoded vehicle dynamics feature vectors $h^{t_1}$ are concatenated with the inter-vehicle interaction feature vector IF and then fused with the road information feature vector LF; random noise $z$, drawn from a Gaussian mixture distribution, is then added to the fused $h^{t_1}$-IF-LF feature vector;
s6: decoding and outputting a multi-modal future track by utilizing an LSTM network, and inputting the fusion characteristic added with the random noise z into a decoder to generate the multi-modal future track;
s7: a multi-modal prediction model of the vehicle track based on LSTM-GNN considering road structure information is trained using a diversity loss function, k possible prediction tracks are generated by arbitrarily sampling a random noise in the distribution, and an optimal prediction track is selected according to the L2 Euclidean distance.
2. The LSTM-GNN-based multi-modal vehicle track prediction method considering road information according to claim 1, wherein the data preprocessing in S2 comprises the following specific steps:
S2.1: data cleaning: filtering the acquired data to remove abnormal data, then cleaning the acquired vehicle tracks with a smoothing filter, removing incomplete data and supplementing missing data;
S2.2: dividing the data set: sampling the cleaned data with a sliding window, each sample requiring 80 frames of vehicle track information, the first 30 frames being historical track information and the last 50 frames being future track information, then dividing the data proportionally into a training set, a validation set and a test set;
S2.3: map matching of tracks: if a track does not conform to normal vehicle motion, the abnormal track points are corrected.
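The sliding-window sampling and proportional split of S2.2 can be sketched as follows; the 70/10/20 ratio is an assumed example, since the claim only specifies a proportional split.

```python
import numpy as np

def sliding_window_samples(track, window=80, stride=1):
    """Cut a cleaned trajectory (T, 2) into 80-frame samples:
    frames 0-29 are history, frames 30-79 are the future label."""
    samples = []
    for s in range(0, len(track) - window + 1, stride):
        hist = track[s:s + 30]            # 30 frames of history
        fut = track[s + 30:s + window]    # 50 frames of future
        samples.append((hist, fut))
    return samples

def split_dataset(samples, ratios=(0.7, 0.1, 0.2)):
    """Proportional split into training / validation / test sets."""
    n = len(samples)
    n_tr = int(n * ratios[0])
    n_va = int(n * ratios[1])
    return samples[:n_tr], samples[n_tr:n_tr + n_va], samples[n_tr + n_va:]

# toy straight-line track of 200 frames
track = np.stack([np.arange(200.0), np.zeros(200)], axis=1)
samples = sliding_window_samples(track)
train, val, test = split_dataset(samples)
```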
3. The LSTM-GNN-based multi-modal vehicle track prediction method considering road information according to claim 1, wherein an LSTM-based encoder is constructed in S3, specifically comprising the following steps:
S3.1: the historical track of the predicted vehicle v_p is X_p = {c_p^1, c_p^2, c_p^3}, wherein c_p^{t1} = (x_p^{t1}, y_p^{t1}) is the coordinate of the predicted vehicle v_p at time t1, t1 ∈ {1,2,3}; the historical track of a surrounding traffic vehicle v_i is X_i = {c_i^1, c_i^2, c_i^3}, wherein c_i^{t1} = (x_i^{t1}, y_i^{t1}) is the coordinate of the surrounding traffic vehicle v_i at time t1; the surrounding traffic vehicles satisfy v_i ∈ {v_1, ..., v_n}, where n is the number of traffic vehicles surrounding the predicted vehicle v_p; the future predicted track of the predicted vehicle v_p is Ŷ_p = {ĉ_p^4, ..., ĉ_p^8}, wherein ĉ_p^{t2} = (x̂_p^{t2}, ŷ_p^{t2}) is the predicted coordinate of v_p at time t2, t2 ∈ {4,5,6,7,8}; the real future track of the predicted vehicle v_p is defined as Y_p = {c_p^4, ..., c_p^8}, wherein c_p^{t2} = (x_p^{t2}, y_p^{t2}) is the real coordinate of v_p at time t2;
S3.2: the coordinates of vehicle v_j, v_j ∈ {v_p, v_1, ..., v_n}, are embedded with a multi-layer perceptron MLP to obtain a fixed-length vector e_j^{t1}:
e_j^{t1} = φ(c_j^{t1}; W_ee)
wherein φ(·) is an embedding function with a ReLU nonlinear activation function and W_ee is the embedding weight;
S3.3: the historical track information of vehicle v_j and the fixed-length vector e_j^{t1} are input into the encoder LSTM to obtain the dynamics feature vector DF_j; the encoding process is:
DF_j = LSTM(h_j^{t1-1}, e_j^{t1}; W_encoder)
wherein h_j^{t1-1} is the hidden state of the encoder LSTM at the previous time step and W_encoder is the weight of the LSTM.
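A minimal numerical sketch of the encoder in S3.2–S3.3, assuming a 16-dimensional embedding and a 32-dimensional hidden state (both illustrative): each coordinate is embedded with a ReLU layer and fed through a single LSTM cell, whose final hidden state serves as the dynamics feature vector DF.

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def embed(coord, W_ee):
    """phi(.): ReLU embedding of one (x, y) coordinate into a fixed-length vector."""
    return relu(W_ee @ coord)

def lstm_step(h, c, e, W):
    """One LSTM step over the embedded coordinate e; W packs the four gate weights."""
    z = W @ np.concatenate([h, e])
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

E, H = 16, 32                                   # embedding / hidden sizes (assumed)
W_ee = rng.standard_normal((E, 2)) * 0.1        # embedding weight
W_enc = rng.standard_normal((4 * H, H + E)) * 0.1  # LSTM weight (4 gates stacked)

track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.1)]    # 3 s of history
h = np.zeros(H); c = np.zeros(H)
for coord in track:
    h, c = lstm_step(h, c, embed(np.array(coord), W_ee), W_enc)
DF = h                                           # dynamics feature vector
```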
4. The LSTM-GNN-based multi-modal vehicle track prediction method considering road information according to claim 1, wherein the step of calculating the inter-vehicle interaction feature vector IF in S5 is as follows:
S5.1: the graph structure is denoted G = (V, E); the nodes are defined as V = {v_p, v_1, ..., v_i, ..., v_n} and the edges as E ⊆ V × V; since the graph is a directed graph, the edge between node v_p and node v_i differs from the edge between node v_i and node v_p, and the edge set E can be expressed as:
E = {e_{p→i}, e_{i→p} | i = 1, ..., n}
wherein e_{p→i} represents the directed edge from node v_p to node v_i; node v_i is adjacent to node v_p, and the behavior of node v_i affects the behavior of node v_p; e_{i→p} represents the directed edge from node v_i to node v_p;
S5.2: the interaction between vehicles is modeled with the graph neural network GNN:
IF = GNN_inter(DF, E^{t1})
wherein IF denotes the inter-vehicle interaction feature vector, GNN_inter is an interaction feature encoder composed of two GNN layers, DF is the dynamics feature vector of the vehicles, and E^{t1} denotes the edges of the graph structure at time t1.
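The two-layer GNN encoder of S5.2 can be sketched as mean-aggregation message passing over the directed adjacency matrix. The layer form (mean aggregation of in-neighbour features, a linear map, and a ReLU) is an assumption for illustration, since the claim does not fix the GNN variant.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda x: np.maximum(x, 0.0)

def gnn_layer(H, A, W):
    """One message-passing layer: each node averages its in-neighbours'
    features over the directed adjacency A, then applies a linear map + ReLU."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    return relu(((A @ H) / deg) @ W)

n, D = 4, 32                          # v_p plus 3 neighbours; feature size (assumed)
A = np.ones((n, n)) - np.eye(n)       # fully connected directed graph, no self-loops
H0 = rng.standard_normal((n, D))      # dynamics feature vectors DF of the n vehicles
W1 = rng.standard_normal((D, D)) * 0.1
W2 = rng.standard_normal((D, D)) * 0.1

# two GNN layers; row 0 is the interaction feature of the predicted vehicle v_p
IF = gnn_layer(gnn_layer(H0, A, W1), A, W2)[0]
```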
5. The LSTM-GNN-based multi-modal vehicle track prediction method considering road information according to claim 1, wherein step S4 specifically comprises:
S4.1: defining candidate lanes according to the current position of the target vehicle: first searching for lane segments within a 10-meter radius of the centroid of the target vehicle, then extending the lane segments forward and backward until the lane line reaches the required length;
S4.2: determining the lanes in which the surrounding vehicles are located, and sequentially inputting the observation information of the lane L_i in which surrounding traffic vehicle v_i is located into the 1D-CNN and the LSTM for encoding:
LF_i = LSTM(CNN_1D(L_i))
wherein LF_i is the information feature vector of lane L_i encoded by the 1D-CNN and LSTM, and L_i represents the lane in which surrounding traffic vehicle v_i is located;
S4.3: generating the lane information feature vector LF by fusing the encoded feature vectors LF_i of the lanes in which the surrounding traffic vehicles are located with attention weights ω_i:
LF = Σ_i ω_i · LF_i
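An illustrative sketch of S4.2–S4.3, assuming 20 centre-line points per candidate lane and replacing the LSTM stage with a simple mean-score attention for brevity (the claim itself encodes with a 1D-CNN followed by an LSTM):

```python
import numpy as np

rng = np.random.default_rng(3)

def conv1d(x, kernel):
    """'Valid' 1-D convolution of a lane-point sequence (T, F) -> (T - k + 1,)."""
    k = len(kernel)
    return np.array([np.sum(x[t:t + k] * kernel) for t in range(len(x) - k + 1)])

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# three candidate lanes, each a sequence of 20 centre-line points (x, y)
lanes = [rng.standard_normal((20, 2)) for _ in range(3)]
kernel = rng.standard_normal((3, 2)) * 0.1

LF_i = np.stack([conv1d(l, kernel) for l in lanes])   # per-lane feature vectors
scores = LF_i.mean(axis=1)                            # illustrative attention scores
omega = softmax(scores)                               # attention weights, sum to 1
LF = (omega[:, None] * LF_i).sum(axis=0)              # fused lane feature vector LF
```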
6. The LSTM-GNN-based multi-modal vehicle track prediction method considering road information according to claim 1, wherein the decoder in S6 is composed of a long short-term memory network LSTM and decodes the fused feature vector to generate the multi-modal predicted tracks.
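A minimal sketch of the decoder of claim 6 / S6: the fused feature concatenated with a noise sample z is unrolled for 50 steps, each step emitting an (x, y) displacement that is accumulated into a trajectory. The single-gate tanh update is a simplification of a full LSTM cell, and all dimensions are assumed.

```python
import numpy as np

rng = np.random.default_rng(4)

hidden, feat_dim, noise_dim, horizon = 32, 64, 8, 50   # assumed sizes
Wh = rng.standard_normal((hidden, hidden + feat_dim + noise_dim)) * 0.1
Wo = rng.standard_normal((2, hidden)) * 0.1

def decode(fused, z):
    """Unroll a simplified recurrent decoder over the fused feature + noise z."""
    inp = np.concatenate([fused, z])
    h = np.zeros(hidden)
    pos, pts = np.zeros(2), []
    for _ in range(horizon):
        h = np.tanh(Wh @ np.concatenate([h, inp]))     # recurrent state update
        pos = pos + Wo @ h                             # accumulate displacement
        pts.append(pos)
    return np.stack(pts)                               # (50, 2) future trajectory

fused = rng.standard_normal(feat_dim)                  # DF/IF/LF fusion feature
trajs = np.stack([decode(fused, rng.standard_normal(noise_dim))
                  for _ in range(6)])                  # 6 modes, one noise z each
```

Because every mode shares the weights and fused feature but draws its own z, the six trajectories differ — which is exactly what makes the output multi-modal.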
CN202310076393.3A 2023-01-28 2023-01-28 Vehicle track multi-mode prediction method considering road information based on LSTM-GNN Pending CN116304969A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310076393.3A CN116304969A (en) 2023-01-28 2023-01-28 Vehicle track multi-mode prediction method considering road information based on LSTM-GNN

Publications (1)

Publication Number Publication Date
CN116304969A true CN116304969A (en) 2023-06-23

Family

ID=86791418

Country Status (1)

Country Link
CN (1) CN116304969A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543603A (en) * 2023-07-07 2023-08-04 Sichuan University Flight path completion prediction method and device considering airspace situation and local optimization
CN116543603B (en) * 2023-07-07 2023-09-29 Sichuan University Flight path completion prediction method and device considering airspace situation and local optimization


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination