CN116912661A - Target track prediction method and system with domain generalization capability - Google Patents
Target track prediction method and system with domain generalization capability
- Publication number
- CN116912661A (application CN202310892764.5A)
- Authority
- CN
- China
- Prior art keywords
- track
- loss
- target
- end point
- road
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/82 — Image or video recognition using neural networks
- G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. LSTM or GRU
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/049 — Temporal neural networks
- G06N3/08 — Learning methods
- G06V10/764 — Recognition using classification
- G06V10/774 — Generating sets of training patterns
- G06V10/806 — Fusion of extracted features
- G06V20/54 — Surveillance or monitoring of traffic
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a target track prediction method and system with domain generalization capability. First, target track information and the vector map information of the surrounding scene are obtained. The map information is processed to extract map features. Adjacent target tracks are extracted, and the target's historical track data together with the adjacent target tracks are input into a temporal neural network encoder to extract features. The vehicle endpoint is then predicted based on the invariance principle to obtain a predicted track endpoint. The predicted endpoint encoding, the interaction features, the map features, and the historical track features are concatenated and input into a temporal neural network decoder to complete the remaining future track points. Multiple source domains are trained jointly, and the average of each domain's endpoint reconstruction loss, track deviation loss, and KL divergence loss is superposed with the invariance-principle loss to form the final loss. The method thereby realizes target track prediction and improves both the safety of the model and its accuracy in unknown domains.
Description
Technical Field
The invention belongs to the technical field of track prediction, and in particular relates to a target track prediction method and system with domain generalization capability.
Background
Deep learning models are an effective approach to the target track prediction problem. However, they depend heavily on the distribution of the training data and cannot adapt when the distribution gap between the source domain and the target domain is large. Because track data differ in acquisition region, acquisition method, and processing pipeline, distribution shift between data domains is unavoidable, so a model's weak generalization capability poses a serious safety hazard.
Several target track prediction methods exist in the prior art. For example, Chinese patent application CN202011493671.8 discloses a track prediction method based on a BEV bird's-eye view: track and scene information are converted into image data, and the future track of the target to be predicted is obtained by encoding and decoding the image data. Chinese patent application CN202010658245.9 discloses a graph-network-based track prediction method: global scene features are extracted from multiple tracks and local point sets of the environment, and predicted tracks with corresponding probabilities are estimated from those features. Chinese patent application CN202210657220.6 proposes a dual-head attention mechanism, constructing a graph attention network of nodes from a target relation graph and historical motion tracks. However, these deep-learning approaches to track prediction mostly ignore the generalization problem caused by data distribution differences: the training and test domains are assumed to have similar distributions, and performance in unknown domains is neglected.
In the design of scene-context extraction networks, convolutional networks have a limited receptive field when the scene is rendered as a high-definition image, while graph-neural-network approaches to map modeling focus on the topological relations among lanes and underemphasize the directionality of lanes and the temporal ordering that constrains tracks.
Disclosure of Invention
Aiming at the problems in the prior art of poor prediction performance and weak generalization of target track models in unknown domains, the invention provides a target track prediction method and system with domain generalization capability. First, target track information and the vector map information of the surrounding scene are obtained. The map information is processed to extract map features. Adjacent target tracks are extracted, and the target's historical track data together with the adjacent target tracks are input into a temporal neural network encoder to extract features. The vehicle endpoint is then predicted based on the invariance principle to obtain a predicted track endpoint. The predicted endpoint encoding, the interaction features, the map features, and the historical track features are concatenated and input into a temporal neural network decoder to complete the remaining future track points. Multiple source domains are trained jointly, and the average of each domain's endpoint reconstruction loss, track deviation loss, and KL divergence loss is superposed with the invariance-principle loss to form the final loss, realizing target track prediction and improving both model safety and accuracy in unknown domains.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a target track prediction method with domain generalization capability comprises the following steps:
s1, acquiring training data: acquiring target track information and vector map information of a scene where the target track information is located;
s2, map information processing: inputting the map information obtained in the step S1 into a map information processing module to extract road center line information, constructing road polygons, searching adjacent roads, sorting by map-related distance, and inputting the map information into a long short-term memory (LSTM) network to extract the map information features;
s3, time sequence network coding: extracting adjacent target tracks, inputting target history track data and the adjacent target tracks into a time sequence neural network encoder to extract characteristics;
s4, predicting the vehicle endpoint based on the invariance principle: concatenating the map features, the historical track features, and the track endpoint features output in step S3 and inputting them to an invariance-principle endpoint prediction module to obtain a predicted track endpoint; this step involves the invariance-principle loss, the KL divergence loss, and the endpoint reconstruction loss;
s5, completing the remaining track points: extracting interaction features from the adjacent-vehicle target track features output by the temporal neural network encoder through a social pooling module; concatenating the track endpoint encoding predicted in step S4 with the interaction features, the map features, and the historical track features and inputting them into the temporal neural network decoder to complete the remaining future track points;
s6, parameter optimization: training a plurality of source domains together to obtain the average of the endpoint reconstruction loss, the track deviation loss, and the KL divergence loss of each data domain, and superposing this average with the invariance-principle loss to form the final loss;
s7, target track prediction: finally, inputting the historical track data into the model, sampling the distribution of the endpoint prediction module to generate track endpoints, and then completing the remaining track points to finish predicting the vehicle's future track.
As an improvement of the present invention, in the step S1, the target track information includes at least a set of coordinate points of the position where the target is located, and the periphery of the target is adjacent to the set of coordinate points of the target; the vector map information at least comprises map coordinate points, lane information formed by the coordinate points and topological relations among lanes.
As an improvement of the present invention, the step S2 further includes:
s21, extracting a road center line: since the lane-line directions of the vector map may be inconsistent, it must first be judged whether the two lane lines run in the same direction, and the left lane line is reversed if necessary; the lane line with fewer nodes is expanded using first-order (linear) spline interpolation so that the left and right lane lines have the same number of nodes, the interpolation formulas being:

f[k_r, k_{r+1}] = (j_{r+1} − j_r) / (k_{r+1} − k_r)

L_{1,r}(k) = j_r + f[k_r, k_{r+1}] (k − k_r)

where (k, j) are the lane-line coordinates, r is the node index, L_{1,r}(k) is the first-order spline interpolant between nodes (k_r, j_r) and (k_{r+1}, j_{r+1}), and f[k_r, k_{r+1}] is the slope between the two points; applying it segment by segment yields a first-order spline interpolation over all points of the lane line;
s22, constructing a road polygon: the minimum polygon covering the road is obtained from the left and right lane lines, whose coordinates (k, j) form the vertices of the road polygon;
s23, searching adjacent roads: determining the coordinate range of the road search according to the endpoint of the current vehicle's historical track and a preset search-range hyperparameter; traversing the road-range polygons and, for each road center line whose polygon overlaps the search range, normalizing its coordinates with the historical track endpoint as the origin;
s24, sequencing the center lines: sorting the normalized road center-line set of step S23 by center-line distance;
s25, inputting the processed center-line set into an encoder composed of a long short-term memory (LSTM) network to extract the feature V_m.
As another improvement of the present invention, the center-line distance d in step S24 is the Euclidean distance to the origin; specifically, with (a, b) the road center-line coordinates, d is computed from the Euclidean distance ‖(a, b)‖₂ = √(a² + b²) of the center line's start and end points from the origin (the historical track endpoint after normalization).
As a further improvement of the present invention, in the step S4 the invariance-principle loss L_irm takes the invariant-risk-minimization penalty form

L_irm = (1/n) Σ_e ‖∇_{w|w=1} R_e(w · Φ)‖²,

where w is a (dummy) linear classifier, Φ is the endpoint prediction module, R_e is the deviation between the predicted and true endpoints in environment e, and n is the number of source domains.

The KL divergence loss L_kl is:

L_kl = −(1/2) Σ (1 + log σ² − μ² − σ²),

where Z is the hidden variable of the conditional variational auto-encoder; the conditional distribution of Z is assumed normal with mean μ and variance σ², and the loss pushes it toward N(0, 1).

The endpoint reconstruction loss L_rec is defined as the l2 norm between the endpoint truth and prediction:

L_rec = (1/m) Σ_{i=1}^{m} ‖ŷ^e_{i,α} − y^e_{i,α}‖₂,

where m is the number of endpoints, α is the predicted future step, ŷ^e_{i,α} is the predicted endpoint of the i-th node at time frame α in environment e, and y^e_{i,α} is the corresponding ground-truth endpoint.
As a further improvement of the present invention, the track deviation loss of the remaining track points in the step S5 is the average l2 norm between the predicted and ground-truth future track points:

L_traj = (1/(m·T)) Σ_{i=1}^{m} Σ_{α=1}^{T} ‖ŷ_{i,α} − y_{i,α}‖₂,

where T is the number of predicted future time frames.
As a further improvement of the present invention, the final loss in step S6 is:

L = (1/n) Σ_{e=1}^{n} (γ L^e_rec + δ L^e_traj + η L^e_kl) + λ L_irm,

where L_irm is the invariant risk minimization loss term, n is the number of data domains, the loss coefficients are γ = δ = η = 1, and λ is an adjustable parameter.
In order to achieve the above purpose, the invention also adopts the following technical scheme: a target track prediction system with domain generalization capability, comprising a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Compared with the prior art, the invention has the beneficial effects that:
1. the endpoint prediction module based on the invariance principle combines the invariance principle with a conditional variational auto-encoder, so it can cope with large distribution gaps between the source domains and an unknown domain, strengthening the generalization capability of the model, adapting it to more complex prediction scenes, and improving the safety of the target track prediction model;
2. the map information processing module captures the directional characteristics of lanes, provides prior knowledge for track prediction, constrains predicted future tracks to avoid scene boundaries, and improves the accuracy of the model in an unknown domain.
Drawings
FIG. 1 is a flow chart of a method for predicting a target track with domain generalization capability according to the present invention;
FIG. 2 is a flowchart of a map information processing module in step S2 of the method of the present invention;
FIG. 3 is a flowchart of the social pooling module in step S5 of the method of the present invention.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention.
Example 1
The method can be used in the fields of automatic driving, auxiliary driving and the like. The present embodiment describes a case where the present method is applied to the field of automatic driving.
The present embodiment will be specifically described with reference to fig. 1 to 3, which is a target trajectory prediction method with domain generalization capability, and as shown in fig. 1, the method includes the following steps:
step S1: acquiring training data
The target track and a high-definition vector map of the surrounding scene are obtained. The acquired data comprise: the set of coordinate points of the target's positions, the coordinate point sets of the adjacent targets around it, and the vector map of the scene, which contains map coordinate points, lane information composed of those points, and the topological relations among lanes.
In the training preparation phase, sufficient training data must be acquired: at least three different data domains, each containing vehicle history tracks, vehicle future tracks, and a high-definition vector map of that domain.
Step S2: map information processing
As shown in fig. 2, the high-definition vector map is input into the map information processing module to extract road center line information, construct road polygons, search adjacent roads, and sort by map-related distance, and the map information is then input into a long short-term memory network to extract the map information feature V_m.
S21: extracting the center line of the road
The invention uses the road center line as the map input, which reduces model parameters; the center line is computed from the endpoints of the road's left and right lane lines. Since the lane-line directions of the vector map may be inconsistent, it must first be judged whether the two lane lines run in the same direction, and the left lane line is reversed if necessary. Because a mismatch in the number of nodes on the two sides of the lane would introduce error into the center-line calculation, first-order spline interpolation is used to expand the coordinates of the lane line with fewer nodes. With lane-line coordinates (k, j):

f[k_r, k_{r+1}] = (j_{r+1} − j_r) / (k_{r+1} − k_r)

L_{1,r}(k) = j_r + f[k_r, k_{r+1}] (k − k_r)
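As a concrete illustration, this first-order spline resampling can be sketched in plain Python. The function name is ours, and the sketch assumes the k coordinate increases monotonically along the lane line and that at least two nodes are given:

```python
def interpolate_polyline(points, n_nodes):
    """Resample a lane-line polyline to n_nodes points with piecewise-linear
    (first-order spline) interpolation. `points` is a list of (k, j) tuples
    whose k values increase monotonically; n_nodes >= 2."""
    ks = [p[0] for p in points]
    js = [p[1] for p in points]
    out = []
    for i in range(n_nodes):
        # evenly spaced query positions between the first and last k
        k = ks[0] + (ks[-1] - ks[0]) * i / (n_nodes - 1)
        # locate the segment [k_r, k_{r+1}] containing k
        r = 0
        while r < len(ks) - 2 and k > ks[r + 1]:
            r += 1
        slope = (js[r + 1] - js[r]) / (ks[r + 1] - ks[r])  # f[k_r, k_{r+1}]
        out.append((k, js[r] + slope * (k - ks[r])))       # L_{1,r}(k)
    return out
```

Densifying both lane lines to the same node count in this way allows the center line to be computed point-by-point from matching left/right node pairs.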
s22: constructing road polygons
A minimum polygon covering the road is obtained from the left and right lane lines, whose coordinates (k, j) form the vertices of the road polygon.
s23: searching for adjacent roads
The coordinate range of the road search is determined from the endpoint of the current vehicle's historical track and the preset search-range hyperparameter. The road-range polygons are traversed, and coordinate normalization is performed on each road center line whose polygon overlaps the search range, taking the historical track endpoint as the origin.
S24: ordering centerlines
The normalized road center-line set of step S23 is sorted by center-line distance, where (a, b) are road center-line coordinates. The center-line distance d defined by the invention is the Euclidean distance of the center line's start and end points from the historical track endpoint, i.e. the origin:

d = √(a_s² + b_s²) + √(a_e² + b_e²),

where (a_s, b_s) and (a_e, b_e) are the start and end points of the center line.
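A minimal sketch of this sorting step, assuming the distance sums the start- and end-point distances to the origin (one plausible reading of the definition above; the function name is ours):

```python
import math

def sort_centerlines(centerlines):
    """Sort normalized road center lines by their distance to the origin
    (the history-track endpoint after normalization). Each center line is
    a list of (a, b) points; the sort key sums the start- and end-point
    Euclidean distances."""
    def d(line):
        (a_s, b_s), (a_e, b_e) = line[0], line[-1]
        return math.hypot(a_s, b_s) + math.hypot(a_e, b_e)
    return sorted(centerlines, key=d)
```

Sorting puts the center lines nearest the vehicle's current position first, giving the LSTM encoder a consistent input ordering.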
s25: inputting the processed center line set into an encoder composed of a long-short-term memory network LSTM to extract the characteristic V m 。
Step S3: and (3) encoding the time sequence network, extracting adjacent target tracks, and inputting target history track data and the adjacent target tracks into a time sequence neural network encoder to extract features.
S31: according to the historical track of the vehicle to be predicted, according to a 13X 13 space grid structure, the vehicle processing the current space grid position in the same time frame is searched out, and each unit of the space grid is 5 meters long and 2 meters wide as the adjacent vehicle of the vehicle to be predicted.
S32: the vehicle history track data extracted from the data set and the adjacent vehicle tracks are sequentially input to the LSTM encoder to extract features.
Step S4: and (3) carrying out vehicle end point prediction based on an invariance principle, inputting map features, historical track features and track end point features output by the time sequence neural network encoder into an end point prediction module based on the invariance principle in a cascading manner, obtaining a predicted track end point, and improving generalization of end point prediction by using the invariance principle.
S41: the constant risk minimization principle is used for acting on the conditional variation self-encoder structure, the constant risk minimization loss promotes the distribution learned by the module to enhance the correlation with the invariance characteristic, and the generalization capability of the module is improved. Loss of invariance principleThe formula is as follows, wherein w is a linear classifier, phi is the endpoint prediction module, R e The deviation of the end point predicted value and the true value in the environment e is that n is the number of source domains:
s42: the penalty of the end-point prediction module based on invariance principles also includes constraint variations from the hidden variable Z distribution in the encoder. Assuming that the conditional distribution of the fitting Z of the invention meets the normal distribution, the mean value is mu, and the variance is sigma 2 KL divergence lossLet the distribution of hidden variable Z be closer to N (0, 1), the formula is as follows:
s43: the loss of the endpoint prediction module based on the invariance principle further comprises an endpoint reconstruction loss, wherein the endpoint reconstruction loss is defined as l2 norms of an endpoint true value and a predicted value, the formula is as follows, m represents the endpoint number, and alpha represents the predicted future step size:
step S5: and (3) complementing the remaining track points, and extracting interaction characteristics from the adjacent vehicle target track characteristics output by the time sequence neural network encoder through a social pooling module. And (3) inputting the predicted track end point codes, the interactive features, the map features and the historical track features into a time sequence neural network decoder in cascade to complement the rest future track points.
S51: the output of the LSTM of the adjacent vehicles is processed by a social pooling module to extract the interactive characteristic information to obtain the characteristic V n . As shown in FIG. 3, the social pooling tensor is organized by 13X 13 spatial grid structure, the history of neighboring vehiclesTrack coding fills in to corresponding positions of the spatial grid. The social pooling tensor passes through a layer of 3×3 convolution kernel and a layer of 3×1 convolution kernel, and a layer of pooling layer finally outputs the adjacent vehicle interaction characteristics V n . The social pooling module facilitates capturing interaction characteristics between neighboring vehicles and a predicted vehicle.
S52: s43, the predicted vehicle track end point value passes through the full connection layerGenerating a prediction end point code, and combining the prediction end point code with the adjacent vehicle interaction characteristic V n Map information feature V m Historical track feature V t The cascade inputs into the LSTM decoder, ultimately outputting the remaining future track points. The track deviation loss formula of the track residual point is as follows
Step S6: to play the role of the constant risk minimization, a plurality of source domains are trained together to calculate a constant risk minimization loss item L irm . The overall loss function of the model is as follows, n represents the number of data fields, the coefficient gamma=delta=eta=1 of the loss function, lambda is an adjustable parameter, and the action strength of the risk-free minimum loss term is determined:
step S7: and finally, inputting the historical track data into a model, sampling the distribution of the end point prediction module to generate track points, and then complementing the rest track points to complete the future track prediction of the vehicle.
To verify the correctness and rationality of this embodiment, two road conditions from the public INTERACTION dataset, roundabout and intersection, are used for testing. Each map scene is defined as a data domain; one map scene is selected as the test set and the rest serve as training sets, the test domain being chosen according to the amount of track data. The test-set data are invisible to the model during training. This embodiment divides each track into 5 s segments, comprising 2 s of history and 3 s of future. The original sampling frequency of the vehicle tracks is 10 Hz; since the tracks are smooth, they are downsampled by a factor of 2 to reduce model parameters, giving a history-track step length of 10 and a future-track step length of 15.
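The step lengths quoted above follow directly from the segment lengths and the downsampling factor; a small sketch (function and argument names are ours):

```python
def split_track(track, hist_s=2, fut_s=3, hz=10, ds=2):
    """Split a 5 s track sampled at 10 Hz into history and future parts
    and downsample both by a factor of 2."""
    assert len(track) == (hist_s + fut_s) * hz  # 50 raw frames
    n_hist = hist_s * hz      # 20 raw history frames
    hist = track[:n_hist:ds]  # every 2nd frame -> 10 history points
    fut = track[n_hist::ds]   # remaining 30 frames -> 15 future points
    return hist, fut
```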
The test evaluation metric measures the generalization capability of the vehicle track model using the minimum average displacement error (mADE). mADE is defined as the average L2 norm between the predicted track points and the true track points, taken for the candidate among the k generated end points whose end point is closest to the ground truth; the formula is as follows:
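Since the formula image is not reproduced above, the following sketch implements mADE as the textual definition describes it: among k candidate tracks, select the one whose generated end point is closest to the true end point, then average the L2 displacement over its track points. The array shapes are assumptions for illustration:

```python
import numpy as np

def min_ade(pred, gt):
    """Minimum average displacement error.

    pred: (k, T, 2) -- k candidate future tracks, one per sampled end point.
    gt:   (T, 2)    -- ground-truth future track.
    Selects the candidate whose end point is closest to the true end point,
    then averages the L2 norm over that candidate's T track points."""
    end_err = np.linalg.norm(pred[:, -1, :] - gt[-1, :], axis=-1)
    best = int(np.argmin(end_err))
    return float(np.mean(np.linalg.norm(pred[best] - gt, axis=-1)))
```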
the test results are shown in tables 1 and 2:
Table 1. Roundabout scene test results

| Time | ERM | MMD | DANN | IRM | Rex | Proposed |
|---|---|---|---|---|---|---|
| 1 s | 0.32 | 0.15 | 0.40 | 0.22 | 0.18 | 0.18 |
| 2 s | 0.73 | 0.55 | 0.84 | 0.58 | 0.67 | 0.41 |
| 3 s | 1.51 | 1.31 | 1.71 | 1.30 | 1.46 | 0.72 |
Table 2. Intersection scene test results

| Time | ERM | MMD | DANN | IRM | Rex | Proposed |
|---|---|---|---|---|---|---|
| 1 s | 0.19 | 0.13 | 0.24 | 0.24 | 0.12 | 0.12 |
| 2 s | 0.39 | 0.32 | 0.56 | 0.45 | 0.32 | 0.21 |
| 3 s | 0.78 | 0.70 | 1.16 | 0.85 | 0.73 | 0.36 |
The comparison methods are introduced in Table 3:
Table 3. Introduction of the comparison methods
According to the experimental results, the proposed method improves generalization capability and provides a new approach for improving model generalization and for addressing the generalization problem of vehicle track prediction in deep learning.
It should be noted that the foregoing merely illustrates the technical idea of the present invention and is not intended to limit the scope of the present invention, and that a person skilled in the art may make several improvements and modifications without departing from the principles of the present invention, which fall within the scope of the claims of the present invention.
Claims (8)
1. A target track prediction method with domain generalization capability, characterized by comprising the following steps:
S1, acquiring training data: acquiring target track information and vector map information of the scene where the target is located;
S2, map information processing: inputting the map information obtained in step S1 into a map information processing module to extract road center line information, construct road polygons, search for adjacent roads, sort them according to the map-related distance, and input them into a long short-term memory network to extract map information features;
S3, time-series network encoding: extracting adjacent target tracks, and inputting the target history track data and the adjacent target tracks into a time-series neural network encoder to extract features;
S4, predicting the vehicle end point based on the invariance principle: cascading the map features, the historical track features and the track end point features output in step S3 into an end point prediction module based on the invariance principle to obtain a predicted track end point; this step involves the invariance-principle loss, the KL divergence loss and the end point reconstruction loss;
S5, filling in the remaining track points: extracting interaction features from the adjacent-vehicle target track features output by the time-series neural network encoder through a social pooling module; cascading the track end point code predicted in step S4, the interaction features, the map features and the historical track features into a time-series neural network decoder to complete the remaining future track points;
S6, parameter optimization: training a plurality of source domains together to obtain the end point reconstruction loss, the track deviation loss and the KL divergence loss of each data domain, and superimposing them on the invariance-principle loss to form the final loss;
S7, target track prediction: finally, inputting the historical track data into the model, sampling from the distribution of the end point prediction module to generate track end points, and then completing the remaining track points to finish the future track prediction of the vehicle.
2. The method for predicting a target trajectory with domain generalization capability as claimed in claim 1, wherein: in the step S1, the target track information at least includes the set of coordinate points of the target's positions and the coordinate point sets of the targets adjacent to it; the vector map information at least includes map coordinate points, lane information formed by the coordinate points, and the topological relations among lanes.
3. The method for predicting a target trajectory with domain generalization capability as claimed in claim 2, wherein the step S2 further includes:
S21, extracting the road center line: since the directions of the vector-map lane lines may be inconsistent, it is first judged whether the two lane line directions agree, and the left lane line is reversed if necessary; the lane line with fewer nodes is expanded using a first-order (linear) spline interpolation formula so that the left and right lane lines have the same number of nodes; the interpolation formula is:
f[k_r, k_{r+1}] = (j_{r+1} − j_r) / (k_{r+1} − k_r)
L_{1,r}(k) = j_r + f[k_r, k_{r+1}](k − k_r)
where (k, j) are the lane line coordinates, r is the node index of the lane line, L_{1,r}(k) is the linear spline interpolant between nodes (k_r, j_r) and (k_{r+1}, j_{r+1}), f[k_r, k_{r+1}] is the slope between the two points, and L_1(k) denotes the piecewise linear spline interpolant over all points of the lane line;
S22, constructing a road polygon: the minimal polygon covering the road is obtained from the left and right lane lines; the construction formula is as follows, where (k, j) are the coordinates of the left and right lane lines and the resulting point set is the road polygon:
S23, searching adjacent roads: determining the coordinate range of the road search according to the end point of the current vehicle's history track and a preset search-range hyperparameter; traversing the road polygons within the search range, and normalizing the coordinates of the road center lines whose range overlaps, taking the history-track end point as the reference;
S24, sorting the center lines: sorting the normalized road center line set from step S23 according to the center line distance;
S25, inputting the processed center line set into an encoder composed of a long short-term memory (LSTM) network to extract the map information feature V_m.
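Steps S21 and S22 above can be sketched as follows; the function names and the use of np.interp (which assumes the k-coordinates are increasing) are illustrative choices, not the patent's actual implementation:

```python
import numpy as np

def expand_lane_line(k, j, n_nodes):
    """First-order (linear) spline interpolation of a lane line (S21):
    resample the polyline (k, j) onto n_nodes evenly spaced k-values so
    that the left and right lane lines end up with the same node count.
    Assumes k is increasing (reverse the line first if necessary)."""
    k_new = np.linspace(k[0], k[-1], n_nodes)
    j_new = np.interp(k_new, k, j)   # piecewise linear, matches L_{1,r}(k)
    return np.stack([k_new, j_new], axis=1)

def road_polygon(left, right):
    """Minimal polygon covering the road (S22): walk along the left lane
    line, then back along the reversed right lane line."""
    return np.concatenate([left, right[::-1]], axis=0)
```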
4. The method for predicting a target trajectory with domain generalization capability as claimed in claim 3, wherein: the center line distance d in the step S24 is the Euclidean distance to the origin, specifically:
d = √(a² + b²)
where (a, b) are the road center line coordinates.
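A minimal sketch of the normalization and sorting in steps S23 and S24 follows; taking the nearest point of each normalized center line as its distance d is an assumption made here, since the claim only states that d is the Euclidean distance of the center line coordinates (a, b) to the origin:

```python
import numpy as np

def sort_centerlines(centerlines, ref_point):
    """Normalize each road center line to the history-track end point (S23)
    and sort the lines by the Euclidean distance d of their nearest
    normalized point to the origin (S24)."""
    normalized = [cl - ref_point for cl in centerlines]

    def dist(cl):
        # d = sqrt(a^2 + b^2), minimized over the line's points (assumption).
        return float(np.min(np.linalg.norm(cl, axis=1)))

    return sorted(normalized, key=dist)
```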
5. The method for predicting a target trajectory with domain generalization capability as claimed in claim 1, wherein: in the step S4, the invariance-principle loss L_irm is given by the following formula, where w is a linear classifier, Φ is the end point prediction module, R_e is the deviation between the end point predicted value and the true value in environment e, and n is the number of source domains:
L_irm = Σ_{e=1}^{n} ‖∇_{w|w=1.0} R_e(w·Φ)‖²
the KL divergence loss L_KL is:
L_KL = KL(N(μ, σ²) ‖ N(0, 1)) = (1/2)(μ² + σ² − ln σ² − 1)
where Z is the latent variable of the conditional variational autoencoder, and the conditional distribution of Z follows a normal distribution with mean μ and variance σ²;
the end point reconstruction loss L_rec is defined as the L2 norm between the end point true value and the predicted value, given by:
L_rec = (1/m) Σ_{i=1}^{m} ‖Ŷ^e_{i,α} − Y^e_{i,α}‖₂
where m represents the number of end points, α represents the predicted future step size, Ŷ^e_{i,α} is the end point predicted value of the i-th node at the α-th time frame in environment e, and Y^e_{i,α} is the corresponding end point true value.
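The KL divergence term above has the standard closed form for a Gaussian posterior against a standard normal prior; the sketch below assumes that form, parameterizing the variance by its logarithm (a common but here hypothetical choice):

```python
import numpy as np

def kl_loss(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent
    dimensions: 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)."""
    return float(0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0))
```

The loss is zero exactly when the posterior equals the standard normal prior (μ = 0, σ² = 1), which is what the constraint on Z in the claim enforces.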
6. The method for predicting a target trajectory with domain generalization capability as claimed in claim 5, wherein: the track deviation loss formula for the remaining track points in the step S5 is as follows
7. The method for predicting a target trajectory with domain generalization capability as claimed in claim 6, wherein: the final loss in the step S6 is as follows:
where L_irm is the invariant risk minimization loss term, n represents the number of data domains, γ = δ = η = 1 are the coefficients of the corresponding loss terms, and λ is a tunable parameter.
8. A target trajectory prediction system with domain generalization capability, comprising a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method according to any one of the preceding claims.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310892764.5A CN116912661A (en) | 2023-07-20 | 2023-07-20 | Target track prediction method and system with domain generalization capability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912661A true CN116912661A (en) | 2023-10-20 |
Family
ID=88350717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310892764.5A Pending CN116912661A (en) | 2023-07-20 | 2023-07-20 | Target track prediction method and system with domain generalization capability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912661A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117542004A (en) * | 2024-01-10 | 2024-02-09 | 杰创智能科技股份有限公司 | Offshore man-ship fitting method, device, equipment and storage medium |
CN117542004B (en) * | 2024-01-10 | 2024-04-30 | 杰创智能科技股份有限公司 | Offshore man-ship fitting method, device, equipment and storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |