CN115042798A

CN115042798A - Traffic participant future trajectory prediction method and system, and storage medium

Info

Publication number: CN115042798A
Application number: CN202110250462.9A
Authority: CN
Inventors: 周卫林; 王玉龙; 闫春香
Original assignee: Guangzhou Automobile Group Co Ltd
Current assignee: Guangzhou Automobile Group Co Ltd
Priority date: 2021-03-08
Filing date: 2021-03-08
Publication date: 2022-09-13

Abstract

The invention relates to a method and a system for predicting the future trajectories of traffic participants, and a storage medium. The method comprises: inputting current-time position coordinates into a pre-trained GAT model, which processes them and outputs the feature vectors of a plurality of nodes; obtaining current-time track point feature vectors of a plurality of traffic participants from their current-time position coordinates and the feature vectors of their corresponding nodes; obtaining historical track point feature vectors of the plurality of traffic participants, and obtaining, from the current-time and historical track point feature vectors, track sequence data representing the trajectories of the plurality of traffic participants up to the current time, the track sequence data comprising a plurality of track point feature vectors; and inputting the track sequence data into a pre-trained Transformer model, which predicts and outputs the position coordinates of the track points of the plurality of traffic participants in a future preset time period.

Description

Traffic participant future trajectory prediction method and system, and storage medium

Technical Field

The invention relates to the technical field of intelligent traffic, in particular to a method and a system for predicting future trajectories of traffic participants and a computer-readable storage medium.

Background

Future trajectory prediction of traffic participants means predicting the future travel trajectory of a traffic participant, such as a pedestrian or a vehicle, from its current or historical trajectory and from environmental information. An autonomous vehicle can then make decisions in advance based on the prediction and adjust its trajectory appropriately while following its planned path, so as to avoid colliding with traffic participants and reach its destination safely.

Existing approaches to future trajectory prediction of traffic participants do not consider the interaction between the motion states of traffic participants in real scenes, i.e., social interaction. They treat each traffic participant as an independent individual and predict its future trajectory solely from the position coordinates of its own historical track points.

Disclosure of Invention

The invention aims to provide a method and a system for predicting the future trajectories of traffic participants, and a computer-readable storage medium, which at least take into account the interaction between the motion states of traffic participants in real scenes when predicting their future trajectories, thereby improving the reliability of trajectory prediction.

In order to achieve the above object, a first aspect of the present invention provides a method for predicting a future trajectory of a traffic participant, including:

acquiring current time position coordinates of a plurality of traffic participants;

inputting the current-time position coordinates into a pre-trained GAT model, which processes them and outputs the feature vectors of a plurality of nodes, wherein each node in the GAT model represents a traffic participant, and the feature vector of a node represents social interaction information between the traffic participant corresponding to that node and the traffic participants corresponding to the other nodes;

obtaining track point feature vectors of the plurality of traffic participants at the current moment according to the position coordinates of the plurality of traffic participants at the current moment and the feature vectors of the nodes corresponding to the plurality of traffic participants;

obtaining historical track point feature vectors of the plurality of traffic participants within a past preset time period, and obtaining, from the current track point feature vectors and the historical track point feature vectors, track sequence data representing the trajectories of the plurality of traffic participants up to the current time, the track sequence data comprising a plurality of track point feature vectors;

and inputting the track sequence data into a pre-trained Transformer model for prediction and outputting the position coordinates of the track points of the plurality of traffic participants in a future preset time period.

Optionally, acquiring the current-time position coordinates of the plurality of traffic participants comprises:

acquiring a current-time image of the environment surrounding the vehicle; performing image recognition on the image to obtain the position coordinates, in the image coordinate system, of a plurality of traffic participants in the vehicle's surroundings; and obtaining the position coordinates of the plurality of traffic participants in the world coordinate system from their position coordinates in the image coordinate system.

Optionally, inputting the current-time position coordinates into a pre-trained GAT model, which processes them and outputs the feature vectors of a plurality of nodes, comprises:

describing the spatial relationship information between nodes with a doubly stochastic adjacency matrix, the spatial relationship information comprising the edge feature of the edge connecting any two nodes;

obtaining the initial feature vector of each node from the spatial relationship information and the position coordinates of the traffic participant corresponding to that node;

and calculating attention coefficients from the initial feature vectors of the nodes using an attention mechanism, and obtaining the feature vectors of all nodes from the edge features between the nodes and the attention coefficients.

Optionally, the doubly stochastic adjacency matrix is expressed as $A^t = [A_{ij}^t] \in \mathbb{R}^{N\times N}$,

where $A_{ij}^t$ represents the edge feature of the edge connecting node i and node j at time t,

and N is the total number of nodes.

Optionally, calculating attention coefficients from the initial feature vectors of the nodes using an attention mechanism, and obtaining the feature vectors of all nodes from the edge features between the nodes and the attention coefficients, comprises:

calculating a plurality of groups of attention coefficients using a multi-head attention mechanism, obtaining a plurality of groups of node feature vectors from the edge features between the nodes and the groups of attention coefficients, and obtaining the feature vectors of all nodes from the groups of node feature vectors.

Optionally, the Transformer model comprises an encoder and a decoder;

the encoder is configured to apply position encoding to the track sequence data to obtain position-encoded track sequence data, and to input the position-encoded track sequence data into a plurality of sequentially stacked Transformer blocks for information encoding, which output the encoded data to be processed; the position-encoded track sequence data comprise a plurality of position-encoded track point feature vectors, each of which is the sum of a position code and the track point feature vector at the corresponding position in the track sequence data; the even-indexed dimensions of the position code are sine-encoded and the odd-indexed dimensions are cosine-encoded;

the decoder is used for decoding the coded data to be processed and predicting and outputting the position coordinates of the track points of the plurality of traffic participants in a future preset time period.

The second aspect of the present invention provides a system for predicting a future trajectory of a traffic participant, including:

the coordinate acquisition unit is used for acquiring the current time position coordinates of a plurality of traffic participants;

the social interaction unit is configured to input the current-time position coordinates into a pre-trained GAT model, which processes them and outputs the feature vectors of a plurality of nodes; each node in the GAT model represents a traffic participant, and the feature vector of a node represents social interaction information between the traffic participant corresponding to that node and the traffic participants corresponding to the other nodes;

the first track point feature processing unit is used for obtaining track point feature vectors of the plurality of traffic participants at the current moment according to the current moment position coordinates of the plurality of traffic participants and the feature vectors of the nodes corresponding to the plurality of traffic participants;

the second track point feature processing unit is configured to acquire the track point feature vectors of the plurality of traffic participants at historical times within a past preset time period, and to obtain, from the current track point feature vectors and the historical track point feature vectors, track sequence data representing the trajectories of the plurality of traffic participants up to the current time, the track sequence data comprising a plurality of track point feature vectors; and

the future track prediction unit is configured to input the track sequence data into a pre-trained Transformer model, which predicts and outputs the position coordinates of the track points of the plurality of traffic participants in a future preset time period.

Optionally, the social interaction unit comprises:

the spatial relationship determining unit is configured to describe the spatial relationship information between nodes with a doubly stochastic adjacency matrix, the spatial relationship information comprising the edge feature of the edge connecting any two nodes;

the first node feature processing unit is configured to obtain the initial feature vector of each node from the spatial relationship information and the position coordinates of the traffic participant corresponding to that node; and

the second node feature processing unit is configured to calculate attention coefficients from the initial feature vectors of the nodes using an attention mechanism, and to obtain the feature vectors of all nodes from the edge features between the nodes and the attention coefficients.

Optionally, the Transformer model comprises an encoder and a decoder;

the encoder is configured to apply position encoding to the track sequence data to obtain position-encoded track sequence data, to input the position-encoded track sequence data into a plurality of sequentially stacked Transformer blocks for information encoding, and to output the encoded data to be processed; the position-encoded track sequence data comprise a plurality of position-encoded track point feature vectors, each of which is the sum of a position code and the track point feature vector at the corresponding position in the track sequence data; the even-indexed dimensions of the position code are sine-encoded and the odd-indexed dimensions are cosine-encoded.

A third aspect of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for predicting the future trajectories of traffic participants according to the first aspect.

The embodiments of the invention provide a method and a system for predicting the future trajectories of traffic participants, and a computer-readable storage medium. At each sampling time, a pre-trained GAT model processes the current-time position coordinates of the traffic participants and outputs the feature vectors of a plurality of nodes, where the feature vector of a node represents the social interaction information between the traffic participant corresponding to that node and the traffic participants corresponding to the other nodes. In this way, the social interaction information of the traffic participants at a plurality of historical sampling times can be obtained and combined with their position coordinates at those times to generate, as the historical trajectory, a sequence of track point features containing both track point positions and social interaction information; a pre-trained Transformer model then predicts the future trajectories of the traffic participants from this historical trajectory. Compared with the prior art, the method uses the GAT neural network to model the social relationships among the traffic participants in the scene and captures the influence of surrounding individuals on each traffic participant, which improves the plausibility and accuracy of trajectory prediction. In addition, the method borrows the Transformer model from natural language processing for feature extraction: its self-attention mechanism directly attends to changes in motion pattern between earlier and later track points and captures long-range dependencies in the sequence, and because the model processes the whole time series in a single pass, training can be parallelized, which greatly reduces training time.

Additional features and advantages of the invention will be set forth in the description which follows.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a method for predicting a future trajectory of a traffic participant according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a single Transformer block in an embodiment of the present invention.

Fig. 3 is a technical schematic diagram of a method for predicting a future trajectory of a traffic participant in an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a system for predicting a future trajectory of a traffic participant according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known means have not been described in detail so as not to obscure the present invention.

The embodiment of the invention provides a method for predicting the future trajectories of traffic participants. The trajectory prediction problem can be stated as follows: given the historical observed trajectories of N traffic participants in a traffic scene, estimate the future motion states of these N traffic participants. Let the past states of all traffic participants, i.e., the historical observed trajectories up to the current time, be

$X = \{X_1, X_2, \ldots, X_N\}$, with $X_i = \{(x_i^t, y_i^t) \mid t = 1, \ldots, T_{obs}\}$,

where $(x_i^t, y_i^t)$ is the coordinate of traffic participant i at time t and $T_{obs}$ is the recorded duration of the historical observed trajectory. Let the future motion states, i.e., the true future trajectories, be

$Y = \{Y_1, Y_2, \ldots, Y_N\}$, with $Y_i = \{(x_i^t, y_i^t) \mid t = T_{obs}+1, \ldots, T_{obs}+T_{pred}\}$,

where $T_{pred}$ is the time span of the future trajectory. Building the trajectory prediction model amounts to finding a conditional mapping $f: X \mapsto \hat{Y}$ whose output future predicted trajectories $\hat{Y}$ coincide with the true future trajectories $Y$ as closely as possible.

Referring to Fig. 1, the method of the present embodiment includes the following steps S1 to S5:

step S1, obtaining the current time position coordinates of a plurality of traffic participants;

in one example, an image of the vehicle's surroundings at the current time may be captured by an on-board camera of the intelligent driving system, and image recognition is then performed on the image to obtain the position coordinates, in the image coordinate system, of a plurality of traffic participants around the vehicle. For example, a pre-trained convolutional neural network may process the image to recognize the traffic participants, such as vehicles and pedestrians, and obtain their position coordinates in the image coordinate system. Finally, the position coordinates of the plurality of traffic participants in the world coordinate system, i.e., their spatial position coordinates, are obtained from their coordinates in the image coordinate system; the transformation between the image coordinate system and the world coordinate system is well known to those skilled in the art and is not described here;
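The patent leaves the image-to-world conversion to common practice. As one hedged illustration, assuming all participants stand on a flat ground plane and that a calibrated 3×3 image-to-ground homography `H` is available (a hypothetical input that the patent does not specify), the pixel coordinates of each detected participant, for example the bottom-center of its bounding box, can be mapped to ground-plane world coordinates as follows:

```python
import numpy as np

def image_to_world(pixels: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel coordinates to (N, 2) ground-plane world coordinates.

    Assumes a flat ground plane and a known 3x3 homography H
    (hypothetical calibration input) from the image plane to the ground.
    """
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones]) @ H.T   # (N, 3) homogeneous ground points
    return homog[:, :2] / homog[:, 2:3]       # perspective divide

# Toy homography: 0.05 m per pixel plus an origin shift (illustrative only).
H = np.array([[0.05, 0.0, -16.0],
              [0.0, 0.05, -12.0],
              [0.0, 0.0, 1.0]])
print(image_to_world(np.array([[320.0, 240.0], [400.0, 300.0]]), H))
```

A full camera model with lens distortion and non-planar ground would replace this single homography in practice.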

step S2, inputting the current-time position coordinates into a pre-trained GAT model, which processes them and outputs the feature vectors of a plurality of nodes; each node in the GAT model represents a traffic participant, and the feature vector of a node represents social interaction information between the traffic participant corresponding to that node and the traffic participants corresponding to the other nodes;

specifically, considering that the degree of association between the motion behaviors of different traffic participants is mainly determined by their relative distance in space, this embodiment introduces a graph neural network (GNN) and an associated attention mechanism into the social interaction module to learn the interactions between the motion states of the participants. At each observation sampling time, a corresponding feature vector is generated, as social interaction information, for the absolute track point of every traffic participant within the range detectable by the vehicle;

GAT stands for Graph Attention Network;

step S3, obtaining track point feature vectors of the plurality of traffic participants at the current moment according to the position coordinates of the plurality of traffic participants at the current moment and the feature vectors of the nodes corresponding to the plurality of traffic participants;

specifically, in step S3, at each sampling time, the method of this embodiment combines the current position coordinates of a traffic participant's track point with the social interaction information between that participant and the other traffic participants, obtaining a track point feature vector that contains both the participant's own spatial state and the spatial influence of the other traffic participants; this vector can be understood as a semantic representation feature;

step S4, obtaining the track point feature vectors of the plurality of traffic participants at historical times within a past preset time period, and obtaining, from the current-time and historical track point feature vectors, track sequence data representing the trajectories of the plurality of traffic participants up to the current time; the track sequence data comprises a plurality of track point feature vectors;

specifically, the track point feature vectors at historical times may be obtained by performing the process of step S3 at those times and stored in a storage unit of the vehicle. From the current-time and historical track point feature vectors, track sequence data covering a preset recording duration up to the current time can be obtained for the plurality of traffic participants, with the track point feature vectors in the track sequence data ordered by the timestamps of their sampling times;
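A minimal sketch of how the stored per-timestamp feature vectors might be kept and assembled into time-ordered track sequence data. It assumes a fixed observation length `t_obs`, the same N participants in the same order at every timestamp, and one feature array of shape (N, F) per timestamp; the class name `TrackBuffer` and its methods are hypothetical.

```python
from collections import deque

import numpy as np

class TrackBuffer:
    """Hypothetical store of per-timestamp track point feature vectors."""

    def __init__(self, t_obs: int):
        # Keep only the most recent t_obs frames; older ones drop out automatically.
        self.frames = deque(maxlen=t_obs)

    def push(self, timestamp: float, features: np.ndarray) -> None:
        # features: (N, F) track point feature vectors of N participants.
        self.frames.append((timestamp, features))

    def sequence(self) -> np.ndarray:
        # Order by sampling timestamp and stack into (N, T, F) track sequence data.
        ordered = sorted(self.frames, key=lambda item: item[0])
        return np.stack([f for _, f in ordered], axis=1)

buf = TrackBuffer(t_obs=3)
for ts in [0.0, 0.1, 0.2, 0.3]:
    buf.push(ts, np.full((2, 4), ts))   # 2 participants, 4-dim features
seq = buf.sequence()
print(seq.shape)  # (2, 3, 4)
```

Participants entering or leaving the scene would need per-participant bookkeeping, which this sketch omits.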

step S5, inputting the track sequence data into a pre-trained Transformer model, which predicts and outputs the position coordinates of the track points of the plurality of traffic participants in a future preset time period;

specifically, the encoder of the Transformer model uses a multi-head self-attention mechanism to capture long-range dependencies while computing in parallel, and encodes the input track sequence data into a set of hidden state representations in a continuous space; the decoder of the Transformer model then generates, step by step, the predicted track points over the time span of the future trajectory.
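The step-by-step generation can be illustrated with a generic autoregressive loop in which each predicted point is appended to the sequence and fed back in. This is only a sketch: `decode_step` is a hypothetical stand-in for the Transformer decoder (which would also attend to the encoder output), and the constant-velocity placeholder is for illustration only.

```python
import numpy as np

def rollout(history: np.ndarray, decode_step, t_pred: int) -> np.ndarray:
    """Generate t_pred future points one at a time.

    history: (T_obs, 2) observed positions.
    decode_step: hypothetical callable mapping the (T, 2) sequence so far
    to the next (2,) position.
    """
    seq = list(history)
    for _ in range(t_pred):
        seq.append(decode_step(np.array(seq)))   # feed each prediction back in
    return np.array(seq[len(history):])

def constant_velocity(seq: np.ndarray) -> np.ndarray:
    # Placeholder decoder: extrapolate the last displacement.
    return seq[-1] + (seq[-1] - seq[-2])

future = rollout(np.array([[0.0, 0.0], [1.0, 0.5]]), constant_velocity, t_pred=3)
print(future)
```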

Optionally, the step S2 includes:

step S21, describing the spatial relationship information between nodes with a doubly stochastic adjacency matrix; the spatial relationship information comprises the edge feature of the edge connecting any two nodes;

in particular, an undirected graph $\mathcal{G}^t = (\mathcal{V}^t, \mathcal{E}^t)$ can be constructed with the participants at the same timestamp t as nodes, where node $v_i^t \in \mathcal{V}^t$ represents the ith traffic participant and edge $e_{ij}^t \in \mathcal{E}^t$ represents the interaction between nodes i and j. It should be noted that a conventional GAT usually takes as input feature vectors obtained by directly embedding the position coordinates of each traffic participant, which ignores the geometric positional relationship between nodes. To capture neighborhood information and reflect the spatial geometric relationship between nodes during adaptive learning, the embodiment of the invention incorporates spatial information into the graph $\mathcal{G}^t$ by constructing a doubly stochastic adjacency matrix (Doubly Stochastic Matrix, DSM) $A^t \in \mathbb{R}^{N\times N}$, which serves as the adjacency matrix used to obtain the feature vector of each node;

specifically, the original spatial relationship between objects is captured by calculating the geometric distance between each pair of traffic participants and adding a self-connection for each object, as shown in formula (1);

based on this result, the doubly stochastic adjacency matrix is constructed according to formula (2);

in formulas (1) and (2), $d_{ij}^t$ denotes the distance (e.g., the Euclidean distance) between node i and node j at time t, and N is the total number of nodes;

it should be noted that the doubly stochastic adjacency matrix is symmetric and positive semidefinite with a maximum eigenvalue of 1, which effectively prevents the adjacency weights from exploding or shrinking to zero during message passing. This helps stabilize graph propagation in the GAT model, and the edge features of the doubly stochastic adjacency matrix can then be used to guide the attention operation in the GAT model;
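Formulas (1) and (2) are not reproduced in this text. As a hedged illustration only, one standard way to turn a nonnegative edge-weight matrix $E$ (with self-connections) into a symmetric, positive semidefinite, doubly stochastic matrix whose largest eigenvalue is 1 is to row-normalize it, $\hat{E}_{ij} = E_{ij} / \sum_k E_{ik}$, and then set $A_{ij} = \sum_k \hat{E}_{ik}\hat{E}_{jk} / \sum_v \hat{E}_{vk}$. The sketch below applies this normalization; the edge weight `1 / (1 + d_ij)` is a hypothetical choice, not taken from the patent.

```python
import numpy as np

def edge_weights_from_positions(pos: np.ndarray) -> np.ndarray:
    """Hypothetical edge weights 1 / (1 + Euclidean distance).

    The diagonal (distance 0) gets weight 1, acting as the self-connection.
    """
    diff = pos[:, None, :] - pos[None, :, :]   # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)        # (N, N) Euclidean distances d_ij
    return 1.0 / (1.0 + dist)

def doubly_stochastic(E: np.ndarray) -> np.ndarray:
    """Doubly stochastic normalization of a nonnegative matrix E."""
    E_hat = E / E.sum(axis=1, keepdims=True)    # row-normalize
    col = E_hat.sum(axis=0)                     # column sums, shape (N,)
    return (E_hat / col) @ E_hat.T              # A_ij = sum_k E_ik E_jk / col_k

pos = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0], [1.0, 1.0]])
A = doubly_stochastic(edge_weights_from_positions(pos))
print(np.allclose(A, A.T), np.allclose(A.sum(axis=1), 1), np.allclose(A.sum(axis=0), 1))
print(np.linalg.eigvalsh(A).max())
```

Because $A = \hat{E} C^{-1} \hat{E}^T$ with $C$ the positive diagonal matrix of column sums, $A$ is positive semidefinite by construction, and its rows and columns each sum to 1.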

step S22, obtaining an initial feature vector of the node according to the spatial relationship information and the position coordinates of the traffic participants corresponding to the node;

specifically, the node feature vectors are updated by the embedding formula and aggregation function of formula (3);

in formula (3), $\phi_s(\cdot)$ denotes an embedding function, namely a linear transformation (embedding) with weight parameter $W_s$, i.e., $\phi_s(x_i^t, y_i^t; W_s)$; the feature mapping weight $W_s$ is obtained by continuous optimization during training;

$(x_i^t, y_i^t)$ are the position coordinates at time t of the traffic participant corresponding to node i;

the resulting node feature vector satisfies $h_i^t \in \mathbb{R}^{F_1}$, where $F_1$ is the embedding feature dimension, $\sigma(\cdot)$ is the LeakyReLU activation function, and $b_h$ is a bias term;
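Because formula (3) is not reproduced in this text, the following is only a sketch under an assumed form: each participant's position is embedded linearly with $W_s$, the embeddings of neighboring nodes are aggregated with the doubly stochastic adjacency matrix $A^t$, the bias $b_h$ is added, and LeakyReLU is applied. The aggregation form, the function names, and the negative slope 0.2 are assumptions.

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)

def initial_node_features(pos, A, W_s, b_h):
    """Assumed form: h_i = LeakyReLU(sum_j A_ij * (W_s @ p_j) + b_h).

    pos: (N, 2) positions, A: (N, N) adjacency, W_s: (F1, 2), b_h: (F1,)
    """
    embedded = pos @ W_s.T        # phi_s: linear embedding of each position, (N, F1)
    aggregated = A @ embedded     # weight each node's neighbours by edge features
    return leaky_relu(aggregated + b_h)

rng = np.random.default_rng(0)
N, F1 = 4, 8
pos = rng.normal(size=(N, 2))
A = np.full((N, N), 1.0 / N)      # placeholder doubly stochastic matrix
h = initial_node_features(pos, A, rng.normal(size=(F1, 2)), np.zeros(F1))
print(h.shape)  # (4, 8)
```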

step S23, calculating attention coefficients from the initial feature vectors of the nodes using an attention mechanism, and obtaining the feature vectors of all nodes from the edge features between the nodes and the attention coefficients;

specifically, the node feature vectors $h_i^t$ obtained in the above steps are taken as the input of the GAT model to capture the interactions among the traffic participants in the spatial domain; the attention mechanism in the GAT model assigns different weights, i.e., attention coefficients, to neighboring nodes according to the differences in their features;

the attention mechanism in the embodiment of the invention is given by formula (4):

$\alpha_{ij}^t = \dfrac{\exp\left(\sigma\left(W_1 W_h h_i^t + W_2 W_h h_j^t\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\sigma\left(W_1 W_h h_i^t + W_2 W_h h_k^t\right)\right)}$ (4)

in formula (4), $\alpha_{ij}^t$ denotes the attention coefficient calculated by the attention mechanism, $h_i^t$ denotes the feature vector of node i obtained in step S22, $\sigma(\cdot)$ is the LeakyReLU activation function,

$\mathcal{N}_i$ denotes the set of all neighbor nodes of node i, $W_h$ denotes the linear transformation weight parameter applied at every node, and $W_1$ and $W_2$ denote weight matrices multiplied with the transformed feature vectors, which map $F_1$-dimensional data to a single dimension; $W_1$ multiplies the feature vector of the current node i, and $W_2$ multiplies the feature vectors of the other nodes;

specifically, once the attention coefficients have been calculated, the new feature vector of node $v_i$ can be obtained as a weighted sum based on the attention coefficients, i.e., the edges connecting node i to the other nodes are multiplied by their weights and summed.
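A sketch of one attention head, assuming the softmax-over-neighbours form of formula (4) shown above and an edge-weighted sum $\sum_j \alpha_{ij} A_{ij} W_h h_j$ for the aggregation; that aggregation form, the vector shapes of `w1`/`w2`, and the LeakyReLU slope 0.2 are assumptions rather than details confirmed by the patent. Nodes with zero edge feature are treated as non-neighbours.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_head(h, A, W_h, w1, w2):
    """One attention head: coefficients per formula (4), then an edge-weighted sum.

    h: (N, F1) node features, A: (N, N) edge features (0 means no edge),
    W_h: (F1, F1) shared linear map, w1, w2: (F1,) attention weight vectors.
    """
    z = h @ W_h.T                                               # W_h h_j for every node
    scores = leaky_relu((z @ w1)[:, None] + (z @ w2)[None, :])  # (N, N) raw scores
    scores = np.where(A > 0, scores, -np.inf)                   # restrict to neighbours N_i
    scores -= scores.max(axis=1, keepdims=True)                 # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)                   # softmax over neighbours
    return alpha, (alpha * A) @ z                               # sum_j alpha_ij A_ij W_h h_j

rng = np.random.default_rng(1)
N, F1 = 3, 4
A = np.array([[0.6, 0.4, 0.0], [0.4, 0.4, 0.2], [0.0, 0.2, 0.8]])
alpha, h_new = gat_head(rng.normal(size=(N, F1)), A, rng.normal(size=(F1, F1)),
                        rng.normal(size=F1), rng.normal(size=F1))
print(alpha.round(3))
```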

Optionally, the step S23 specifically includes:

calculating a plurality of groups of attention coefficients using a multi-head attention mechanism, obtaining a plurality of groups of node feature vectors from the edge features between the nodes and the groups of attention coefficients, and obtaining the feature vectors of all nodes from the groups of node feature vectors;

specifically, to further enhance the expressive power of the attention layer, a multi-head attention mechanism is added in step S23: several independent groups of attention mechanisms are invoked and their outputs are concatenated:

$\tilde{h}_i^t = \Big\Vert_{k=1}^{M} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{t,k} A_{ij}^t W_h^{k} h_j^t\Big)$ (5)

in formula (5), $\Vert$ denotes the concatenation operation and M is the number of attention groups (heads);

$W_h^k$ denotes the learnable parameters of the kth group of attention mechanisms,

i.e., the same parameter as $W_h$ above, with the superscript k denoting the kth group;

$\alpha_{ij}^{t,k}$ is the weight coefficient calculated by the kth group of attention mechanisms, $A_{ij}^t$ is the edge feature between node i and node j in the doubly stochastic adjacency matrix, $h_j^t$ is the initial feature vector of node j, and $\sigma(\cdot)$ is the LeakyReLU activation function; finally, the feature vectors of all traffic participants $\tilde{H}^t = \{\tilde{h}_1^t, \ldots, \tilde{h}_N^t\}$ are obtained,

where the feature vector $\tilde{h}_i^t$ of object i at each time instant

encodes the social interaction exerted on it by its neighbor objects. It should be noted that during training, the multi-head attention mechanism continuously adjusts the weight coefficients $W_1$ and $W_2$, finally yielding different attention coefficients that reflect the importance of node j to node i.
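A self-contained sketch of the multi-head concatenation, assuming each head follows the same softmax-over-neighbours form of formula (4) and the edge-weighted aggregation used in formula (5) as written above; the `(W_h, w1, w2)` tuple layout per head is a hypothetical parameter arrangement.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def multi_head_gat(h, A, heads):
    """Concatenate M independent heads, following formula (5) as written above.

    h: (N, F1) initial node features, A: (N, N) edge features (0 = no edge),
    heads: list of (W_h, w1, w2) tuples, one per head (hypothetical layout).
    """
    outputs = []
    for W_h, w1, w2 in heads:
        z = h @ W_h.T
        e = leaky_relu((z @ w1)[:, None] + (z @ w2)[None, :])
        e = np.where(A > 0, e, -np.inf)                   # neighbours only
        alpha = np.exp(e - e.max(axis=1, keepdims=True))
        alpha /= alpha.sum(axis=1, keepdims=True)         # alpha_ij^{t,k}
        outputs.append(leaky_relu((alpha * A) @ z))       # sigma(sum_j alpha A W_h^k h_j)
    return np.concatenate(outputs, axis=1)                # "||" concatenation

rng = np.random.default_rng(2)
N, F1, M = 3, 4, 2
A = np.array([[0.6, 0.4, 0.0], [0.4, 0.4, 0.2], [0.0, 0.2, 0.8]])
heads = [(rng.normal(size=(F1, F1)), rng.normal(size=F1), rng.normal(size=F1))
         for _ in range(M)]
out = multi_head_gat(rng.normal(size=(N, F1)), A, heads)
print(out.shape)  # (3, 8)
```

Concatenation makes the social feature dimension M × F1; averaging the heads instead is a common alternative for a final layer, but which option the patent uses beyond concatenation is not stated.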

For any traffic participant to be predicted, performing the social interaction operation on the state at each timestamp of the observed trajectory sequence yields the social interaction information at that time. Fusing this information with the position coordinates of the historical track points produces sequence data that contains semantic features of the overall temporal and spatial influences up to the current time, from which the predicted trajectory can be generated indirectly in multiple steps. An encoder-decoder structure is generally adopted: the encoder learns latent motion behavior features, and the decoder generates the future predicted trajectory in the scene from the context information. Such structures have typically been built on conventional recurrent neural networks (RNN, LSTM, or their variants), but the sequential input of these networks makes them slow and leaves their information insufficiently focused.

Specifically, the Transformer model in the embodiment of the present invention includes an encoder and a decoder;

the encoder of the embodiment of the invention is configured to apply position encoding to the track sequence data to obtain position-encoded track sequence data, and to input the position-encoded track sequence data into a plurality of sequentially stacked Transformer blocks for information encoding, which output the encoded data to be processed; the position-encoded track sequence data comprise a plurality of position-encoded track point feature vectors, each of which is the sum of a position code and the track point feature vector at the corresponding position in the track sequence data; the even-indexed dimensions of the position code are sine-encoded and the odd-indexed dimensions are cosine-encoded;

specifically, the operation of step S3, which embeds the current position coordinates of a traffic participant together with its social interaction feature vector, is given by formula (6);

in formula (6), $(x_i^t, y_i^t)$ are the position coordinates of the traffic participant corresponding to node i at the t-th timestamp, $\phi_e(\cdot)$ is an embedding function,

and $s_i^t$ is the $F_2$-dimensional feature vector obtained by the embedding operation;
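Formula (6) is not reproduced in this text, so the sketch below only assumes a plausible form: the position coordinates and the social interaction feature vector are concatenated and passed through a linear embedding with hypothetical parameters `W_e` and `b_e` to give the $F_2$-dimensional track point feature vector $s_i^t$.

```python
import numpy as np

def fuse_track_point(pos, social, W_e, b_e):
    """Assumed form of formula (6): linear embedding of [x, y, social features].

    pos: (N, 2) positions, social: (N, F_s) GAT outputs,
    W_e: (F2, 2 + F_s), b_e: (F2,) hypothetical embedding parameters.
    """
    x = np.concatenate([pos, social], axis=1)   # own spatial state + social influence
    return x @ W_e.T + b_e                      # (N, F2) track point feature vectors

rng = np.random.default_rng(3)
N, F_s, F2 = 5, 8, 16
s = fuse_track_point(rng.normal(size=(N, 2)), rng.normal(size=(N, F_s)),
                     rng.normal(size=(F2, 2 + F_s)), np.zeros(F2))
print(s.shape)  # (5, 16)
```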

the encoder comprises 6 stacked Transformer blocks, each with the structure shown in Fig. 2, consisting of four parts: a multi-head attention mechanism, residual connections, layer normalization, and a fully connected network. The input of the encoder is the sum of the feature vector $s_i^t$

and the position code $p^t$ of the same dimension, as shown in formula (7):

$\xi_i^t = s_i^t + p^t$ (7)

where $s_i^t$ is the $F_2$-dimensional feature vector obtained by formula (6), and $\xi_i^t$

is the vector at time t of the fused trajectory of traffic participant i, which contains the sequence information;

The vector at time t; destination area of position codingThe front-back position relation of the track points in the subsequence, in the embodiment of the invention, the calculation of the form of using sine coding at even positions and cosine coding at odd positions is shown in the following formula (8):

where t denotes the position of the current timestamp in the sequence, d denotes the position index of each value in the vector, F ₂ Representing a dimension of a vector;
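Formula (8) is not reproduced in this text. The sketch below assumes the standard sinusoidal form from the original Transformer: base 10000, indices 2k and 2k+1 sharing one frequency, sine for even index d and cosine for odd d. It then performs the addition of formula (7). The names `positional_encoding` and `fuse_sequence`, the base value and starting t at 0 are all assumptions.

```python
import math
from typing import List


def positional_encoding(t: int, dim: int, base: float = 10000.0) -> List[float]:
    """Sinusoidal position code p^t of length dim (assumed standard form)."""
    code = []
    for d in range(dim):
        # Indices 2k and 2k+1 share the frequency 1 / base^(2k / dim).
        angle = t / (base ** ((d - d % 2) / dim))
        code.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
    return code


def fuse_sequence(features: List[List[float]]) -> List[List[float]]:
    """Formula (7): add p^t to the feature vector at each sequence position t."""
    dim = len(features[0])
    return [
        [f + p for f, p in zip(vec, positional_encoding(t, dim))]
        for t, vec in enumerate(features)
    ]


fused = fuse_sequence([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
```

Because the code depends only on t and the vector index, identical feature vectors at different timestamps become distinguishable after fusion.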

in the Transformer model of the embodiment of the present invention, a multi-head self-attention mechanism based on dot-product attention is adopted, and the calculation formulas are as shown in the following formulas (9) to (12):

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,W^O \qquad (11)$$

In formulas (9) to (12), Q, K and V denote the query matrix, the key matrix and the value matrix, respectively, and are all equal to the fused trajectory vector matrix obtained above. K^T is the transpose of the key matrix K. d_model and d_k are the dimension of the input space and the dimension of the mapping space, respectively. W_i^Q, W_i^K and W_i^V are the linear transformation matrices for Q, K and V, respectively.

Because a multi-head attention mechanism is adopted, the subscript i in W_i^Q, W_i^K and W_i^V refers to the i-th head, and these matrices map the F₂-dimensional vectors to d_k dimensions. h denotes the number of heads, and each head can capture subspace information in the trajectory sequence. The self-attention mechanism is executed h times, the results are concatenated, and the linear transformation matrix W^O is applied to obtain the final multi-head self-attention value.

In essence, the multi-head attention mechanism maps the same Q, K and V into different subspaces of the original high-dimensional space for attention calculation, and merges the attention information from the different subspaces in the last step. The dimension of each vector is therefore reduced when each head's attention is computed, which to some extent helps prevent overfitting.
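Only formula (11) survives in this text. The sketch below fills in the per-head computation with the standard scaled dot-product form from the original Transformer, softmax(QK^T / √d_k)V, and that form is an assumption here. Q, K and V are all set to the fused trajectory vector matrix, as stated above. The weights are random and untrained, and the helper names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(q[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(xi: Matrix, num_heads: int, d_k: int, seed: int = 0) -> Matrix:
    """Q = K = V = xi; each head projects F2 -> d_k, heads are concatenated, then W^O."""
    rng = random.Random(seed)
    f2 = len(xi[0])
    heads = []
    for _ in range(num_heads):
        w_q, w_k, w_v = (random_matrix(f2, d_k, rng) for _ in range(3))
        heads.append(attention(matmul(xi, w_q), matmul(xi, w_k), matmul(xi, w_v)))
    # Concat(head_1, ..., head_h): join the per-head rows side by side.
    concat = [sum((head[r] for head in heads), []) for r in range(len(xi))]
    w_o = random_matrix(num_heads * d_k, f2, rng)
    return matmul(concat, w_o)
```

W^O maps the concatenated h·d_k values back to F₂, so the output has the same shape as the input sequence. This shape match is what allows the residual connection in each Transformer block.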

The layer normalization calculation formula in the Transformer model is as follows:

In formula (13), ξ_n denotes the n-th dimension of the fused trajectory vector ξ, m denotes the mean of the input ξ, H is the dimension F₂, σ denotes the standard deviation of the input ξ, α and β are learnable parameters, and ε is a small constant that prevents the divisor from being 0. Layer normalization is intended to improve the flow of information during back-propagation, speed up training convergence, and ease the difficulty of training and fitting the network.
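Formula (13) is not reproduced here. The sketch below uses the symbols defined above: the mean m and standard deviation σ over the H = F₂ dimensions, the learnable α and β, and a small ε. Adding ε to σ, rather than placing it inside the square root, is an assumption, as are the function name and the default ε.

```python
import math
from typing import List


def layer_norm(xi: List[float], alpha: List[float], beta: List[float], eps: float = 1e-6) -> List[float]:
    """LN over the H = F2 dimensions of one fused trajectory vector xi."""
    h = len(xi)
    m = sum(xi) / h                                       # mean of the input
    sigma = math.sqrt(sum((x - m) ** 2 for x in xi) / h)  # standard deviation
    # eps keeps the divisor away from zero when all components are equal.
    return [a * (x - m) / (sigma + eps) + b for x, a, b in zip(xi, alpha, beta)]


normed = layer_norm([1.0, 2.0, 3.0, 4.0], alpha=[1.0] * 4, beta=[0.0] * 4)
```

With α = 1 and β = 0, the output has approximately zero mean and unit variance. The learnable α and β let the network undo this scaling where that helps.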

The feed-forward network in the Transformer model of the embodiment of the invention is calculated as shown in the following formula (14):

$$\mathrm{FFN}(y)=\max(0,\; yW_1+e_1)\,W_2+e_2 \qquad (14)$$

In formula (14), y is the output vector normalized by the LN(·) layer, W₁ and W₂ are weight matrices, and e₁ and e₂ are bias terms. Two fully connected layers follow the multi-head self-attention module, with a ReLU activation function between them, forming a two-layer feed-forward neural network. The Transformer blocks are stacked in series, so that the output of each Transformer block serves as the input to the next, and the output of the encoder is the output of the last Transformer block.
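The sketch below implements formula (14) literally, with W₁, e₁, W₂ and e₂ as plain nested lists. It also shows one way the four parts of a Transformer block could be wired together. The post-norm ordering LN(x + sublayer(x)) is an assumption, chosen because it is consistent with y being the LN output. The stacking loop reuses the same callables for every block purely for brevity; in a real model each of the 6 blocks has its own parameters.

```python
from typing import Callable, List

Vector = List[float]
Rows = List[Vector]


def ffn(y: Vector, w1: List[Vector], e1: Vector, w2: List[Vector], e2: Vector) -> Vector:
    """Formula (14): max(0, y W1 + e1) W2 + e2."""
    hidden = [max(0.0, sum(yi * w1[i][j] for i, yi in enumerate(y)) + e1[j]) for j in range(len(e1))]
    return [sum(hj * w2[j][k] for j, hj in enumerate(hidden)) + e2[k] for k in range(len(e2))]


def add_and_norm(xs: Rows, sub_out: Rows, norm: Callable[[Vector], Vector]) -> Rows:
    """Residual connection followed by layer normalization, row by row."""
    return [norm([a + b for a, b in zip(x, s)]) for x, s in zip(xs, sub_out)]


def transformer_block(seq: Rows, self_attention, feed_forward, norm) -> Rows:
    h = add_and_norm(seq, self_attention(seq), norm)         # attention sub-layer
    return add_and_norm(h, [feed_forward(v) for v in h], norm)  # feed-forward sub-layer


def encoder(seq: Rows, num_blocks: int, self_attention, feed_forward, norm) -> Rows:
    for _ in range(num_blocks):
        # The output of one block is the input of the next; the last output is the encoder output.
        seq = transformer_block(seq, self_attention, feed_forward, norm)
    return seq
```

Passing the attention, feed-forward and normalization steps in as callables keeps the wiring independent of how each part is implemented.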

The decoder is used for decoding the encoded data to be processed, and for predicting and outputting the track point position coordinates of the plurality of traffic participants in a future preset time period;

The decoder of the Transformer in the embodiment of the present invention is largely the same as the encoder and also has 6 layers. However, each decoder layer has one more encoder-decoder attention layer than an encoder layer, located between the self-attention layer (called the masked self-attention layer in the decoder) and the fully connected network layer.

Like the self-attention layer in the encoder, the encoder-decoder attention layer uses multi-head attention calculation. Unlike the encoder's self-attention layer, however, its Q, K and V come from different sources. Q comes from the output of the decoder at the previous step; for the first prediction time step, Q is the fused trajectory vector of the last observed frame. K and V both come from the output of the encoder.

The decoder's self-attention layer uses a mask because, when the position of a track point at a given time is predicted, information from future times is not yet available.

The encoder of the Transformer model traverses the whole trajectory sequence and can therefore extract global semantic features that represent the motion pattern of the predicted object. These features are passed to the decoder, which predicts the future trajectory indirectly over multiple steps in an autoregressive manner, feeding the output of each step to the decoder at the next time step. After T_pred autoregressive predictions, the predicted future trajectory of traffic participant i is obtained.
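The masked self-attention and the autoregressive loop described above can be sketched as follows. `causal_mask` builds the look-ahead mask, in which position i may attend only to positions 0..i. `autoregressive_rollout` calls a decoder step T_pred times and appends each output to the decoder input for the next step. The `decoder_step` and `embed` callables are hypothetical stand-ins. In particular, re-embedding the predicted coordinates before feeding them back is an assumption, not something the text specifies.

```python
from typing import Callable, List

Vector = List[float]


def causal_mask(n: int) -> List[List[bool]]:
    """True where attention is allowed: step i sees steps 0..i, never the future."""
    return [[j <= i for j in range(n)] for i in range(n)]


def autoregressive_rollout(
    memory: List[Vector],
    last_fused_vector: Vector,
    decoder_step: Callable[[List[Vector], List[Vector]], Vector],
    t_pred: int,
    embed: Callable[[Vector], Vector] = lambda p: p,
) -> List[Vector]:
    """Predict t_pred future points; memory is the encoder output (source of K and V)."""
    decoder_inputs = [last_fused_vector]  # query for the first prediction step
    predictions = []
    for _ in range(t_pred):
        point = decoder_step(memory, decoder_inputs)
        predictions.append(point)
        decoder_inputs.append(embed(point))  # this step's output feeds the next step
    return predictions


# Toy decoder step: move one unit further than the latest decoder input.
toy_step = lambda memory, inputs: [inputs[-1][0] + 1.0]
future = autoregressive_rollout(memory=[[0.0]], last_fused_vector=[0.0], decoder_step=toy_step, t_pred=3)
```

The mask would be applied to the decoder's self-attention scores (disallowed entries set to a large negative value before the softmax), so that during training the decoder cannot look ahead.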

The technical principle of the method according to the embodiment of the present invention is shown in fig. 3. The content of fig. 3 can be understood as the structure and principle of the traffic trajectory prediction model corresponding to the method, where the model comprises the GAT model and the Transformer model. Specifically, during training, a future trajectory is generated by the trajectory prediction network from the historical observed trajectory. The model should keep adjusting its weight parameters during training so that the predicted future trajectory approaches the real future trajectory as closely as possible; to this end, the Euclidean distance between the two trajectories is used as the loss function. To prevent overfitting from degrading the generalization ability of the model, a regularization constraint on the weight parameters is added. The combined objective function for network training is shown in the following formula (17):

In formula (17), N is the total number of training samples, Y and Ŷ represent the real trajectories and the generated predicted trajectories in the training set, respectively, and λ is the regularization coefficient. Training the traffic trajectory prediction model aims to minimize the objective function to obtain the optimal weight parameters W, where W refers to all trainable parameters in the model.
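Formula (17) is not reproduced in this text. One plausible reading of the description, which is an assumption, combines two terms. The first is the mean over the N samples of the summed point-wise Euclidean distances between Y and Ŷ. The second is λ times the squared L2 norm of the weights. The sketch below computes that quantity for 2-D trajectories, with a flat list standing in for all trainable parameters W; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trajectory_objective(
    real: Sequence[Sequence[Point]],
    predicted: Sequence[Sequence[Point]],
    weights: Sequence[float],
    lam: float,
) -> float:
    """Assumed form: (1/N) * sum of Euclidean distances + lam * ||W||^2."""
    n = len(real)  # N, the number of training samples
    distance = 0.0
    for y_traj, y_hat_traj in zip(real, predicted):
        for (x, y), (x_hat, y_hat) in zip(y_traj, y_hat_traj):
            distance += math.hypot(x - x_hat, y - y_hat)
    regulariser = sum(w * w for w in weights)
    return distance / n + lam * regulariser


loss = trajectory_objective(
    [[(0.0, 0.0), (1.0, 1.0)]], [[(3.0, 4.0), (1.0, 1.0)]], weights=[1.0, 2.0], lam=0.1
)
```

Here the first predicted point is off by a (3, 4) offset, contributing a distance of 5. The weights contribute 0.1 × (1 + 4) = 0.5, for an objective of 5.5.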

As described in the above embodiments, the method predicts trajectories in complex dynamic traffic flow as follows. To describe how the spatio-temporal states of the traffic participants change, the model constructs a behavior relationship graph among the traffic participants based on a graph neural network. It captures spatial interaction features that represent the interactions among individuals through an information-passing mechanism between graph nodes. By introducing an attention mechanism, the degree of interaction between each neighboring individual and the predicted subject is differentiated according to the relevance of their motion behaviors, which makes trajectory prediction in complex dynamic traffic flow more reasonable.

Moreover, a Transformer encoder-decoder structure is introduced. Relying on internal techniques such as the self-attention and multi-head attention mechanisms, the Transformer encoder fully mines the motion pattern of the target under interaction in a highly parallel computing mode. In an autoregressive manner, the decoder combines the historical motion pattern with previous spatial position information to predict the track point at the next moment and update the interaction features, thereby completing the future trajectory prediction over the preset future time period.

Referring to fig. 4, another embodiment of the present invention provides a system for predicting a future trajectory of a traffic participant, including:

a coordinate acquiring unit 1, configured to acquire current time position coordinates of a plurality of traffic participants;

the social interaction unit 2 is used for inputting the current time position coordinates into a pre-trained GAT model for processing and outputting feature vectors of a plurality of nodes; each node in the GAT model represents a traffic participant, and the feature vector of the node represents social interaction information between the traffic participant corresponding to the node and traffic participants corresponding to other nodes;

the first track point feature processing unit 3 is configured to obtain track point feature vectors of the multiple traffic participants at the current time according to the current-time position coordinates of the multiple traffic participants and the feature vectors of the nodes corresponding to the multiple traffic participants;

the second track point feature processing unit 4 is configured to acquire historical track point feature vectors of the plurality of traffic participants in a past preset time period, and to obtain, according to the current track point feature vectors and the historical track point feature vectors, track sequence data representing the trajectories of the plurality of traffic participants up to the current time; the track sequence data comprise a plurality of track point feature vectors; and

the future track prediction unit 5 is configured to input the track sequence data into a pre-trained Transformer model for prediction, and to output the position coordinates of the track points of the plurality of traffic participants in a future preset time period.
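As a rough illustration of how the five units could hand data to one another, the sketch below wires them together as plain callables. The class and field names are hypothetical. The trained GAT and Transformer models are replaced by stubs in the usage example. How unit 3 combines coordinates with node features is left to the `fuse_current` callable.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class TrajectoryPredictionSystem:
    acquire_coordinates: Callable[[], List[Vector]]                        # unit 1
    gat_model: Callable[[List[Vector]], List[Vector]]                      # unit 2
    fuse_current: Callable[[Vector, Vector], Vector]                       # unit 3
    history: List[List[Vector]]                                            # unit 4: past feature vectors per participant
    transformer_model: Callable[[List[List[Vector]]], List[List[Vector]]]  # unit 5

    def predict(self) -> List[List[Vector]]:
        coords = self.acquire_coordinates()
        node_features = self.gat_model(coords)
        current = [self.fuse_current(c, f) for c, f in zip(coords, node_features)]
        # Track sequence data up to the current time: history followed by the current point.
        sequences = [past + [cur] for past, cur in zip(self.history, current)]
        return self.transformer_model(sequences)


system = TrajectoryPredictionSystem(
    acquire_coordinates=lambda: [[1.0, 2.0], [3.0, 4.0]],
    gat_model=lambda coords: [[0.0] for _ in coords],
    fuse_current=lambda c, f: c + f,
    history=[[[0.0, 0.0, 0.0]], [[2.0, 3.0, 0.0]]],
    transformer_model=lambda seqs: [[seq[-1][:2]] for seq in seqs],
)
result = system.predict()
```

Keeping each unit behind a callable mirrors the statement below that the units may or may not be physically separate.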

Optionally, the social interaction unit 2 comprises:

Optionally, the Transformer model comprises an encoder and a decoder;

The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

It should be noted that the system described in the foregoing embodiment corresponds to the method described in the foregoing embodiment, and therefore, a part of the system described in the foregoing embodiment that is not described in detail can be obtained by referring to the content of the method described in the foregoing embodiment, that is, the specific step content described in the method of the foregoing embodiment can be understood as the function that can be realized by the system of the present embodiment, and is not described herein again.

Moreover, the traffic participant future trajectory prediction system according to the above embodiment may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as an independent product.

Another embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the traffic participant future trajectory prediction method described in the above embodiment.

Specifically, the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method for predicting a future trajectory of a traffic participant is characterized by comprising the following steps:

inputting the current-time position coordinates into a pre-trained GAT model for processing, and outputting feature vectors of a plurality of nodes; each node in the GAT model represents a traffic participant, and the feature vector of each node represents social interaction information between the traffic participant corresponding to that node and the traffic participants corresponding to the other nodes;

obtaining track point feature vectors of the plurality of traffic participants at the current moment according to the current-moment position coordinates of the plurality of traffic participants and the feature vectors of the nodes corresponding to the plurality of traffic participants;

inputting the track sequence data into a pre-trained Transformer model for predicting and outputting the position coordinates of the track points of the plurality of traffic participants in a future preset time period.

2. The method as claimed in claim 1, wherein the obtaining the position coordinates of a plurality of traffic participants at the current time comprises:

3. The method as claimed in claim 1, wherein the inputting the current time position coordinates into a pre-trained GAT model for processing and outputting feature vectors of a plurality of nodes comprises:

4. The method of predicting a traffic participant future trajectory according to claim 3, wherein the doubly stochastic adjacency matrix is represented by the following expression:

wherein N is the total number of nodes.

5. The method for predicting the future trajectory of the traffic participant according to claim 4, wherein the step of calculating attention coefficients according to the initial feature vectors of the nodes and an attention mechanism, and obtaining the feature vectors of all the nodes according to the feature quantities of the connecting edges between the nodes and the attention coefficients, comprises:

6. The method of predicting future trajectories of traffic participants according to claim 1, wherein the Transformer model comprises an encoder and a decoder;

the encoder is used for performing position encoding on the track sequence data to obtain track sequence encoded data, inputting the track sequence encoded data into a plurality of sequentially stacked Transformer blocks for information encoding, and outputting encoded data to be processed; the track sequence encoded data comprise a plurality of position-encoded track point feature vectors; each position-encoded track point feature vector is the sum of a position code and the track point feature vector at the corresponding position in the track sequence data, the track point feature vectors at even-numbered positions of the track sequence are sine-coded, and the track point feature vectors at odd-numbered positions of the track sequence are cosine-coded;

7. A system for predicting a future trajectory of a traffic participant, comprising:

a coordinate acquisition unit, configured to acquire current-time position coordinates of a plurality of traffic participants;

the social interaction unit is used for inputting the current time position coordinates into a pre-trained GAT model for processing and outputting feature vectors of a plurality of nodes; each node in the GAT model represents a traffic participant, and the feature vector of the node represents social interaction information between the traffic participant corresponding to the node and traffic participants corresponding to other nodes;

the first track point feature processing unit is used for obtaining track point feature vectors of the plurality of traffic participants at the current moment according to the current-moment position coordinates of the plurality of traffic participants and the feature vectors of the nodes corresponding to the plurality of traffic participants;

and the future track prediction unit is used for inputting the track sequence data into a pre-trained Transformer model for predicting and outputting the position coordinates of the track points of the plurality of traffic participants in a future preset time period.

8. The system of claim 7, wherein the social interaction unit comprises:

9. The system of claim 7, wherein the Transformer model comprises an encoder and a decoder;

the encoder is used for performing position encoding on the track sequence data to obtain track sequence encoded data, inputting the track sequence encoded data into a plurality of sequentially stacked Transformer blocks for information encoding, and outputting encoded data to be processed; the track sequence encoded data comprise a plurality of position-encoded track point feature vectors; each position-encoded track point feature vector is the sum of a position code and the track point feature vector at the corresponding position in the track sequence data, the track point feature vectors at even-numbered positions of the track sequence are sine-coded, and the track point feature vectors at odd-numbered positions of the track sequence are cosine-coded;

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for predicting a future trajectory of a traffic participant according to any one of claims 1 to 6.