CN112766561A - Generating type confrontation track prediction method based on attention mechanism - Google Patents
- Publication number: CN112766561A (application CN202110053547.8A, authority: CN, China)
- Prior art keywords: pedestrian, network, track, vector, pooling
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/047: Probabilistic or stochastic networks
- G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08: Learning methods
- Y02T10/10: Internal combustion engine [ICE] based vehicles
- Y02T10/40: Engine management systems
Abstract
The invention discloses a generative adversarial trajectory prediction method based on an attention mechanism, which designs an attention-based generative adversarial network and trains it with an improved loss function. The method first uses an encoder module built from a long short-term memory (LSTM) network to extract hidden features of pedestrian motion from pedestrian trajectories, then uses an attention-based pooling module to assign influence weights to pedestrians in the same scene so as to fully extract the interaction information among them, and finally outputs the network's predicted pedestrian trajectory coordinates through a decoder module. The proposed method improves trajectory prediction accuracy, can generate multiple predicted trajectories that follow social norms, and can be used in the navigation planning system of a mobile robot, helping the robot plan more reasonable and effective paths in environments shared with people.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a generative adversarial trajectory prediction method based on an attention mechanism.
Background
Pedestrian trajectory prediction means predicting the motion trajectory of a pedestrian over a future period of time from the pedestrian's motion trajectory over a past period of time. With the rise of mobile service robots, autonomous driving, and related fields, predicting pedestrian trajectories in dynamic scenes has become a popular research direction. Correct prediction of pedestrian trajectories helps an intelligent navigation system plan more reasonable and effective paths. However, the pedestrian trajectory prediction problem is extremely complex. First, pedestrian motion is somewhat random: pedestrians are subjective and flexible in their decision making, so their trajectories are inherently diverse. Second, while walking, a pedestrian's trajectory is affected by the surrounding dynamic environment, and pedestrians usually adjust their paths according to common sense and social norms. These characteristics make the pedestrian trajectory prediction problem challenging.
In pedestrian trajectory prediction, effectively modeling the interactions between pedestrians is crucial. Most existing mainstream methods learn these interactions with deep learning techniques and predict trajectories on that basis. Among them, methods based on the long short-term memory (LSTM) network have proven very effective for sequence problems, but LSTM-based methods cannot effectively model the spatial relationships between pedestrians. To address this, Alahi et al. proposed the Social LSTM (S-LSTM) model, which grids the space around each pedestrian and pools the hidden features of the surrounding pedestrians according to the grid, predicting trajectories from the pooled result (see "Human trajectory prediction in crowded spaces", CVPR 2016). Because this method only models pedestrian interactions in a local region around the target pedestrian, it cannot efficiently capture the interactions of all pedestrians in the scene. Gupta et al. introduced generative adversarial networks to the pedestrian trajectory prediction problem and proposed the Social GAN model, which adversarially trains a generator against a discriminator and extracts the interaction information of all pedestrians in the scene through a pooling module, generating multiple trajectories that satisfy social norms and improving prediction accuracy (see "Social GAN: Socially acceptable trajectories with generative adversarial networks", CVPR 2018).
However, when extracting the interaction information between pedestrians, this method considers only their relative spatial positions and ignores the influence of factors such as the motion direction and speed of surrounding pedestrians on the future trajectory of the target pedestrian, so the interaction information cannot be fully extracted. In addition, methods based on generative adversarial networks are prone to an imbalance between the strength of the generator and that of the discriminator during training, which leads to vanishing gradients and makes training difficult.
To address these problems, Guangdong University of Technology filed patent 202010110743.X, a pedestrian trajectory prediction method based on long short-term memory, which mainly comprises the following steps: preprocessing the data and converting it into a matrix of [number of pedestrians, 4]; introducing an attention mechanism to select information such as direction and speed that influences the current pedestrian's walking, and connecting all current position information through a fully connected layer; feeding the hidden historical-state information of all pedestrians in the same scene into a pooling layer so as to share the global hidden information; converting the pooled tensor of all pedestrians' hidden historical states, the current pedestrian's position information, and the influence information selected by the attention mechanism into a long short-term memory sequence through an LSTM unit; and converting the current state information into coordinate space through a multilayer perceptron to generate the predicted trajectory sequence.
That patent still has the following defect:
In its attention mechanism, the attention weight is obtained by considering only the relative position of the ith pedestrian with respect to the jth pedestrian, without comprehensively considering factors such as the speed of pedestrian j, its motion direction relative to pedestrian i, and its relative distance from pedestrian i. The applicant therefore improves the attention weight as follows: to characterize the influence of pedestrian j on the motion of target pedestrian i, the attention pooling module combines the velocity vector v_j of pedestrian j, the distance vector d_ij from pedestrian i to pedestrian j, the cosine cos a_ij of the angle a_ij between the velocity vector v_i of pedestrian i and the distance vector d_ij, and the cosine cos b_ij of the angle b_ij between v_i and the velocity vector v_j into a feature vector q_ij, which is fed into a multilayer fully connected network with a softmax activation function to obtain the attention weight of pedestrian j on target pedestrian i in the scene.
To address the vanishing-gradient and training-difficulty problem caused by the mismatch in strength between the generator and the discriminator during conventional GAN training, the method modifies the loss function to introduce noise that decreases over time at the discriminator, improving the model's training behavior and the accuracy of trajectory prediction. The loss function of a conventional GAN can be expressed as:
L_tran_GAN = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
and the loss function of our improved GAN network is expressed as:
L_GAN = E_x[log h(D(x))] + E_z[log(1 - h(D(G(z))))]
where h(·) denotes a noise function that decreases over time. The advantage of this improvement is the following: at the initial stage of network training, the intersection between the training-set data distribution and the distribution of data generated by the generator is small, so the discriminator can easily distinguish real data from generated data and the network lacks training gradients. Therefore, in the initial training stage, a certain amount of noise is added at the discriminator so that the training data and the generated data have some overlap. As training proceeds, the distribution of the generated data gradually approaches the real data distribution and the noise is gradually reduced, which ensures the network still has a usable training gradient and improves the training of the network.
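The patent does not specify the exact form of h(·). As a minimal sketch, assuming a linear decay schedule and scalar discriminator outputs (all names here are illustrative, not from the patent), the noise-perturbed loss could look like:

```python
import math

def noise_scale(step, total_steps, initial=1.0):
    """One possible decaying schedule: linear from `initial` down to 0."""
    return initial * max(0.0, 1.0 - step / total_steps)

def h(d_out, noise, scale):
    """Perturb a discriminator probability with scaled noise, clamped to (0, 1)
    so the logarithm in the loss stays defined."""
    eps = 1e-6
    return min(1.0 - eps, max(eps, d_out + scale * noise))

def improved_gan_loss(d_real, d_fake, step, total_steps, n_real=0.0, n_fake=0.0):
    """L_GAN = log h(D(x)) + log(1 - h(D(G(z)))) for one real/fake pair."""
    s = noise_scale(step, total_steps)
    return math.log(h(d_real, n_real, s)) + math.log(1.0 - h(d_fake, n_fake, s))
```

Early in training the perturbation blurs the discriminator's verdict; by the final step the scale reaches zero and the loss reduces to the conventional GAN loss.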
Disclosure of Invention
To solve the above problems, the invention provides a generative adversarial trajectory prediction method based on an attention mechanism, which fully extracts the interaction information among pedestrians so as to improve trajectory prediction accuracy. If the method is used in the navigation planning system of a service robot, the robot can plan more reasonable and effective paths in dynamic environments shared with people, thereby improving navigation comfort.
The invention provides an attention-mechanism-based generative adversarial trajectory prediction method, characterized by comprising the following steps:
step 1: preprocessing the pedestrian track data and sending the data into an encoder for encoding;
step 2: sending the coded vector to a pooling module based on an attention mechanism for influence weight distribution to obtain a pooling vector;
Step 3: outputting the predicted trajectory of the pedestrian using an LSTM-network-based decoder;
Step 4: adversarially training the generator and the discriminator with the improved loss function using the Adam algorithm;
Step 5: feeding the observed pedestrian trajectory into the generator of the trained network model to obtain the predicted pedestrian trajectory coordinates.
Further, the encoding processing of the pedestrian trajectory in step 1 includes:
the network receives the historical track of the pedestrian and uses a single-layer full-connection network as an embedded layer to obtain the position change information of the pedestrian i at the time tConversion into a fixed-length feature vectorThen theThe vector is sent into an LSTM network for coding processing, the time sequence characteristics of the track data are learned, and the hidden state of the pedestrian i at the time t is obtained
Where f (-) is an embedded layer using the ReLU activation function, WfAnd WencoderAre the weight parameters of the embedding layer and LSTM network, respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene.
Further, in step 2, assigning influence weights to pedestrians in the same scene through the attention-based pooling module and outputting a pooling vector representing the pedestrians' interaction information includes:
To characterize the influence of pedestrian j on the motion of target pedestrian i, the module first obtains a pooling vector h_ij by pooling, and combines the velocity vector v_j of pedestrian j, the distance vector d_ij from pedestrian i to pedestrian j, the cosine cos a_ij of the angle a_ij between the velocity vector v_i of pedestrian i and the distance vector d_ij, and the cosine cos b_ij of the angle b_ij between v_i and the velocity vector v_j into a feature vector q_ij, which is fed into a multilayer fully connected network with a softmax activation function to obtain the attention weight of pedestrian j on target pedestrian i in the scene.
Then the pooling vectors of all other pedestrians in the scene relative to target pedestrian i are gathered into a final pooling vector H_i, and the attention weights of the different pedestrians are combined into a weight matrix W_atten_i. Finally, the weight matrix W_atten_i is multiplied with the pooling vector H_i to obtain feature vectors, and the pooling vector p_i of target pedestrian i is obtained by max pooling. This pooling vector represents the information target pedestrian i needs to make a decision. Intuitively, the attention mechanism weights the influence of all people in the scene on the future trajectory of target pedestrian i, summarizing the information p_i that pedestrian i needs for its decision and thereby achieving the purpose of pedestrian interaction modeling. The specific formulas are as follows:
q_ij = [v_j, d_ij, cos a_ij, cos b_ij]
q_i = [q_i1, q_i2, ..., q_ij, ..., q_iN]
W_atten_i = s(q_i; W_s)
H_i = [h_i1, h_i2, ..., h_ij, ..., h_iN]
p_i = maxpool(W_atten_i · H_i)
where s(·) denotes a multilayer fully connected network with a softmax activation function and W_s is its weight parameter.
Further, the step 3 of outputting the predicted trajectory of the pedestrian by using an LSTM network-based decoder includes:
The pooling vector p_i output by the attention pooling module, the hidden vector h_i^t output by the encoder module, and random noise z drawn from a Gaussian distribution are combined into a feature vector that serves as the decoder's initial input. The decoder first converts the pedestrian's most recent position change into feature space through a fully connected network to obtain a feature vector, then obtains the current hidden state through the LSTM network, and finally converts it into coordinate space through a fully connected network to obtain the predicted trajectory coordinates. The overall decoder computation is:

h_di^{t_obs} = m([p_i, h_i^{t_obs}, z]; W_m)
e_i^t = j(Δx_i^t; W_j)
h_di^t = LSTM(h_di^{t-1}, e_i^t; W_decoder)
Δx̂_i^{t+1} = g(h_di^t; W_g)

where j(·), m(·), and g(·) are all fully connected networks with ReLU activation functions, W_j, W_m, and W_g are their respective weight parameters, and W_decoder is the weight parameter of the LSTM network.
Further, the adversarial training of the generator and the discriminator using the improved loss function in step 4 includes:
The network is trained with the Adam algorithm using the improved loss function, which mainly consists of two parts: one is the adversarial loss L_GAN of the GAN network, and the other is the position-offset loss L_2 between the real trajectory and the predicted trajectory.
Assume the real training data x follows the distribution p_data, i.e., x ~ p_data(x), and the generator samples z from a prior noise distribution p, i.e., z ~ p(z). GAN training essentially makes the data distribution represented by the generator output G(z) as close as possible to the real training-set data distribution. The training loss function L_tran_GAN of a conventional GAN is expressed as:
L_tran_GAN = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
however, the situation that the generated data of the generator and the real data of the training set can be easily distinguished due to the fact that the distinguishing capability of the discriminator is too strong in the training process of the traditional GAN network, and therefore gradient vanishes and training cannot be conducted is caused, and in order to solve the problem that the training of the traditional GAN network is difficult, step 4 is to apply noise which is reduced along with time to the loss function of the discriminator end in the training process of the GAN network, so that the training data and the generated data have certain intersection, the distribution of the generated data of the generator is gradually close to the distribution of the real data along with the increase of the training time, and at the moment, the noise is gradually reduced, and the network still can have certain training gradient; thus, improved resistance loss LGANExpressed as:
L_GAN = E_x[log h(D(x))] + E_z[log(1 - h(D(G(z))))]
where h(·) denotes a noise function that decreases over time;
in order to encourage the network to generate a plurality of tracks meeting the social regulations, the network samples k predicted tracks at each time and selects the track with the minimum position deviation error for calculating the position deviation loss, so the position deviation loss L of the network2Expressed as:
thus, the overall loss function of the network is expressed as:
L_total = L_GAN + l · L_2
wherein l is a hyperparameter.
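A minimal, framework-free sketch of the sampling-based position-offset loss and the overall objective, with trajectories represented as lists of (x, y) points (function names are illustrative, not from the patent):

```python
def l2_error(pred, truth):
    """Summed Euclidean distance between corresponding points of two trajectories."""
    return sum(((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
               for (px, py), (tx, ty) in zip(pred, truth))

def variety_loss(sampled_trajectories, truth):
    """Position-offset loss L_2: of the k sampled predictions, keep only the one
    closest to the ground truth, which encourages diverse yet plausible outputs."""
    return min(l2_error(s, truth) for s in sampled_trajectories)

def total_loss(l_gan, sampled_trajectories, truth, l=1.0):
    """Overall objective L_total = L_GAN + l * L_2 with hyperparameter l."""
    return l_gan + l * variety_loss(sampled_trajectories, truth)
```

Because only the best of the k samples is penalized, the other samples are free to cover alternative socially plausible futures.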
Further, the step 5 of sending the observed trajectory of the pedestrian into the generator to obtain the predicted coordinates of the pedestrian trajectory includes:
and (3) sequentially executing the step (1), the step (2) and the step (3), namely, sending the observation track of the pedestrian into an encoder to carry out encoding processing so as to obtain hidden characteristics of the movement of the pedestrian, extracting interaction information of the pedestrian through an attention pooling module, and finally outputting a predicted track coordinate of the pedestrian through a decoder.
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects:
1. To address the shortcoming that existing methods cannot fully extract the interaction information among pedestrians, an attention pooling module is introduced to associate elements such as pedestrians' motion direction and speed with their future trajectories and to assign influence weights to pedestrians in the same scene, extracting the interaction information more effectively and improving the interpretability of the model.
2. To address the vanishing-gradient and training-difficulty problem caused by the mismatch in strength between the generator and the discriminator during generative-adversarial-network training, the loss function is modified to introduce noise that decreases over time at the discriminator, improving the model's training behavior and the accuracy of trajectory prediction.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is an overall block diagram of a network model;
FIG. 3 is a schematic view of an attention pooling module;
FIG. 4 is a schematic diagram of a GAN network training process;
fig. 5 is a comparison graph of predicted trajectory visualization.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
the invention provides a generating type confrontation track prediction method based on an attention mechanism, which is used for fully extracting interaction information among pedestrians so as to improve the precision of track prediction. If the method is used in a navigation planning system of the service robot, the service robot can plan a more reasonably effective path in a dynamic environment blended with people, and therefore the navigation comfort is improved.
As shown in fig. 1 and 2, the overall structure of the network model of the present invention mainly includes a generator module and an arbiter module. The generator module is based on an encoder-decoder framework and comprises an encoder, an attention pooling module and a decoder, the generator receives the historical track of the pedestrian, the track of the pedestrian is encoded by the encoder to obtain hidden characteristics of the movement of the pedestrian, then interaction information of the pedestrian is extracted through the pooling module combined with the attention mechanism, and finally the pedestrian position coordinate predicted by the network is output through the decoder module. The discriminator module is mainly composed of an encoder module, receives track input, encodes the track through an encoder, and then scores the truth degree of the track through a classification network.
The method provided by the invention specifically comprises the following steps:
step 1: preprocessing the pedestrian track data and sending the data into an encoder for encoding;
the network receives the historical track of the pedestrian and uses a single-layer full-connection network as an embedded layer to obtain the position change information of the pedestrian i at the time tConversion into a fixed-length feature vectorThen the vector is sent into an LSTM network for coding processing, the time sequence characteristics of the track data are learned, and the hidden state of the pedestrian i at the time t is obtained
Where f (-) is an embedded layer using the ReLU activation function, WfAnd WencoderAre the weight parameters of the embedding layer and LSTM network, respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene.
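The LSTM recurrence itself is omitted here; the following is a minimal sketch of the step-1 preprocessing (absolute coordinates to per-step displacements) and the embedding layer f(·) under assumed hand-set weights W_f (names and dimensions are illustrative, not from the patent):

```python
def displacements(track):
    """Step-1 preprocessing: absolute (x, y) coordinates -> per-step position changes."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]

def relu(vec):
    """Elementwise ReLU activation."""
    return [max(0.0, v) for v in vec]

def embed(delta, w_f):
    """Single-layer fully connected embedding with ReLU: e = ReLU(W_f @ delta),
    mapping a 2-D displacement to a fixed-length feature vector."""
    return relu([sum(w, 0.0) if False else sum(wi * xi for wi, xi in zip(row, delta))
                 for row in w_f for w in [row]])
```

In a trained model W_f would be learned jointly with the LSTM; here it only illustrates the shape of the computation.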
Step 2: sending the coded vector to a pooling module based on an attention mechanism for influence weight distribution to obtain a pooling vector;
the future trajectory of a pedestrian is always affected by the pedestrian ahead and is related to the speed, direction of movement, relative distance, etc. of these pedestrians. As shown in fig. 3, the future trajectory of the target pedestrian 1 is mainly affected by the pedestrians 2 and 3 in front of the line of sight, which is hardly affected by the pedestrian 4. And the greater the speed of the pedestrian 2, the smaller the relative distance to the pedestrian 1, the greater its effect on the trajectory of the pedestrian 1.
To characterize the influence of pedestrian j on the motion of target pedestrian i, the module first obtains a pooling vector h_ij by pooling, and combines the velocity vector v_j of pedestrian j, the distance vector d_ij from pedestrian i to pedestrian j, the cosine cos a_ij of the angle a_ij between the velocity vector v_i of pedestrian i and the distance vector d_ij, and the cosine cos b_ij of the angle b_ij between v_i and the velocity vector v_j into a feature vector q_ij, which is fed into a multilayer fully connected network with a softmax activation function to obtain the attention weight of pedestrian j on target pedestrian i in the scene.
Then the pooling vectors of all other pedestrians in the scene relative to target pedestrian i are gathered into a final pooling vector H_i, and the attention weights of the different pedestrians are combined into a weight matrix W_atten_i. Finally, the weight matrix W_atten_i is multiplied with the pooling vector H_i to obtain feature vectors, and the pooling vector p_i of target pedestrian i is obtained by max pooling. Intuitively, the attention mechanism weights the influence of all people in the scene on the future trajectory of target pedestrian i, summarizing the information p_i that pedestrian i needs for its decision and thereby achieving the purpose of pedestrian interaction modeling. The specific formulas are as follows:
q_ij = [v_j, d_ij, cos a_ij, cos b_ij]
q_i = [q_i1, q_i2, ..., q_ij, ..., q_iN]
W_atten_i = s(q_i; W_s)
H_i = [h_i1, h_i2, ..., h_ij, ..., h_iN]
p_i = maxpool(W_atten_i · H_i)
where s(·) denotes a multilayer fully connected network with a softmax activation function and W_s is its weight parameter.
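In the patent the combined features are scored by the learned network s(·); as a minimal sketch, the construction of q_ij (using scalar magnitudes for the speed and distance entries, an assumption) and a plain softmax standing in for the learned scoring network could look like:

```python
import math

def _norm(v):
    """Euclidean length of a 2-D vector."""
    return math.hypot(v[0], v[1])

def _cos(u, v):
    """Cosine of the angle between two 2-D vectors."""
    nu, nv = _norm(u), _norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # assumption: a zero vector contributes no alignment
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)

def feature_vector(pos_i, vel_i, pos_j, vel_j):
    """q_ij = [|v_j|, |d_ij|, cos a_ij, cos b_ij]: speed of j, distance from i
    to j, alignment of v_i with d_ij, and alignment of v_i with v_j."""
    d_ij = (pos_j[0] - pos_i[0], pos_j[1] - pos_i[1])
    return [_norm(vel_j), _norm(d_ij), _cos(vel_i, d_ij), _cos(vel_i, vel_j)]

def softmax(scores):
    """Normalize per-neighbour scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For a neighbour directly ahead and moving in the same direction, both cosine entries equal 1, matching the intuition in fig. 3 that such pedestrians matter most.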
Step 3: outputting the predicted trajectory of the pedestrian using an LSTM-network-based decoder;
pooling vectors output by attention pooling moduleHidden vector output by encoder moduleAnd combining the random noise z satisfying the Gaussian distribution into a feature vector as the initial input of the decoderThe decoder first passes through a full connectionThe network converts the position change of the pedestrian at the nearest moment into a feature space to obtain a feature vectorThen obtaining the current hidden state through the LSTM networkFinally, the predicted track coordinate is obtained by converting the coordinate space through a full-connection networkThe overall calculation formula of the decoder is as follows:
where j(·), m(·), and g(·) are all fully-connected networks with ReLU activation functions, W_j, W_m, and W_g are the weight parameters of the three networks, respectively, and W_decoder is the weight parameter of the LSTM network.
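A minimal NumPy sketch of one decoder roll-out under the structure described above. The LSTM cell is written out by hand, and all weight shapes and names are illustrative assumptions, not the patent's actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W):
    # One LSTM step: W maps [x; h] to the 4 gate pre-activations.
    z = W @ np.concatenate([x, h])
    H = h.size
    i, f, o, g = (sigmoid(z[:H]), sigmoid(z[H:2 * H]),
                  sigmoid(z[2 * H:3 * H]), np.tanh(z[3 * H:]))
    c = f * c + i * g
    return o * np.tanh(c), c

def decode(p_i, h_enc, z_noise, last_delta, steps, params):
    """Roll the decoder forward `steps` steps (illustrative sketch)."""
    W_j, W_m, W_g, W_lstm = params
    relu = lambda x: np.maximum(x, 0)
    h = relu(W_j @ np.concatenate([h_enc, p_i, z_noise]))  # initial hidden state
    c = np.zeros_like(h)
    out, delta = [], last_delta
    for _ in range(steps):
        e = relu(W_m @ delta)             # embed last position change
        h, c = lstm_cell(e, h, c, W_lstm)
        delta = W_g @ h                   # back to coordinate space
        out.append(delta)
    return np.stack(out)                  # (steps, 2) predicted offsets

D, E = 16, 8
params = (rng.normal(size=(D, D + D + 4)) * 0.1,   # W_j: [h_enc, p_i, z] -> h
          rng.normal(size=(E, 2)) * 0.1,           # W_m: delta -> embedding
          rng.normal(size=(2, D)) * 0.1,           # W_g: h -> delta
          rng.normal(size=(4 * D, E + D)) * 0.1)   # LSTM gate weights
traj = decode(rng.normal(size=D), rng.normal(size=D), rng.normal(size=4),
              np.zeros(2), steps=12, params=params)
print(traj.shape)  # (12, 2)
```

Feeding each predicted offset back in as the next step's input is what lets the decoder unroll an arbitrary prediction horizon from a single initial state.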
Step 4: adversarially training the generator and the discriminator with the Adam algorithm using the improved loss function;
The improved loss function consists of two parts: the adversarial loss L_GAN of the GAN network and the position offset loss L_2 between the real trajectory and the predicted trajectory.
Suppose the distribution represented by the real training data x is p_data (i.e., x ~ p_data(x)), and the generator samples z from the prior noise distribution p (i.e., z ~ p(z)). The GAN training process essentially drives the data distribution represented by the generator output G(z) as close as possible to the distribution of the real training set. The training loss function L_tran_GAN of a conventional GAN network can be expressed as:
L_tran_GAN = E_x[log D(x)] + E_z[log(1 − D(G(z)))]
However, during the training of a conventional GAN network, the discriminator often becomes too strong: it can then easily distinguish the generator's output from the real training data, the gradient vanishes, and training cannot proceed.
In order to solve this training difficulty of the conventional GAN network, step 4 applies noise that decreases over time to the loss function of the GAN network at the discriminator end during training, as shown in Fig. 4, where the dark solid line represents the training set data distribution p_data(x) and the light solid line represents the generator's data distribution p_G(z). At the initial stage of network training the intersection of the two distributions is small, so the discriminator can easily distinguish real data from generated data and the network lacks a training gradient. Therefore, at the initial training stage a certain amount of noise is added at the discriminator so that the training data and the generated data have a certain intersection. As training proceeds, the distribution of the generated data gradually approaches the real data distribution, and the noise is gradually reduced so that the network still has a certain training gradient. Thus, the adversarial loss L_GAN proposed herein can be expressed as:
L_GAN = E_x[log h(D(x))] + E_z[log(1 − h(D(G(z))))]
where h(·) represents a noise function that decreases over time.
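A sketch of the decaying-noise idea. The exact form of h(·) is an assumption — here it is modeled as additive Gaussian noise on the discriminator output whose standard deviation shrinks linearly to zero; the patent does not specify the noise schedule:

```python
import numpy as np

rng = np.random.default_rng(2)

def h(d, step, total_steps, scale=0.3):
    """Decaying noise on the discriminator output (illustrative choice:
    additive Gaussian noise whose std shrinks linearly to zero)."""
    std = scale * max(0.0, 1.0 - step / total_steps)
    noisy = d + rng.normal(0.0, std, size=np.shape(d))
    return np.clip(noisy, 1e-7, 1 - 1e-7)   # keep log() well-defined

def gan_loss(d_real, d_fake, step, total_steps):
    # L_GAN = E_x[log h(D(x))] + E_z[log(1 - h(D(G(z))))]
    real = np.log(h(d_real, step, total_steps)).mean()
    fake = np.log(1.0 - h(d_fake, step, total_steps)).mean()
    return real + fake

d_real = rng.uniform(0.6, 1.0, size=64)   # discriminator scores on real data
d_fake = rng.uniform(0.0, 0.4, size=64)   # scores on generated data
early = gan_loss(d_real, d_fake, step=0, total_steps=1000)
late = gan_loss(d_real, d_fake, step=999, total_steps=1000)
print(early, late)
```

Early in training the noise blurs the discriminator's confident scores, which is what restores a usable gradient; by the end, h(·) is nearly the identity and the loss reduces to the conventional GAN objective.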
In order to encourage the network to generate multiple trajectories that satisfy social norms, the network samples k predicted trajectories each time and selects the one with the smallest position offset error for computing the position offset loss. The position offset loss L_2 of the network can therefore be expressed as:
L_2 = min_k ||Y_i − Ŷ_i^(k)||_2
where Y_i and Ŷ_i^(k) denote the real trajectory and the k-th predicted trajectory of pedestrian i, respectively.
Thus, the overall loss function of the network can be expressed as:
L_total = L_GAN + l·L_2
where l is a hyperparameter.
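The variety-style position offset loss and the total loss can be sketched as follows (the names and the k = 3 sampling are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def variety_l2(y_true, y_preds):
    """Position offset loss: sample k trajectories, keep only the closest one.
    y_true: (T, 2); y_preds: (k, T, 2)."""
    errs = np.linalg.norm(y_preds - y_true[None], axis=-1).sum(axis=-1)  # (k,)
    return errs.min()

y_true = rng.normal(size=(12, 2))
y_preds = np.stack([y_true + rng.normal(scale=s, size=(12, 2))
                    for s in (0.5, 0.1, 1.0)])    # k = 3 sampled trajectories
l2 = variety_l2(y_true, y_preds)

l_gan = -1.2        # stand-in value for the adversarial loss term
l_hyper = 1.0       # the hyperparameter l of the patent
total = l_gan + l_hyper * l2                      # L_total = L_GAN + l * L_2
print(l2)
```

Because only the best of the k samples is penalized, the generator is free to spread its other samples over alternative socially plausible futures instead of collapsing to a single mode.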
Step 5: feeding the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted pedestrian trajectory coordinates;
Only steps 1, 2, and 3 need to be executed in sequence: the observed trajectory of the pedestrian is fed into the encoder for encoding to obtain the hidden features of the pedestrian's motion, the pedestrian interaction information is extracted by the attention pooling module, and finally the predicted trajectory coordinates of the pedestrian are output by the decoder.
Fig. 5 shows three representative pedestrian trajectory prediction scenarios. In each scene, the left sub-graph shows the real pedestrian motion trajectories, and the right sub-graph shows the observed and predicted trajectories, where solid circles and stars represent the observed and predicted trajectories, respectively. It can be seen that the proposed method captures the complex interactions between pedestrians who accompany and yield to each other, the predicted trajectories are more consistent with the actual motion scene, and the trajectories predicted by the network do not collide with the others. Overall, the predicted trajectories output by the proposed network model satisfy both social norms and physical constraints.
Table 1: ADE and FDE comparison of different models (t_pred = 8/12)
The present invention uses the following two metrics to characterize the accuracy of the predicted trajectory.
1) Average Displacement Error (ADE): the mean Euclidean distance between the predicted trajectory and the real trajectory sequence over all predicted time steps.
2) Final Displacement Error (FDE): the Euclidean distance between the predicted trajectory and the real trajectory sequence at the final time step.
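Both metrics are straightforward to compute; a small sketch with hand-checked values:

```python
import numpy as np

def ade(pred, true):
    # Mean Euclidean distance over all predicted time steps.
    return np.linalg.norm(pred - true, axis=-1).mean()

def fde(pred, true):
    # Euclidean distance at the final time step only.
    return np.linalg.norm(pred[-1] - true[-1])

true = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.5], [1.0, 0.5], [2.0, 1.0]])
# Per-step distances are 0.5, 0.5, 1.0, so ADE = 2/3 and FDE = 1.0.
print(ade(pred, true), fde(pred, true))
```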
The present invention selects the representative Linear, LSTM, S-LSTM, and SGAN network models as baselines; the comparison results of the trajectory prediction models are shown in Table 1. The unit of the data in the table is meters, and bold data indicate the best result. Attention-GAN is the network model corresponding to the present invention; +DN indicates that Attention-GAN introduces time-decaying noise during training, and -DN indicates the opposite.
The data in the table show that introducing the attention pooling mechanism allows the model to selectively fuse the information that influences the future trajectory of the target pedestrian, giving the model stronger expressive power and accurately depicting pedestrian interaction. Meanwhile, adding time-decaying noise to the discriminator during training alleviates, to a certain extent, the vanishing-gradient problem caused by the imbalance between the generator and the discriminator, and further improves the prediction accuracy of the network.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any modification or equivalent variation made according to the technical spirit of the present invention falls within the scope of the present invention as claimed.
Claims (6)
1. An attention-mechanism-based generative adversarial trajectory prediction method, characterized by comprising the following steps:
Step 1: preprocessing the pedestrian trajectory data and feeding them into an encoder for encoding;
Step 2: feeding the encoded vectors into an attention-based pooling module for influence weight assignment to obtain pooling vectors;
Step 3: outputting the predicted trajectory of the pedestrian using an LSTM-network-based decoder;
Step 4: adversarially training the generator and the discriminator with the Adam algorithm using the improved loss function;
Step 5: feeding the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted pedestrian trajectory coordinates.
2. The attention-mechanism-based generative adversarial trajectory prediction method according to claim 1, wherein the encoding of the pedestrian trajectory in step 1 comprises:
the network receives the historical trajectory of the pedestrian and uses a single-layer fully-connected network as an embedding layer to convert the position change information ΔX_i^t of pedestrian i at time t into a fixed-length feature vector e_i^t; the vector is then fed into an LSTM network for encoding to learn the temporal features of the trajectory data and obtain the hidden state h_ei^t of pedestrian i at time t:
e_i^t = f(ΔX_i^t; W_f)
h_ei^t = LSTM(h_ei^{t-1}, e_i^t; W_encoder)
where f(·) is an embedding layer using the ReLU activation function, W_f and W_encoder are the weight parameters of the embedding layer and the LSTM network, respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene.
3. The attention-mechanism-based generative adversarial trajectory prediction method according to claim 1, wherein the attention-based pooling module of step 2 assigns influence weights to the pedestrians in the same scene and outputs a pooling vector representing the pedestrian interaction information, comprising:
in order to depict the influence of a pedestrian j on the motion of a target pedestrian i, the module first obtains a pooling vector h_ij by pooling, and combines the velocity vector v_j of pedestrian j, the distance vector d_ij between pedestrian i and pedestrian j, the cosine cos a_ij of the angle a_ij between the velocity vector v_i of pedestrian i and the distance vector d_ij, and the cosine cos b_ij of the angle b_ij between the velocity vector v_i of pedestrian i and the velocity vector v_j of pedestrian j into a feature vector q_ij, which is fed into a multi-layer fully-connected network using a softmax function as the activation function to obtain the attention weight of pedestrian j with respect to the target pedestrian i in the scene;
then, the pooling vectors of all other pedestrians in the scene relative to the target pedestrian i are gathered into a final pooling matrix H_i, the attention weights of the different pedestrians are combined into a weight matrix W_atten_i, and finally the weight matrix W_atten_i is multiplied by the pooling matrix H_i to obtain a feature vector, from which the pooling vector p_i of the target pedestrian i is obtained by max pooling; the pooling vector represents the information needed by the target pedestrian i to make a decision; intuitively, the attention mechanism yields the weight of the influence of every person in the scene on the future trajectory of the target pedestrian i, so that the information p_i needed by the target pedestrian i to make a decision is summarized, thereby achieving the purpose of modeling pedestrian interaction; the specific formulas are as follows:
q_ij = [v_j, d_ij, cos a_ij, cos b_ij]
q_i = [q_i1, q_i2, ..., q_ij, ..., q_iN]
W_atten_i = s(q_i; W_s)
H_i = [h_i1, h_i2, ..., h_ij, ..., h_iN]
p_i = maxpool(W_atten_i · H_i)
where s(·) denotes a multi-layer fully-connected network using a softmax activation function, and W_s is the weight parameter of the network.
4. The attention-mechanism-based generative adversarial trajectory prediction method according to claim 1, wherein outputting the predicted trajectory of the pedestrian using the LSTM-network-based decoder in step 3 comprises:
the pooling vector p_i output by the attention pooling module, the hidden vector h_ei output by the encoder module, and a random noise z drawn from a Gaussian distribution are combined into a feature vector serving as the initial input of the decoder; the decoder first converts the most recent position change of the pedestrian into the feature space through a fully-connected network to obtain a feature vector, then obtains the current hidden state through the LSTM network, and finally converts it back to the coordinate space through a fully-connected network to obtain the predicted trajectory coordinates; the overall calculation of the decoder is as follows:
h_di = j([h_ei, p_i, z]; W_j)
e_i^t = m(ΔX_i^{t-1}; W_m)
h_di^t = LSTM(h_di^{t-1}, e_i^t; W_decoder)
ΔX̂_i^t = g(h_di^t; W_g)
where j(·), m(·), and g(·) are all fully-connected networks with ReLU activation functions, W_j, W_m, and W_g are the weight parameters of the three networks, respectively, and W_decoder is the weight parameter of the LSTM network.
5. The attention-mechanism-based generative adversarial trajectory prediction method according to claim 1, wherein the adversarial training of the generator and the discriminator with the improved loss function in step 4 comprises:
the network is trained with the Adam algorithm using the improved loss function, which consists of two parts: the adversarial loss L_GAN of the GAN network and the position offset loss L_2 between the real trajectory and the predicted trajectory;
suppose the distribution represented by the real training data x is p_data, i.e. x ~ p_data(x), and the generator samples z from the prior noise distribution p, i.e. z ~ p(z); the GAN training process essentially drives the data distribution represented by the generator output G(z) as close as possible to the distribution of the real training set, and the training loss function L_tran_GAN of a conventional GAN network is expressed as:
L_tran_GAN = E_x[log D(x)] + E_z[log(1 − D(G(z)))]
however, during the training of a conventional GAN network the discriminator often becomes too strong, so that it can easily distinguish the generated data from the real training data, the gradient vanishes, and training cannot proceed; in order to solve this training difficulty of the conventional GAN network, step 4 applies noise that decreases over time to the loss function at the discriminator end during training, so that the training data and the generated data have a certain intersection; as training proceeds, the distribution of the generated data gradually approaches the real data distribution, and the noise is gradually reduced so that the network still has a certain training gradient; the improved adversarial loss L_GAN is expressed as:
L_GAN = E_x[log h(D(x))] + E_z[log(1 − h(D(G(z))))]
where h(·) represents a noise function that decreases over time;
in order to encourage the network to generate multiple trajectories that satisfy social norms, the network samples k predicted trajectories each time and selects the one with the smallest position offset error for computing the position offset loss, so the position offset loss L_2 of the network is expressed as:
L_2 = min_k ||Y_i − Ŷ_i^(k)||_2
where Y_i and Ŷ_i^(k) denote the real trajectory and the k-th predicted trajectory of pedestrian i, respectively;
thus, the overall loss function of the network is expressed as:
L_total = L_GAN + l·L_2
where l is a hyperparameter.
6. The attention-mechanism-based generative adversarial trajectory prediction method according to claim 1, wherein feeding the observed trajectory of the pedestrian into the generator in step 5 to obtain the predicted pedestrian trajectory coordinates comprises:
executing step 1, step 2, and step 3 in sequence, i.e., feeding the observed trajectory of the pedestrian into the encoder for encoding to obtain the hidden features of the pedestrian's motion, extracting the pedestrian interaction information through the attention pooling module, and finally outputting the predicted trajectory coordinates of the pedestrian through the decoder.