CN112766561B

CN112766561B - Attention mechanism-based generative adversarial trajectory prediction method

Info

Publication number: CN112766561B
Application number: CN202110053547.8A
Authority: CN
Inventors: 房芳; 张鹏鹏; 周波; 钱堃; 甘亚辉
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2023-11-17
Anticipated expiration: 2041-01-15
Also published as: CN112766561A

Abstract

The application discloses a generative adversarial trajectory prediction method based on an attention mechanism. The method first uses an encoder module built from a long short-term memory (LSTM) network to extract hidden features of pedestrian motion from pedestrian trajectories. It then uses an attention-based pooling module to assign influence weights to the pedestrians in the same scene, so that the interaction information among pedestrians is fully extracted. Finally, a decoder module outputs the pedestrian trajectory coordinates predicted by the network. The method improves trajectory prediction accuracy and can generate multiple predicted trajectories that conform to social norms. It can be used in the navigation planning system of a mobile robot, helping the robot navigation system plan more reasonable and effective paths in environments shared with people.

Description

Attention mechanism-based generative adversarial trajectory prediction method

Technical Field

The application relates to the field of artificial intelligence, and in particular to a generative adversarial trajectory prediction method based on an attention mechanism.

Background

Pedestrian trajectory prediction means predicting the motion trajectory of a pedestrian over a future period from the pedestrian's motion trajectory over a past period. With the rise of fields such as mobile service robots and autonomous driving, pedestrian trajectory prediction in dynamic scenes has become a popular research direction. Correct prediction of pedestrian trajectories helps an intelligent navigation system plan more reasonable and effective paths. However, the problem is extremely complex. First, pedestrian motion has a degree of randomness, and pedestrians make decisions subjectively and flexibly, so pedestrian trajectories are diverse. Second, while walking, a pedestrian's trajectory is affected by the surrounding dynamic environment, and pedestrians generally adjust their paths according to common sense and social norms. These features make pedestrian trajectory prediction challenging.

In pedestrian trajectory prediction, effectively modeling the interactions between pedestrians is essential. Current mainstream methods mainly use deep learning to learn the interactions between pedestrians and thereby predict their trajectories. Among them, methods based on the long short-term memory (LSTM) network have proven very effective for time-series problems, but an LSTM alone cannot effectively model the spatial relationships between pedestrians. To solve this problem, Alahi et al. proposed the Social LSTM (S-LSTM) model, built on the LSTM network, which predicts multiple socially compliant trajectories by dividing the space into a grid and pooling the hidden features of the pedestrians around each pedestrian according to the grid (see "Social LSTM: Human Trajectory Prediction in Crowded Spaces", CVPR 2016). Because this method can only model pedestrian interaction in the local neighborhood of the target pedestrian, it cannot efficiently model the interactions of all pedestrians in the scene. Gupta et al. introduced generative adversarial networks (GANs) into pedestrian trajectory prediction and proposed the Social GAN model. By adversarially training a generator and a discriminator and extracting the interaction information of all pedestrians in the scene with a pooling module, it generates diverse trajectories that conform to social norms and improves prediction accuracy (see "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", CVPR 2018). However, when extracting interaction information, this method considers only the spatial position relationships between pedestrians and ignores the influence of factors such as the motion direction and speed of surrounding pedestrians on the future trajectory of the target pedestrian, so it cannot fully extract the interaction information. In addition, methods based on generative adversarial networks are highly prone to an imbalance in strength between the generator and the discriminator during training, which causes vanishing gradients and makes training difficult.

To address these problems, Guangdong University of Technology filed patent application No. 202010110743.X, entitled "Pedestrian trajectory prediction method based on long short-term memory". It mainly comprises the following steps: preprocessing the data and converting it into a matrix of shape [number of pedestrians, 4]; introducing an attention mechanism to select the information, such as direction and speed, that influences the walking of the current pedestrian, and connecting all current position information through a fully connected layer; feeding the hidden historical-state information of all pedestrians in the same scene into a pooling layer, so that the global hidden information is shared; using a long short-term memory unit to convert the pooled tensor of the hidden historical-state information of all pedestrians in the current state, the position information of the current pedestrian, and the attention-selected information affecting the pedestrian into long short-term memory sequence information; and converting the current state information into coordinate space through a multi-layer perceptron structure to generate the predicted trajectory sequence.

That patent still has the following drawbacks:

First, regarding the attention mechanism, its method for obtaining the attention weights considers only the position of the i-th pedestrian relative to the j-th pedestrian. It does not jointly consider factors such as the speed of pedestrian j, its direction of motion relative to pedestrian i, and the relative distance between the two pedestrians. The applicant therefore improves the attention weights as follows. To describe the influence of pedestrian j on the motion of the target pedestrian i, the attention pooling module merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos b_{ij}$ of the angle $b_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $q_{ij}$. This vector is fed into a multi-layer fully connected network that uses the softmax function as its activation function, yielding the attention weight of pedestrian j on the target pedestrian i in the scene.

Second, in conventional GAN training, a mismatch in strength between the generator and the discriminator causes the training gradient to vanish and makes training difficult. To address this, the loss function is modified so that noise that decreases over time is introduced at the discriminator during training, which improves the training of the model and the accuracy of the predicted trajectories. The loss function of a conventional GAN can be expressed as:

$$L_{tran\_GAN} = E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$$

while the loss function of the modified GAN is expressed as:

$$L_{GAN} = E_x[\log h(D(x))] + E_z[\log(1 - h(D(G(z))))]$$

where h(·) denotes a noise function that decreases over time. The reason for this improvement is that, in the initial stage of training, the distribution of the training data and the distribution of the data produced by the generator overlap very little, so the discriminator can easily tell real data from generated data and the network lacks a training gradient. A certain amount of noise is therefore added at the discriminator in the initial stage so that the training data and the generated data overlap to some extent. As training proceeds, the distribution of the generated data gradually approaches the real data distribution, and gradually reducing the noise still ensures that the network has a training gradient, which improves the training of the network.

Disclosure of Invention

To solve the above problems, the application provides a generative adversarial trajectory prediction method based on an attention mechanism, which fully extracts the interaction information between pedestrians and thereby improves trajectory prediction accuracy. When the method is used in the navigation planning system of a service robot, the robot can plan more reasonable and effective paths in dynamic environments shared with people, which improves navigation comfort.

The application provides a generative adversarial trajectory prediction method based on an attention mechanism, characterized by comprising the following steps:

Step 1: preprocessing the pedestrian trajectory data and feeding it into an encoder for encoding;

Step 2: feeding the encoded vectors into an attention-based pooling module, which assigns influence weights and produces a pooled vector;

Step 3: outputting the predicted trajectory of the pedestrian using a decoder based on an LSTM network;

Step 4: adversarially training the generator and the discriminator with the Adam algorithm, using the improved loss function;

Step 5: feeding the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted trajectory coordinates of the pedestrian.

Further, the encoding of the pedestrian trajectory in step 1 includes:

The network receives the historical trajectory of the pedestrian and uses a single-layer fully connected network as an embedding layer to convert the position of pedestrian i at time t into a fixed-length feature vector. This vector is then fed into an LSTM network for encoding, which learns the temporal features of the trajectory data and yields the hidden state of pedestrian i at time t.

In this calculation, f(·) is the embedding layer with a ReLU activation function, and $W_f$ and $W_{encoder}$ are the weight parameters of the embedding layer and the LSTM network, respectively. The parameters of the LSTM network are shared by all pedestrians in the scene.
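A form of the encoder equations consistent with the definitions above can be written as follows. This is only a sketch: the symbol names $(x_i^t, y_i^t)$ for the position, $e_i^t$ for the embedded feature vector and $h_{ei}^t$ for the encoder hidden state are assumptions borrowed from the Social GAN notation, and only f(·), $W_f$ and $W_{encoder}$ are taken from the text.

```latex
\begin{aligned}
e_i^t    &= f\left(x_i^t, y_i^t;\; W_f\right) \\
h_{ei}^t &= \mathrm{LSTM}\left(h_{ei}^{t-1},\, e_i^t;\; W_{encoder}\right)
\end{aligned}
```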

Further, in step 2, assigning influence weights to the pedestrians in the same scene with the attention-based pooling module and outputting a pooled vector representing the pedestrian interaction information includes:

To describe the influence of pedestrian j on the motion of the target pedestrian i, the module first obtains a pooling vector $h_{ij}$ by a pooling operation. It then merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos b_{ij}$ of the angle $b_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $q_{ij}$, which is fed into a multi-layer fully connected network that uses the softmax function as its activation function. This yields the attention weight of pedestrian j on the target pedestrian i in the scene;

Then the pooling vectors of all other pedestrians in the scene relative to the target pedestrian i are gathered into a final pooling vector $H_i$, and the attention weights of the different pedestrians are combined into a weight matrix $W_{atten\_i}$. Finally, $W_{atten\_i}$ is multiplied by $H_i$ to obtain a feature vector $p_{hi}$, and max pooling yields the pooled vector $p_i$ of the target pedestrian i, which characterizes the information the target pedestrian i needs to make a decision. Intuitively, the attention mechanism gives the weight of every person's influence on the future trajectory of the target pedestrian i, and these weights are used to summarize the information $p_i$ that the target pedestrian i needs to make a decision, which achieves the goal of modeling pedestrian interaction. The specific formulas are:

$$q_{ij} = [v_j, d_{ij}, \cos a_{ij}, \cos b_{ij}]$$

$$q_i = [q_{i1}, q_{i2}, \ldots, q_{ij}, \ldots, q_{iN}]$$

$$W_{atten\_i} = s(q_i; W_s)$$

$$H_i = [h_{i1}, h_{i2}, \ldots, h_{ij}, \ldots, h_{iN}]$$

$$p_i = \mathrm{maxpool}(W_{atten\_i} H_i)$$

where s(·) denotes a multi-layer fully connected network with a softmax activation function and $W_s$ is the weight parameter of that network.
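The attention pooling formulas above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the distance vector is taken to point from pedestrian i to pedestrian j, a single linear scoring layer with hypothetical weights `w_score` stands in for the multi-layer network s(·; W_s), the product of the weight matrix and H_i is read as scaling each h_ij by its attention weight, and max pooling is taken element-wise over pedestrians. All function names are hypothetical.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine of the angle between 2-D vectors u and v (0.0 if either is near zero)."""
    nu, nv = math.hypot(u[0], u[1]), math.hypot(v[0], v[1])
    if nu < eps or nv < eps:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)

def feature_q(pos_i, vel_i, pos_j, vel_j):
    """Build q_ij = [v_j, d_ij, cos a_ij, cos b_ij] as a flat 6-element list."""
    d_ij = (pos_j[0] - pos_i[0], pos_j[1] - pos_i[1])  # assumed direction: i -> j
    cos_a = cosine(vel_i, d_ij)   # angle between v_i and d_ij
    cos_b = cosine(vel_i, vel_j)  # angle between v_i and v_j
    return [vel_j[0], vel_j[1], d_ij[0], d_ij[1], cos_a, cos_b]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(q_i, H_i, w_score):
    """Return (attention weights, pooled vector p_i) for one target pedestrian.

    q_i     : list of q_ij feature vectors, one per other pedestrian j
    H_i     : list of pooling vectors h_ij, one per other pedestrian j
    w_score : hypothetical linear scoring weights standing in for s(.; W_s)
    """
    scores = [sum(w * x for w, x in zip(w_score, q)) for q in q_i]
    weights = softmax(scores)
    # Scale each h_ij by its weight, then max-pool element-wise over pedestrians.
    weighted = [[a * h for h in h_ij] for a, h_ij in zip(weights, H_i)]
    p_i = [max(column) for column in zip(*weighted)]
    return weights, p_i

# Target pedestrian i walks along +x; one neighbour is ahead, one is behind.
pos_i, vel_i = (0.0, 0.0), (1.0, 0.0)
others = [((2.0, 0.0), (-1.0, 0.0)), ((-2.0, 0.0), (1.0, 0.0))]
q_i = [feature_q(pos_i, vel_i, p, v) for p, v in others]
H_i = [[0.5, 1.0], [1.0, 0.2]]
w_score = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # hypothetical: score equals cos a_ij
weights, p_i = attention_pool(q_i, H_i, w_score)
```

With these hypothetical scoring weights, the neighbour ahead of the target pedestrian receives the larger attention weight, which matches the intuition that pedestrians in front matter more than those behind.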

Further, outputting the predicted trajectory of the pedestrian with the LSTM-based decoder in step 3 includes:

The pooled vector output by the attention pooling module, the hidden-layer vector output by the encoder module, and random noise z drawn from a Gaussian distribution are combined into a feature vector that serves as the initial input of the decoder. The decoder first maps the most recent position change of the pedestrian into feature space through a fully connected network to obtain a feature vector, then obtains the current hidden state through the LSTM network, and finally maps the hidden state to coordinate space through a fully connected network to obtain the predicted trajectory coordinates.

In the decoder's calculation, j(·), m(·) and g(·) are fully connected networks with a ReLU activation function, $W_j$, $W_m$ and $W_g$ are the weight parameters of these three networks, and $W_{decoder}$ is the weight parameter of the LSTM network.

Further, adversarially training the generator and the discriminator with the improved loss function in step 4 includes:

The network is trained adversarially with the Adam algorithm using the improved loss function, which consists mainly of two parts: the adversarial loss $L_{GAN}$ of the GAN, and the position offset loss $L_2$ between the real trajectory and the predicted trajectory;

Let the distribution of the real training data x be $p_{data}$, i.e. $x \sim p_{data}(x)$, and let the generator sample z from the prior noise distribution p, i.e. $z \sim p(z)$. Training a GAN essentially drives the data distribution represented by the generator output G(z) as close as possible to the data distribution of the real training set. The training loss function $L_{tran\_GAN}$ of a conventional GAN is expressed as:

$$L_{tran\_GAN} = E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$$

However, during conventional GAN training the discriminator easily becomes too strong, so that the generated data and the real training data are trivially distinguished, the gradient vanishes, and training stalls. To solve this difficulty, step 4 applies noise that decreases over time to the loss function at the discriminator during GAN training, so that the training data and the generated data overlap to some extent. As training proceeds, the distribution of the generated data gradually approaches the real data distribution, and gradually reducing the noise still ensures that the network has a training gradient. The improved adversarial loss $L_{GAN}$ is therefore expressed as:

$$L_{GAN} = E_x[\log h(D(x))] + E_z[\log(1 - h(D(G(z))))]$$

where h(·) denotes a noise function that decreases over time;
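The text does not give a concrete form for h(·), so the following Python sketch picks one purely for illustration: Gaussian noise whose standard deviation shrinks geometrically with the training step is added to the discriminator score, and the result is clamped so the logarithms stay finite. The schedule, its parameters `sigma0` and `decay`, and all function names are assumptions rather than the patented definition.

```python
import math
import random

def noise_scale(step, sigma0=0.1, decay=0.999):
    """Hypothetical schedule: noise standard deviation shrinking geometrically per step."""
    return sigma0 * decay ** step

def h(d, step, rng, eps=1e-6):
    """Assumed form of h(.): add decaying Gaussian noise to a discriminator score d in (0, 1)."""
    noisy = d + rng.gauss(0.0, noise_scale(step))
    return min(max(noisy, eps), 1.0 - eps)  # clamp so that log() stays finite

def gan_loss(d_real, d_fake, step, rng):
    """Sample estimate of L_GAN = E_x[log h(D(x))] + E_z[log(1 - h(D(G(z))))].

    d_real : discriminator scores D(x) on real trajectories
    d_fake : discriminator scores D(G(z)) on generated trajectories
    """
    real_term = sum(math.log(h(d, step, rng)) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - h(d, step, rng)) for d in d_fake) / len(d_fake)
    return real_term + fake_term

rng = random.Random(0)
early = gan_loss([0.9, 0.8], [0.1, 0.2], step=0, rng=rng)       # noticeable noise
late = gan_loss([0.9, 0.8], [0.1, 0.2], step=100000, rng=rng)   # noise has decayed away
```

Late in training the noise is negligible under this schedule, so the value reduces to the conventional GAN loss evaluated on the same scores.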

To encourage the network to generate multiple socially compliant trajectories, the network samples k predicted trajectories each time and selects the one with the smallest position offset error to compute the position offset loss $L_2$. Here $Y_i$ and its predicted counterpart denote the real trajectory and the predicted trajectory of pedestrian i, respectively;

Thus, the overall loss function of the network is expressed as:

$$L_{total} = L_{GAN} + l L_2$$

where l is a hyperparameter.
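The best-of-k selection and the combined loss can be illustrated with the short Python sketch below. It assumes the position offset error is the L2 norm of the difference between the real and the sampled trajectory, which is a plausible reading of the description rather than the confirmed formula. The function names, and the parameter `weight` that stands for the hyperparameter l, are hypothetical.

```python
import math

def l2_distance(traj_a, traj_b):
    """L2 norm of the difference between two trajectories given as lists of (x, y)."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(traj_a, traj_b)))

def position_offset_loss(real_traj, sampled_trajs):
    """Best-of-k loss: keep only the sampled trajectory closest to the real one."""
    return min(l2_distance(real_traj, sample) for sample in sampled_trajs)

def total_loss(l_gan, l_2, weight=1.0):
    """L_total = L_GAN + l * L_2, with `weight` playing the role of l."""
    return l_gan + weight * l_2

real = [(0.0, 0.0), (1.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0)],   # k = 1: offset by one metre throughout
    [(0.0, 0.0), (1.0, 0.5)],   # k = 2: only the last point is off
]
l_2 = position_offset_loss(real, samples)
```

Because only the closest of the k samples is penalized, the other samples remain free to explore different socially plausible trajectories.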

Further, feeding the observed trajectory of the pedestrian into the generator in step 5 to obtain the predicted trajectory coordinates of the pedestrian includes:

Step 1, step 2 and step 3 are executed in sequence: the observed trajectory of the pedestrian is fed into the encoder to obtain the hidden features of the pedestrian's motion, the interaction information between pedestrians is extracted by the attention pooling module, and finally the decoder outputs the predicted trajectory coordinates of the pedestrian.

Compared with the prior art, the technical solution provided by the application has the following beneficial effects:

1. To address the inability of existing methods to fully extract the interaction information between pedestrians, an attention pooling module is introduced that relates factors such as the motion direction and speed of surrounding pedestrians to the future trajectory of the target pedestrian and assigns influence weights to the pedestrians in the same scene. This extracts the interaction information between pedestrians more effectively and also improves the interpretability of the model.

2. To address the vanishing training gradient and difficult training caused by a mismatch in strength between the generator and the discriminator when training a generative adversarial network, the loss function is modified so that noise that decreases over time is introduced at the discriminator during training. This improves the training of the model and the accuracy of the predicted trajectories.

Drawings

FIG. 1 is a schematic diagram of the workflow of the present application;

FIG. 2 is an overall block diagram of a network model;

FIG. 3 is a schematic diagram of an attention pooling module;

FIG. 4 is a schematic diagram of the GAN training process;

FIG. 5 is a visual comparison of predicted trajectories.

Detailed Description

The application is described in further detail below with reference to the accompanying drawings and specific embodiments.

The application provides a generative adversarial trajectory prediction method based on an attention mechanism, which fully extracts the interaction information between pedestrians and thereby improves trajectory prediction accuracy. When the method is used in the navigation planning system of a service robot, the robot can plan more reasonable and effective paths in dynamic environments shared with people, which improves navigation comfort.

As shown in FIG. 1 and FIG. 2, the network model of the application mainly comprises a generator module and a discriminator module. The generator module is based on an encoder-decoder architecture and comprises an encoder, an attention pooling module and a decoder. The generator receives the historical trajectories of pedestrians, encodes them with the encoder to obtain the hidden features of each pedestrian, extracts the interaction information between pedestrians with the attention-based pooling module, and finally outputs the pedestrian position coordinates predicted by the network through the decoder module. The discriminator module consists mainly of an encoder module: it accepts a trajectory as input, encodes it with the encoder, and then scores how realistic the trajectory is with a classification network.

The method provided by the application specifically comprises the following steps:

step 1: preprocessing pedestrian track data and sending the pedestrian track data to an encoder for encoding;

The future trajectory of a pedestrian is always influenced by the pedestrians ahead and depends on their speed, direction of motion, relative distance, and so on. As shown in FIG. 3, the future trajectory of the target pedestrian 1 is mainly affected by pedestrians 2 and 3, who are ahead in the line of sight, and is hardly affected by pedestrian 4. Moreover, the greater the speed of pedestrian 2 and the smaller its distance from pedestrian 1, the greater its influence on the trajectory of pedestrian 1.

To describe the influence of pedestrian j on the motion of the target pedestrian i, the module first obtains a pooling vector $h_{ij}$ by a pooling operation. It then merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos b_{ij}$ of the angle $b_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $q_{ij}$, which is fed into a multi-layer fully connected network that uses the softmax function as its activation function. This yields the attention weight of pedestrian j on the target pedestrian i in the scene.

Then the pooling vectors of all other pedestrians in the scene relative to the target pedestrian i are gathered into a final pooling vector $H_i$, and the attention weights of the different pedestrians are combined into a weight matrix $W_{atten\_i}$. Finally, $W_{atten\_i}$ is multiplied by $H_i$ to obtain a feature vector $p_{hi}$, and max pooling yields the pooled vector $p_i$ of the target pedestrian i. Intuitively, the attention mechanism gives the weight of every person's influence on the future trajectory of the target pedestrian i, and these weights are used to summarize the information $p_i$ that the target pedestrian i needs to make a decision, which achieves the goal of modeling pedestrian interaction. The specific formulas are:

$$q_{ij} = [v_j, d_{ij}, \cos a_{ij}, \cos b_{ij}]$$

$$q_i = [q_{i1}, q_{i2}, \ldots, q_{ij}, \ldots, q_{iN}]$$

$$W_{atten\_i} = s(q_i; W_s)$$

$$H_i = [h_{i1}, h_{i2}, \ldots, h_{ij}, \ldots, h_{iN}]$$

$$p_i = \mathrm{maxpool}(W_{atten\_i} H_i)$$

The improved loss function consists mainly of two parts: the adversarial loss $L_{GAN}$ of the GAN, and the position offset loss $L_2$ between the real trajectory and the predicted trajectory.

Let the distribution of the real training data x be $p_{data}$ (i.e. $x \sim p_{data}(x)$), and let the generator sample z from the prior noise distribution p (i.e. $z \sim p(z)$). Training a GAN essentially drives the data distribution represented by the generator output G(z) as close as possible to the data distribution of the real training set. The training loss function $L_{tran\_GAN}$ of a conventional GAN can be expressed as:

$$L_{tran\_GAN} = E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$$

However, during conventional GAN training the discriminator very easily becomes too strong, so that the data produced by the generator and the real data of the training set are trivially distinguished, the gradient vanishes, and training stalls.
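The vanishing gradient can be illustrated with a one-line calculation. As an assumption not stated in the text, suppose the discriminator output on a generated sample is the sigmoid of a logit $a$, that is $D(G(z)) = \mathrm{sigmoid}(a)$. The gradient of the generator term of $L_{tran\_GAN}$ with respect to that logit is then:

```latex
\frac{\partial}{\partial a}\log\bigl(1 - \mathrm{sigmoid}(a)\bigr)
  = -\frac{\mathrm{sigmoid}(a)\,\bigl(1 - \mathrm{sigmoid}(a)\bigr)}{1 - \mathrm{sigmoid}(a)}
  = -\,\mathrm{sigmoid}(a)
  = -\,D(G(z))
```

Under this assumption, when an overly strong discriminator drives D(G(z)) toward 0 on generated data, the gradient passed back toward the generator also tends to 0.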

To solve the difficulty of training a conventional GAN, step 4 applies noise that decreases over time to the loss function at the discriminator during GAN training. As shown in FIG. 4, the dark solid line represents the training set data distribution $p_{data}(x)$ and the light solid line represents the distribution $p_G(z)$ of the data produced by the generator. In the early stage of training, the two distributions overlap very little, so the discriminator can easily tell real data from generated data and the network lacks a training gradient. A certain amount of noise is therefore added at the discriminator in the initial stage so that the training data and the generated data overlap to some extent. As training proceeds, the distribution of the generated data gradually approaches the real data distribution, and gradually reducing the noise still ensures that the network has a training gradient. Thus, the adversarial loss function $L_{GAN}$ proposed here can be expressed as:

$$L_{GAN} = E_x[\log h(D(x))] + E_z[\log(1 - h(D(G(z))))]$$

where h(·) denotes a noise function that decreases over time.

To encourage the network to generate multiple socially compliant trajectories, the network samples k predicted trajectories each time and selects the one with the smallest position offset error to compute the position offset loss $L_2$. Here $Y_i$ and its predicted counterpart denote the real trajectory and the predicted trajectory of pedestrian i, respectively.

Thus, the overall loss function of the network can be expressed as:

$$L_{total} = L_{GAN} + l L_2$$

where l is a hyperparameter.

Step 5: feeding the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted trajectory coordinates of the pedestrian;

This step executes step 1, step 2 and step 3 in sequence: the observed trajectory of the pedestrian is fed into the encoder to obtain the hidden features of the pedestrian's motion, the interaction information between pedestrians is extracted by the attention pooling module, and finally the decoder outputs the predicted trajectory coordinates of the pedestrian.

FIG. 5 shows three representative pedestrian trajectory prediction scenes. In each scene, the left sub-figure shows the real motion trajectories of the pedestrians and the right sub-figure shows their observed and predicted trajectories, with solid circles and stars marking the observed and the predicted trajectories, respectively. The method provided by the application captures complex interactions among pedestrians, such as walking together and giving way to one another; the predicted trajectories match the actual motion scene more closely, and the trajectories predicted by the network do not collide with other trajectories. The predicted trajectories output by the network model of the application therefore conform to social norms and satisfy physical constraints.

Table 1. ADE and FDE comparison of different models ($t_{pred}$ = 8/12)

The present application uses the following two indicators to characterize the accuracy of the predicted trajectory.

1) Average displacement error (ADE): the mean Euclidean distance between the predicted trajectory and the real trajectory over all time steps.

2) Final displacement error (FDE): the Euclidean distance between the predicted trajectory and the real trajectory at the final time step.
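The two metrics can be computed as in the following Python sketch for a single pedestrian, with trajectories given as lists of (x, y) positions in meters. The function names `ade` and `fde` are hypothetical, and averaging over several pedestrians or scenes is left out for brevity.

```python
import math

def point_distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ade(pred, real):
    """Average displacement error: mean point-wise distance over all time steps."""
    distances = [point_distance(p, r) for p, r in zip(pred, real)]
    return sum(distances) / len(distances)

def fde(pred, real):
    """Final displacement error: distance between the last predicted and last real point."""
    return point_distance(pred[-1], real[-1])

pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
real = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade_value = ade(pred, real)  # distances are 0, 1 and 2, so the mean is 1.0
fde_value = fde(pred, real)  # the final points are 2.0 apart
```

ADE reflects how well the whole predicted path tracks the real one, while FDE only measures how far off the end point is, so a model can score well on one and poorly on the other.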

The application selects the representative Linear, LSTM, S-LSTM and SGAN network models as baselines, and the comparison results of the trajectory prediction models are shown in Table 1. The data in the table are in meters, and bold data indicate the best result. Atten-GAN is the network model of the application; +DN indicates that Atten-GAN introduces noise that decreases over time during training, and -DN indicates that it does not.

Taken together, the data in the table show that, by introducing the attention pooling mechanism, the application can selectively fuse the information that influences the future trajectory of the target pedestrian, so the model is more expressive and describes pedestrian interaction more accurately. Meanwhile, adding noise that decreases over time at the discriminator during training mitigates, to some extent, the vanishing gradient caused by the imbalance in strength between the generator and the discriminator, which further improves the prediction accuracy of the network.

The above is only a preferred embodiment of the application and does not limit the application in any other form. Any modification or equivalent variation made according to the technical essence of the application still falls within the scope of protection claimed by the application.

Claims

1. A generative adversarial trajectory prediction method based on an attention mechanism, characterized by comprising the following steps:

the encoding of the pedestrian trajectory in step 1 includes:

where φ(·) is an embedding layer with a ReLU activation function, $W_\varphi$ and $W_{encoder}$ are the weight parameters of the embedding layer and the LSTM network, respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene;

in step 2, assigning influence weights to the pedestrians in the same scene with the attention-based pooling module and outputting a pooled vector representing the pedestrian interaction information includes:

to describe the influence of pedestrian j on the motion of the target pedestrian i, the module first obtains a pooling vector $h_{ij}$ by a pooling operation; it then merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos\alpha_{ij}$ of the angle $\alpha_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos\beta_{ij}$ of the angle $\beta_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $\theta_{ij}$, which is fed into a multi-layer fully connected network that uses the softmax function as its activation function, thereby obtaining the attention weight of pedestrian j on the target pedestrian i in the scene;

then the pooling vectors of all other pedestrians in the scene relative to the target pedestrian i are gathered into a final pooling vector $H_i$, and the attention weights of the different pedestrians are combined into a weight matrix $W_{atten\_i}$; finally, $W_{atten\_i}$ is multiplied by $H_i$ to obtain a feature vector $p_{hi}$, and max pooling yields the pooled vector $p_i$ of the target pedestrian i, which characterizes the information the target pedestrian i needs to make a decision; intuitively, the attention mechanism gives the weight of every person's influence on the future trajectory of the target pedestrian i, and these weights are used to summarize the information $p_i$ that the target pedestrian i needs to make a decision, which achieves the goal of modeling pedestrian interaction; the specific formulas are:

$$\theta_{ij} = [v_j, d_{ij}, \cos\alpha_{ij}, \cos\beta_{ij}]$$

$$\theta_i = [\theta_{i1}, \theta_{i2}, \ldots, \theta_{ij}, \ldots, \theta_{iN}]$$

$$W_{atten\_i} = \sigma(\theta_i; W_s)$$

$$H_i = [h_{i1}, h_{i2}, \ldots, h_{ij}, \ldots, h_{iN}]$$

$$p_i = \mathrm{maxpool}(W_{atten\_i} H_i)$$

where σ(·) denotes a multi-layer fully connected network with a softmax activation function and $W_s$ is the weight parameter of that network;

outputting the predicted trajectory of the pedestrian with the LSTM-based decoder in step 3 includes:

the pooled vector output by the attention pooling module, the hidden-layer vector output by the encoder module, and random noise z drawn from a Gaussian distribution are combined into a feature vector that serves as the initial input of the decoder; the decoder first maps the most recent position change of the pedestrian into feature space through a fully connected network to obtain a feature vector, then obtains the current hidden state through the LSTM network, and finally maps the hidden state to coordinate space through a fully connected network to obtain the predicted trajectory coordinates;

where the three networks in the decoder's calculation, including μ(·) and γ(·), are fully connected networks with a ReLU activation function, their weight parameters include $W_\mu$ and $W_\gamma$, and $W_{decoder}$ is the weight parameter of the LSTM network;

adversarially training the generator and the discriminator with the improved loss function in step 4 includes:

$$L_{tran\_GAN} = E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$$

however, to solve the difficulty of training a conventional GAN, step 4 applies noise that decreases over time to the loss function at the discriminator during GAN training, so that the training data and the generated data overlap to some extent; as training proceeds, the distribution of the generated data gradually approaches the real data distribution, and gradually reducing the noise still ensures that the network has a training gradient; the improved adversarial loss $L_{GAN}$ is therefore expressed as:

$$L_{GAN} = E_x[\log \eta(D(x))] + E_z[\log(1 - \eta(D(G(z))))]$$

where η(·) denotes a noise function that decreases over time;

thus, the overall loss function of the network is expressed as:

$$L_{total} = L_{GAN} + \lambda L_2$$

where λ is a hyperparameter;

feeding the observed trajectory of the pedestrian into the generator in step 5 to obtain the predicted trajectory coordinates of the pedestrian includes: