CN112766561B - Attention mechanism-based generation type countermeasure track prediction method - Google Patents


Info

Publication number
CN112766561B
Authority
CN
China
Prior art keywords
pedestrian
network
track
vector
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110053547.8A
Other languages
Chinese (zh)
Other versions
CN112766561A (en
Inventor
房芳
张鹏鹏
周波
钱堃
甘亚辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110053547.8A priority Critical patent/CN112766561B/en
Publication of CN112766561A publication Critical patent/CN112766561A/en
Application granted granted Critical
Publication of CN112766561B publication Critical patent/CN112766561B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a generative adversarial trajectory prediction method based on an attention mechanism. The method first uses an encoder module built from a long short-term memory (LSTM) network to extract hidden features of pedestrian motion from pedestrian trajectories. It then uses an attention-based pooling module to assign influence weights to the pedestrians in the same scene, so that the interaction information among pedestrians is fully extracted. Finally, a decoder module outputs the pedestrian trajectory coordinates predicted by the network. The method can improve trajectory prediction accuracy and can generate multiple predicted trajectories that conform to social norms. It can be used in the navigation planning system of a mobile robot, helping the robot plan more reasonable and effective paths in environments shared with people.

Description

Attention mechanism-based generation type countermeasure track prediction method
Technical Field
The application relates to the field of artificial intelligence, and in particular to a generative adversarial trajectory prediction method based on an attention mechanism.
Background
Pedestrian trajectory prediction means predicting a pedestrian's motion trajectory over a future period from the pedestrian's motion trajectory over a past period. With the rise of fields such as mobile service robots and autonomous driving, pedestrian trajectory prediction in dynamic scenes has become a popular research direction. Correct prediction of pedestrian trajectories helps an intelligent navigation system plan more reasonable and effective paths. However, the problem is extremely complex. First, pedestrian motion has a degree of randomness, and pedestrians decide subjectively and flexibly, so pedestrian trajectories are diverse. Second, while walking, a pedestrian's trajectory is affected by the surrounding dynamic environment, and pedestrians generally adjust their paths according to common sense and social norms. These features make pedestrian trajectory prediction challenging.
In pedestrian trajectory prediction, effectively modeling the interactions between pedestrians is essential. Current mainstream methods rely on deep learning to learn these interactions and predict pedestrian trajectories. Methods based on the long short-term memory (LSTM) network have proven very effective for sequential data, but a plain LSTM cannot effectively model the spatial relationships between pedestrians. To address this, Alahi et al. proposed the Social LSTM (S-LSTM) model, which divides space into a grid and pools the hidden features of the pedestrians around each pedestrian according to that grid, and thereby predicts multiple socially compliant trajectories (see "Social LSTM: Human Trajectory Prediction in Crowded Spaces", CVPR 2016). Because this method can only model pedestrian interactions in the local neighborhood of the target pedestrian, it cannot efficiently capture the interactions of all pedestrians in the scene. Gupta et al. introduced generative adversarial networks (GANs) into pedestrian trajectory prediction and proposed the Social GAN model. By training a generator and a discriminator adversarially and using a pooling module to extract the interaction information of all pedestrians in the scene, it generates diverse socially compliant trajectories and improves prediction accuracy (see "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", CVPR 2018).
However, when extracting interaction information, Social GAN considers only the relative spatial positions of pedestrians and ignores the influence of factors such as the motion direction and speed of surrounding pedestrians on the target pedestrian's future trajectory, so it cannot fully extract the interaction information. In addition, GAN-based methods are highly prone to an imbalance in strength between the generator and the discriminator during training, which causes vanishing gradients and makes training difficult.
To address these problems, Guangdong University of Technology filed patent application No. 202010110743.X, titled "Pedestrian trajectory prediction method based on long short-term memory". It mainly comprises the following steps: preprocessing the data and converting it into a matrix of shape [number of pedestrians, 4]; introducing an attention mechanism to select the information that influences the current pedestrian's walking, such as direction and speed, and connecting all current position information through a fully connected layer; feeding the hidden historical-state information of all pedestrians in the same scene into a pooling layer, so that the global hidden information is shared; using a long short-term memory unit to convert the pooled tensor of all pedestrians' hidden historical states, the position information of the current pedestrian, and the information selected by the attention mechanism into long short-term memory sequence information; and converting the current state information into coordinate space through a multi-layer perceptron to generate the predicted trajectory sequence.
That patent still has the following drawbacks:
First, its attention mechanism computes the attention weight only from the position of pedestrian i relative to pedestrian j. It does not jointly consider factors such as the speed of pedestrian j, its direction of motion relative to pedestrian i, and the distance between the two. The present application improves the attention weight as follows. To describe the influence of pedestrian j on the motion of target pedestrian i, the attention pooling module merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos b_{ij}$ of the angle $b_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $q_{ij}$. This vector is fed into a multi-layer fully connected network with a softmax activation function to obtain the attention weight of pedestrian j with respect to target pedestrian i in the scene.
Second, in conventional GAN training, a mismatch in strength between the generator and the discriminator causes vanishing gradients and makes training difficult. The present application modifies the loss function so that noise that decreases over time is introduced at the discriminator during training, which improves the training of the model and the accuracy of trajectory prediction. The loss function of a conventional GAN can be expressed as:
$$L_{tran\_GAN} = E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$$
while the loss function of the modified GAN is expressed as:
$$L_{GAN} = E_x[\log h(D(x))] + E_z[\log(1 - h(D(G(z))))]$$
where h(·) denotes a noise function that decreases over time. The rationale is that, early in training, the distribution of the training data and the distribution of the generator's output overlap very little, so the discriminator can easily tell real data from generated data and the network lacks training gradients. Noise is therefore added at the discriminator in the early stage of training so that the training data and the generated data overlap to some degree. As training proceeds, the generated distribution gradually approaches the real distribution, and gradually reducing the noise still ensures that the network has a usable training gradient, which improves the training of the network.
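The claim that an overly confident discriminator leaves the generator without gradient can be made concrete with a short derivation. It assumes, as is common but not stated in the application, that the discriminator output is the sigmoid of a logit a, i.e. D = σ(a); the symbol a is introduced here only for illustration.

```latex
\frac{\partial}{\partial a}\,\log\bigl(1-\sigma(a)\bigr)
  = -\frac{\sigma(a)\,\bigl(1-\sigma(a)\bigr)}{1-\sigma(a)}
  = -\sigma(a)
  = -D(G(z))
```

Under this assumption, when the discriminator confidently rejects a generated sample, D(G(z)) approaches 0 and the gradient of the generator term approaches 0 as well. This is consistent with the stated motivation for keeping the two distributions overlapping early in training.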
Disclosure of Invention
To solve the above problems, the application provides a generative adversarial trajectory prediction method based on an attention mechanism, which fully extracts the interaction information between pedestrians and thereby improves trajectory prediction accuracy. When the method is used in the navigation planning system of a service robot, the robot can plan more reasonable and effective paths in dynamic environments shared with people, which improves navigation comfort.
The application provides a generative adversarial trajectory prediction method based on an attention mechanism, characterized by comprising the following steps:
step 1: preprocessing pedestrian trajectory data and feeding it into an encoder for encoding;
step 2: feeding the encoded vector into an attention-based pooling module to assign influence weights and obtain a pooling vector;
step 3: outputting the predicted trajectory of the pedestrian using an LSTM-based decoder;
step 4: adversarially training the generator and the discriminator with the Adam algorithm using the improved loss function;
step 5: feeding the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted trajectory coordinates of the pedestrian.
Further, the encoding of the pedestrian trajectory in step 1 includes:
The network receives the historical trajectory of each pedestrian and uses a single-layer fully connected network as an embedding layer to convert the position information of pedestrian i at time t into a fixed-length feature vector. This vector is then fed into an LSTM network for encoding, which learns the temporal features of the trajectory data and yields the hidden state of pedestrian i at time t.
In this step, f(·) is the embedding layer with a ReLU activation function, $W_f$ and $W_{encoder}$ are the weight parameters of the embedding layer and the LSTM network, respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene.
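The encoding step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the layer sizes, the random initialisation, the bias terms `b_f` and `b_encoder`, and the helper names `init_encoder` and `encode` are hypothetical, and a standard LSTM cell is assumed. Only the ReLU embedding f(·), the weights W_f and W_encoder, and the sharing of LSTM parameters across pedestrians come from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_encoder(embed_dim=16, hidden_dim=32, seed=0):
    """Hypothetical random initialisation; W_f and W_encoder follow the text's names."""
    rng = np.random.default_rng(seed)
    return {
        "W_f": rng.normal(0.0, 0.1, (2, embed_dim)),
        "b_f": np.zeros(embed_dim),
        # One matrix holding the input, forget, cell and output gate weights.
        "W_encoder": rng.normal(0.0, 0.1, (embed_dim + hidden_dim, 4 * hidden_dim)),
        "b_encoder": np.zeros(4 * hidden_dim),
    }

def encode(traj, params):
    """traj: (T, N, 2) observed positions of N pedestrians over T time steps.
    Returns the final hidden state of each pedestrian, shape (N, hidden_dim)."""
    T, N, _ = traj.shape
    hidden_dim = params["b_encoder"].size // 4
    h = np.zeros((N, hidden_dim))
    c = np.zeros((N, hidden_dim))
    for t in range(T):
        # Embedding layer f(.): a single fully connected layer with ReLU.
        e = np.maximum(0.0, traj[t] @ params["W_f"] + params["b_f"])
        # Standard LSTM cell (assumed); the same weights serve every pedestrian.
        gates = np.concatenate([e, h], axis=1) @ params["W_encoder"] + params["b_encoder"]
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h
```

Because the parameters are shared, two pedestrians who walk the same path receive the same hidden state regardless of their index in the scene.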
Further, in step 2 the attention-based pooling module assigns influence weights to the pedestrians in the same scene and outputs a pooling vector representing pedestrian interaction information, as follows:
To describe the influence of pedestrian j on the motion of target pedestrian i, the module first obtains a pooling vector $h_{ij}$ by pooling. It then merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos b_{ij}$ of the angle $b_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $q_{ij}$, which is fed into a multi-layer fully connected network with a softmax activation function to obtain the attention weight of pedestrian j with respect to target pedestrian i in the scene;
Then the pooling vectors of all other pedestrians in the scene relative to target pedestrian i are gathered into a final pooling vector $H_i$, and the attention weights of the different pedestrians are combined into a weight matrix $W_{atten\_i}$. Finally, the weight matrix $W_{atten\_i}$ is multiplied by the pooling vector $H_i$ to obtain a feature vector $p_{hi}$, and max pooling yields the pooling vector $p_i$ of target pedestrian i, which characterizes the information that target pedestrian i needs to make a decision. Intuitively, the attention mechanism yields the weight with which every person in the scene influences the future trajectory of target pedestrian i, and these weights are used to summarize the information $p_i$ that target pedestrian i needs to make a decision, which achieves the goal of modeling pedestrian interaction. The specific formulas are as follows:
$$q_{ij} = [v_j,\ d_{ij},\ \cos a_{ij},\ \cos b_{ij}]$$
$$q_i = [q_{i1}, q_{i2}, \ldots, q_{ij}, \ldots, q_{iN}]$$
$$W_{atten\_i} = s(q_i; W_s)$$
$$H_i = [h_{i1}, h_{i2}, \ldots, h_{ij}, \ldots, h_{iN}]$$
$$p_i = \mathrm{maxpool}(W_{atten\_i} H_i)$$
where s(·) denotes a multi-layer fully connected network with a softmax activation function, and $W_s$ is the weight parameter of that network.
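The weighting and pooling formulas above can be sketched as follows. This is a hedged illustration rather than the application's implementation: the two-layer shape of s(·), the parameter names `W1`, `b1`, `W2`, `b2`, the helper names, and the reading of the product W_atten_i H_i as scaling each neighbour's pooling vector by its scalar attention weight are all assumptions made for this example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def init_attention(feat_dim=6, hidden_dim=16, seed=0):
    """Hypothetical parameters of s(.): one hidden layer plus a scoring layer."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (feat_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def attention_pool(q_i, H_i, params):
    """q_i: (N, feat_dim) interaction features q_ij for the N neighbours of pedestrian i.
    H_i: (N, D) pooling vectors h_ij. Returns (attention weights, p_i)."""
    hidden = np.maximum(0.0, q_i @ params["W1"] + params["b1"])
    scores = (hidden @ params["W2"] + params["b2"]).ravel()
    w = softmax(scores)          # attention weights over neighbours, sum to 1
    p_hi = w[:, None] * H_i      # assumed meaning of W_atten_i * H_i
    p_i = p_hi.max(axis=0)       # max pooling over neighbours
    return w, p_i
```

With four neighbours and 8-dimensional pooling vectors, `attention_pool` returns four non-negative weights that sum to one and a single 8-dimensional vector p_i for the target pedestrian.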
Further, outputting the predicted trajectory of the pedestrian with the LSTM-based decoder in step 3 includes:
The pooling vector output by the attention pooling module, the hidden-layer vector output by the encoder module, and random noise z drawn from a Gaussian distribution are combined into a feature vector that serves as the initial input of the decoder. The decoder first maps the pedestrian's most recent position change into feature space through a fully connected network to obtain a feature vector, then obtains the current hidden state through an LSTM network, and finally maps the hidden state to coordinate space through a fully connected network to obtain the predicted trajectory coordinates.
In the overall decoder calculation, j(·), m(·) and g(·) are all fully connected networks with a ReLU activation function, $W_j$, $W_m$ and $W_g$ are the weight parameters of these three networks, and $W_{decoder}$ is the weight parameter of the LSTM network.
Further, the adversarial training of the generator and the discriminator with the improved loss function in step 4 includes:
The network is trained adversarially with the Adam algorithm using the improved loss function, which consists of two parts: the adversarial loss $L_{GAN}$ of the GAN, and the position offset loss $L_2$ between the real trajectory and the predicted trajectory.
Let the distribution of the real training data x be $p_{data}$, i.e. $x \sim p_{data}(x)$, and let the generator sample z from the prior noise distribution p, i.e. $z \sim p(z)$. Training a GAN essentially drives the distribution of the generator output G(z) as close as possible to the distribution of the real training set. The training loss $L_{tran\_GAN}$ of a conventional GAN is expressed as:
$$L_{tran\_GAN} = E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$$
However, during conventional GAN training the discriminator easily becomes too strong, so it readily distinguishes the generator's data from the real training data, gradients vanish, and training stalls. To solve this difficulty, step 4 applies noise that decreases over time to the discriminator-side loss function during GAN training, so that the training data and the generated data overlap to some degree. As training proceeds, the generated distribution gradually approaches the real distribution, and gradually reducing the noise ensures that the network still has a usable training gradient. The improved adversarial loss $L_{GAN}$ is thus expressed as:
$$L_{GAN} = E_x[\log h(D(x))] + E_z[\log(1 - h(D(G(z))))]$$
where h(·) denotes a noise function that decreases over time;
To encourage the network to generate multiple socially compliant trajectories, the network samples k predicted trajectories at a time and uses the one with the smallest position offset error to compute the position offset loss $L_2$ of the network, where $Y_i$ and $\hat{Y}_i$ denote the real trajectory and the predicted trajectory of pedestrian i, respectively;
Thus, the overall loss function of the network is expressed as:
$$L_{total} = L_{GAN} + l\,L_2$$
where l is a hyperparameter.
Further, feeding the observed trajectory of the pedestrian into the generator in step 5 to obtain the predicted trajectory coordinates of the pedestrian includes:
Steps 1, 2 and 3 are executed in sequence: the observed trajectory of the pedestrian is fed into the encoder to obtain the hidden features of the pedestrian's motion, the interaction information of the pedestrians is extracted by the attention pooling module, and finally the predicted trajectory coordinates of the pedestrian are output by the decoder.
Compared with the prior art, the technical solution provided by the application has the following beneficial effects:
1. To address the inability of existing methods to fully extract the interaction information between pedestrians, an attention pooling module is introduced. It relates factors such as the motion direction and speed of surrounding pedestrians to the future trajectory of the target pedestrian and assigns influence weights to the pedestrians in the same scene, so that the interaction information between pedestrians is extracted more effectively and the interpretability of the model is improved.
2. To address the vanishing gradients and training difficulty caused by a mismatch in strength between the generator and the discriminator during GAN training, the loss function is modified so that noise that decreases over time is introduced at the discriminator during training, which improves the training of the model and the accuracy of trajectory prediction.
Drawings
FIG. 1 is a schematic diagram of the workflow of the present application;
FIG. 2 is an overall block diagram of a network model;
FIG. 3 is a schematic diagram of an attention pooling module;
FIG. 4 is a schematic diagram of a GAN network training process;
FIG. 5 is a visual comparison of predicted trajectories.
Detailed Description
The application is described in further detail below with reference to the drawings and specific embodiments:
The application provides a generative adversarial trajectory prediction method based on an attention mechanism, which fully extracts the interaction information between pedestrians and thereby improves trajectory prediction accuracy. When the method is used in the navigation planning system of a service robot, the robot can plan more reasonable and effective paths in dynamic environments shared with people, which improves navigation comfort.
As shown in FIG. 1 and FIG. 2, the network model of the application mainly comprises a generator module and a discriminator module. The generator is based on an encoder-decoder architecture and comprises an encoder, an attention pooling module and a decoder. It receives the historical trajectories of pedestrians, encodes them with the encoder to obtain hidden features, extracts pedestrian interaction information with the attention-based pooling module, and finally outputs the predicted pedestrian position coordinates through the decoder module. The discriminator mainly consists of an encoder module that receives a trajectory as input and encodes it, followed by a classification network that scores how realistic the trajectory is.
The method provided by the application specifically comprises the following steps:
Step 1: preprocessing pedestrian trajectory data and feeding it into an encoder for encoding;
The network receives the historical trajectory of each pedestrian and uses a single-layer fully connected network as an embedding layer to convert the position information of pedestrian i at time t into a fixed-length feature vector. This vector is then fed into an LSTM network for encoding, which learns the temporal features of the trajectory data and yields the hidden state of pedestrian i at time t.
In this step, f(·) is the embedding layer with a ReLU activation function, $W_f$ and $W_{encoder}$ are the weight parameters of the embedding layer and the LSTM network, respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene.
Step 2: feeding the encoded vector into an attention-based pooling module to assign influence weights and obtain a pooling vector;
The future trajectory of a pedestrian is influenced mainly by the pedestrians in front of them and depends on the speed, direction of motion and relative distance of those pedestrians. As shown in FIG. 3, the future trajectory of target pedestrian 1 is mainly affected by pedestrians 2 and 3, who are ahead in the line of sight, and is hardly affected by pedestrian 4. The greater the speed of pedestrian 2 and the smaller its distance from pedestrian 1, the greater its influence on the trajectory of pedestrian 1.
To describe the influence of pedestrian j on the motion of target pedestrian i, the module first obtains a pooling vector $h_{ij}$ by pooling. It then merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos b_{ij}$ of the angle $b_{ij}$ between the velocity vectors $v_i$ and $v_j$ into a feature vector $q_{ij}$, which is fed into a multi-layer fully connected network with a softmax activation function to obtain the attention weight of pedestrian j with respect to target pedestrian i in the scene.
Then the pooling vectors of all other pedestrians in the scene relative to target pedestrian i are gathered into a final pooling vector $H_i$, and the attention weights of the different pedestrians are combined into a weight matrix $W_{atten\_i}$. Finally, the weight matrix $W_{atten\_i}$ is multiplied by the pooling vector $H_i$ to obtain a feature vector $p_{hi}$, and max pooling yields the pooling vector $p_i$ of target pedestrian i. Intuitively, the attention mechanism yields the weight with which every person in the scene influences the future trajectory of target pedestrian i, and these weights are used to summarize the information $p_i$ that target pedestrian i needs to make a decision, which achieves the goal of modeling pedestrian interaction. The specific formulas are as follows:
$$q_{ij} = [v_j,\ d_{ij},\ \cos a_{ij},\ \cos b_{ij}]$$
$$q_i = [q_{i1}, q_{i2}, \ldots, q_{ij}, \ldots, q_{iN}]$$
$$W_{atten\_i} = s(q_i; W_s)$$
$$H_i = [h_{i1}, h_{i2}, \ldots, h_{ij}, \ldots, h_{iN}]$$
$$p_i = \mathrm{maxpool}(W_{atten\_i} H_i)$$
where s(·) denotes a multi-layer fully connected network with a softmax activation function, and $W_s$ is the weight parameter of that network.
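A small worked example may help show how the entries of q_ij separate a pedestrian ahead from one behind. The sketch below assumes that the distance vector d_ij points from pedestrian i to pedestrian j, which the text does not state; the function name `interaction_features` and the small constant `eps` that guards against division by zero are also hypothetical.

```python
import numpy as np

def interaction_features(pos_i, vel_i, pos_j, vel_j, eps=1e-8):
    """Build q_ij = [v_j, d_ij, cos a_ij, cos b_ij] for 2-D positions and velocities."""
    d_ij = pos_j - pos_i  # assumed direction: from pedestrian i to pedestrian j
    # cos a_ij: angle between i's velocity and the direction towards j.
    cos_a = vel_i @ d_ij / (np.linalg.norm(vel_i) * np.linalg.norm(d_ij) + eps)
    # cos b_ij: angle between the two velocity vectors.
    cos_b = vel_i @ vel_j / (np.linalg.norm(vel_i) * np.linalg.norm(vel_j) + eps)
    return np.concatenate([vel_j, d_ij, [cos_a, cos_b]])

# Target pedestrian walks along +x from the origin.
pos_1, vel_1 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
# A pedestrian ahead, walking towards the target.
q_ahead = interaction_features(pos_1, vel_1, np.array([2.0, 0.0]), np.array([-1.0, 0.0]))
# A pedestrian behind, walking away from the target.
q_behind = interaction_features(pos_1, vel_1, np.array([-2.0, 0.0]), np.array([-1.0, 0.0]))
```

For the pedestrian ahead, cos a_ij is close to 1 (directly in the walking direction) and cos b_ij is close to -1 (head-on). For the pedestrian behind, cos a_ij is close to -1, which gives the attention network a signal it can use to assign that pedestrian a low weight.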
Step 3: outputting the predicted trajectory of the pedestrian using an LSTM-based decoder;
The pooling vector output by the attention pooling module, the hidden-layer vector output by the encoder module, and random noise z drawn from a Gaussian distribution are combined into a feature vector that serves as the initial input of the decoder. The decoder first maps the pedestrian's most recent position change into feature space through a fully connected network to obtain a feature vector, then obtains the current hidden state through an LSTM network, and finally maps the hidden state to coordinate space through a fully connected network to obtain the predicted trajectory coordinates.
In the overall decoder calculation, j(·), m(·) and g(·) are all fully connected networks with a ReLU activation function, $W_j$, $W_m$ and $W_g$ are the weight parameters of these three networks, and $W_{decoder}$ is the weight parameter of the LSTM network.
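One way to write the decoder described above as equations is sketched below. It is a reading of the prose, not the application's own formula: the assignment of j(·), m(·) and g(·) to the three roles, the symbols $h_{e,i}$, $h_{d,i}^{t}$, $e_i^{t}$, $\Delta x$, $\Delta y$, and the identity covariance of the Gaussian noise are all assumptions introduced for illustration.

```latex
\begin{aligned}
h_{d,i}^{0} &= j\bigl([\,p_i,\ h_{e,i},\ z\,];\ W_j\bigr), \qquad z \sim \mathcal{N}(0, I) \\
e_i^{t} &= m\bigl(\Delta x_i^{t-1},\ \Delta y_i^{t-1};\ W_m\bigr) \\
h_{d,i}^{t} &= \mathrm{LSTM}\bigl(h_{d,i}^{t-1},\ e_i^{t};\ W_{decoder}\bigr) \\
(\hat{x}_i^{t},\ \hat{y}_i^{t}) &= g\bigl(h_{d,i}^{t};\ W_g\bigr)
\end{aligned}
```

Read this way, the noise z enters only through the initial decoder state, so sampling z several times produces several distinct candidate trajectories for the same observed history.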
Step 4: performing countermeasure training on the generator and the discriminator by using an Adam algorithm by utilizing the improved loss function;
the improved loss function mainly comprises two parts, one part is the counterloss L of the GAN network GAN Another part is the loss of position shift L between the real track and the predicted track 2
Let the distribution represented by the real training data x be p data (i.e. x-p data (x) The generator samples z (i.e., z-p (z)) from the a priori noise distribution p and the GAN network training process is essentially such that the data distribution represented by the generator's output G (z) is as close as possible to the true training set data distribution. Training loss function L of traditional GAN network tran_GAN Can be expressed as:
L tran_GAN =E x [logD(x)]+E z [log(1-D(G(z)))]
however, the traditional GAN network is very easy to have the situation that the discrimination capability of the discriminator is too strong in the training process, so that the generated data of the generator and the real data of the training set can be easily distinguished, and the gradient vanishes and cannot be trained.
To address this training difficulty, step 4 applies noise that decreases over time to the loss function at the discriminator during GAN training. In Fig. 4, the dark solid line represents the training set data distribution $p_{data}(x)$ and the light solid line represents the distribution $p_G(z)$ of the data produced by the generator. Early in training the two distributions barely overlap, so the discriminator easily tells real data from generated data and the network lacks a training gradient. A certain amount of noise is therefore added at the discriminator in the initial stage of training so that the training data and the generated data overlap to some extent. As training proceeds, the distribution of the generated data gradually approaches the real data distribution, and the noise can be reduced gradually while the network still retains a training gradient. The adversarial loss function $L_{GAN}$ proposed herein can thus be expressed as:
$$L_{GAN} = E_{x}[\log \eta(D(x))] + E_{z}[\log(1 - \eta(D(G(z))))]$$
where $\eta(\cdot)$ denotes a noise function that decreases over time.
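The text does not specify the form of the decreasing noise function, so the following is only one possible reading. It assumes zero-mean Gaussian noise added to the discriminator output, a linear decay of the noise scale to zero over training, and clamping to keep the logarithms defined. The names `noise_scale`, `noisy` and `adversarial_loss`, and the default `sigma0=0.1`, are all hypothetical.

```python
import math
import random

def noise_scale(step, total_steps, sigma0=0.1):
    """Hypothetical schedule: standard deviation decays linearly from sigma0 to 0."""
    return sigma0 * max(0.0, 1.0 - step / total_steps)

def noisy(d_out, step, total_steps, rng, eps=1e-6):
    """Assumed noise function: perturb a discriminator output, then clamp
    it into (0, 1) so that log() stays finite."""
    perturbed = d_out + rng.gauss(0.0, noise_scale(step, total_steps))
    return min(1.0 - eps, max(eps, perturbed))

def adversarial_loss(d_real, d_fake, step, total_steps, rng):
    """Batch estimate of the noisy adversarial loss for the given training step."""
    real_term = sum(math.log(noisy(d, step, total_steps, rng)) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - noisy(d, step, total_steps, rng)) for d in d_fake) / len(d_fake)
    return real_term + fake_term

rng = random.Random(0)
early = adversarial_loss([0.9, 0.8], [0.2, 0.1], step=0, total_steps=100, rng=rng)
late = adversarial_loss([0.9, 0.8], [0.2, 0.1], step=100, total_steps=100, rng=rng)
```

Under this schedule the noise is gone by the last step, so `late` equals the conventional loss on the same discriminator outputs, while `early` is computed from blurred outputs.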
To encourage the network to generate multiple socially compliant trajectories, the network samples k predicted trajectories each time and uses the one with the smallest position offset error to compute the position offset loss. The position offset loss $L_2$ of the network can thus be expressed as:
where $Y_i$ and $\hat{Y}_i$ denote the real trajectory and the predicted trajectory of pedestrian i, respectively.
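The best-of-k selection described above can be sketched as follows. The exact offset measure does not survive in the text; this sketch assumes the mean per-step Euclidean distance, and the function names are hypothetical.

```python
import math

def offset_error(real, pred):
    """Mean Euclidean distance between two trajectories of equal length.

    Each trajectory is a list of (x, y) points. The choice of the mean
    per-step distance is an assumption made for this sketch.
    """
    return sum(math.dist(r, p) for r, p in zip(real, pred)) / len(real)

def position_offset_loss(real, samples):
    """Smallest offset error among the k sampled predicted trajectories."""
    return min(offset_error(real, s) for s in samples)

real = [(0.0, 0.0), (1.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0)],   # offset error 1.0
    [(0.0, 0.0), (1.0, 0.5)],   # offset error 0.25
]
loss = position_offset_loss(real, samples)
```

Because only the best sample is penalized, the remaining k-1 samples are free to cover other plausible futures, which is what lets the generator produce diverse trajectories.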
Thus, the overall loss function of the network can be expressed as:
$$L_{total} = L_{GAN} + \lambda L_2$$
where $\lambda$ is a hyperparameter.
Step 5: feed the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted trajectory coordinates of the pedestrian;
Specifically, step 1, step 2 and step 3 are executed in sequence: the observed trajectory of the pedestrian is fed into the encoder to obtain the hidden features of the pedestrian's motion, the pedestrian interaction information is extracted by the attention pooling module, and the decoder finally outputs the predicted trajectory coordinates of the pedestrian.
Fig. 5 illustrates three representative pedestrian trajectory prediction scenarios. In each scenario, the left sub-graph shows the real motion trajectories of the pedestrians and the right sub-graph shows their observed and predicted trajectories, with solid circles marking the observed trajectory and stars marking the predicted trajectory. The method provided by the application captures complex interactions among pedestrians, such as walking together and yielding to one another; the predicted trajectories agree better with the actual motion scene, and a trajectory predicted by the network does not collide with other trajectories. The predicted trajectories output by the network model of the application therefore comply with social norms and satisfy physical constraints.
Table 1. Comparison of ADE and FDE for different models ($t_{pred}$ = 8/12)
The present application uses the following two metrics to characterize the accuracy of the predicted trajectory.
1) Average displacement error (ADE): the mean Euclidean distance between the predicted trajectory and the real trajectory over all time steps.
2) Final displacement error (FDE): the Euclidean distance between the predicted trajectory and the real trajectory at the final time step.
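The two metrics defined above can be computed as in the following minimal sketch. Trajectories are assumed to be equal-length lists of (x, y) points in meters; the function names `ade` and `fde` are illustrative.

```python
import math

def ade(real, pred):
    """Average displacement error: mean Euclidean distance over all time steps."""
    return sum(math.dist(r, p) for r, p in zip(real, pred)) / len(real)

def fde(real, pred):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(real[-1], pred[-1])

real = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
average_error = ade(real, pred)  # distances 0, 1, 2 give a mean of 1.0
final_error = fde(real, pred)    # distance at the last step is 2.0
```

A prediction that drifts steadily away from the ground truth, as in this example, has an FDE larger than its ADE, which is why the two metrics are reported together.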
The application selects the representative Linear, LSTM, S-LSTM and SGAN network models as baselines; the comparison results of the trajectory prediction models are shown in Table 1. All values in the table are in meters and bold values mark the best result. Atten-GAN is the network model of the present application; +DN indicates that Atten-GAN was trained with the time-decreasing noise, and -DN indicates that it was trained without it.
Taken together, the table data show that, thanks to the attention pooling mechanism, the application can selectively fuse the information that affects the future trajectory of the target pedestrian, so the model is more expressive and describes pedestrian interaction more accurately. In addition, adding time-decreasing noise at the discriminator during training alleviates, to some extent, the vanishing-gradient problem caused by the imbalance in strength between the generator and the discriminator, which further improves the prediction accuracy of the network.
The above is only a preferred embodiment of the present application and does not limit the application in any other form; any modification or equivalent variation made in accordance with the technical essence of the application still falls within the scope of protection claimed by the application.

Claims (1)

1. A generative adversarial trajectory prediction method based on an attention mechanism, characterized by comprising the following steps:
step 1: preprocessing pedestrian trajectory data and feeding it into an encoder for encoding;
the encoding of the pedestrian trajectory in step 1 comprises:
the network receives the historical trajectory of the pedestrian and uses a single-layer fully connected network as an embedding layer to convert the position information of pedestrian i at time t into a fixed-length feature vector; the vector is then fed into an LSTM network for encoding, which learns the temporal features of the trajectory data and yields the hidden state of pedestrian i at time t;
where $\varphi(\cdot)$ is the embedding layer with a ReLU activation function, $W_{\varphi}$ and $W_{encoder}$ are the weight parameters of the embedding layer and the LSTM network respectively, and the parameters of the LSTM network are shared by all pedestrians in the scene;
step 2: feeding the encoded vector into a pooling module based on an attention mechanism to assign influence weights and obtain a pooling vector;
the assigning, in step 2, of influence weights to pedestrians in the same scene by the attention-based pooling module and the outputting of a pooling vector representing pedestrian interaction information comprise:
to describe the influence of pedestrian j on the motion of target pedestrian i, the module first obtains a pooling vector $h_{ij}$ by pooling, and merges the velocity vector $v_j$ of pedestrian j, the distance vector $d_{ij}$ between pedestrian i and pedestrian j, the cosine $\cos a_{ij}$ of the angle $a_{ij}$ between the velocity vector $v_i$ of pedestrian i and the distance vector $d_{ij}$, and the cosine $\cos\beta_{ij}$ of the angle $\beta_{ij}$ between the velocity vector $v_i$ of pedestrian i and the velocity vector $v_j$ of pedestrian j into a feature vector $\theta_{ij}$, which is fed into a multi-layer fully connected network that uses softmax as its activation function, thereby obtaining the attention weight of pedestrian j on target pedestrian i in the scene;
then, the pooling vectors of all other pedestrians in the scene relative to target pedestrian i are gathered into a final pooling vector $H_i$, and the attention weights of the different pedestrians are combined into a weight matrix $W_{atten\_i}$; finally, the weight matrix $W_{atten\_i}$ is multiplied by the pooling vector $H_i$ to obtain a feature vector $p_{hi}$, and the pooling vector $p_i$ of target pedestrian i is obtained by max pooling; this pooling vector characterizes the information that target pedestrian i needs to make a decision; intuitively, the attention mechanism yields the weight with which each person in the scene influences the future trajectory of target pedestrian i, and the information $p_i$ that target pedestrian i needs for its decision is summarized accordingly, which achieves the purpose of pedestrian interaction modeling; the specific formulas are as follows:
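$$\theta_{ij} = [v_j,\, d_{ij},\, \cos a_{ij},\, \cos\beta_{ij}]$$
$$\theta_i = [\theta_{i1}, \theta_{i2}, \ldots, \theta_{ij}, \ldots, \theta_{iN}]$$
$$W_{atten\_i} = \sigma(\theta_i; W_s)$$
$$H_i = [h_{i1}, h_{i2}, \ldots, h_{ij}, \ldots, h_{iN}]$$
$$p_i = \mathrm{maxpool}(W_{atten\_i} H_i)$$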
where $\sigma(\cdot)$ denotes a multi-layer fully connected network with a softmax activation function, and $W_s$ is the weight parameter of that network;
step 3: outputting a predicted trajectory of the pedestrian using a decoder based on an LSTM network;
the outputting, in step 3, of the predicted trajectory of the pedestrian using the LSTM-based decoder comprises:
the pooling vector output by the attention pooling module, the hidden-layer vector output by the encoder module, and random noise z drawn from a Gaussian distribution are concatenated into a feature vector that serves as the initial input of the decoder; the decoder first maps the pedestrian's most recent position change into the feature space through a fully connected network to obtain a feature vector, then obtains the current hidden state through the LSTM network, and finally maps the hidden state back to coordinate space through a fully connected network to obtain the predicted trajectory coordinates; the overall calculation formula of the decoder is as follows:
where $\phi(\cdot)$, $\mu(\cdot)$ and $\gamma(\cdot)$ are fully connected networks with a ReLU activation function, $W_{\phi}$, $W_{\mu}$ and $W_{\gamma}$ are the weight parameters of the three networks, and $W_{decoder}$ is the weight parameter of the LSTM network;
step 4: performing adversarial training of the generator and the discriminator with the Adam algorithm, using the improved loss function;
the adversarial training of the generator and the discriminator with the improved loss function in step 4 comprises:
the network is trained adversarially with the Adam algorithm using the improved loss function, which consists mainly of two parts: the adversarial loss $L_{GAN}$ of the GAN, and the position offset loss $L_2$ between the real trajectory and the predicted trajectory;
let the real training data x follow the distribution $p_{data}$, i.e. $x \sim p_{data}(x)$, and let the generator sample z from the prior noise distribution p, i.e. $z \sim p(z)$; training the GAN essentially drives the data distribution represented by the generator output G(z) as close as possible to the distribution of the real training set; the training loss function $L_{tran\_GAN}$ of a conventional GAN is expressed as:
$$L_{tran\_GAN} = E_{x}[\log D(x)] + E_{z}[\log(1 - D(G(z)))]$$
to solve the problem that a conventional GAN is difficult to train, step 4 applies noise that decreases over time to the loss function at the discriminator during GAN training, so that the training data and the generated data overlap to some extent; as training time increases, the distribution of the data produced by the generator gradually approaches the real data distribution, and the noise is then reduced gradually so that the network still retains a training gradient; the improved adversarial loss $L_{GAN}$ is thus expressed as:
$$L_{GAN} = E_{x}[\log \eta(D(x))] + E_{z}[\log(1 - \eta(D(G(z))))]$$
where $\eta(\cdot)$ denotes a noise function that decreases over time;
to encourage the network to generate multiple socially compliant trajectories, the network samples k predicted trajectories each time and uses the one with the smallest position offset error to compute the position offset loss, so the position offset loss $L_2$ of the network is expressed as:
where $Y_i$ and $\hat{Y}_i$ denote the real trajectory and the predicted trajectory of pedestrian i, respectively;
thus, the overall loss function of the network is expressed as:
$$L_{total} = L_{GAN} + \lambda L_2$$
where $\lambda$ is a hyperparameter;
step 5: feeding the observed trajectory of the pedestrian into the generator of the trained network model to obtain the predicted trajectory coordinates of the pedestrian;
the feeding, in step 5, of the observed trajectory of the pedestrian into the generator to obtain the predicted trajectory coordinates of the pedestrian comprises:
executing step 1, step 2 and step 3 in sequence, namely feeding the observed trajectory of the pedestrian into the encoder to obtain the hidden features of the pedestrian's motion, extracting the pedestrian interaction information through the attention pooling module, and finally outputting the predicted trajectory coordinates of the pedestrian through the decoder.
CN202110053547.8A 2021-01-15 2021-01-15 Attention mechanism-based generation type countermeasure track prediction method Active CN112766561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110053547.8A CN112766561B (en) 2021-01-15 2021-01-15 Attention mechanism-based generation type countermeasure track prediction method

Publications (2)

Publication Number Publication Date
CN112766561A CN112766561A (en) 2021-05-07
CN112766561B true CN112766561B (en) 2023-11-17

Family

ID=75701709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110053547.8A Active CN112766561B (en) 2021-01-15 2021-01-15 Attention mechanism-based generation type countermeasure track prediction method

Country Status (1)

Country Link
CN (1) CN112766561B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256681B (en) * 2021-05-26 2022-05-13 北京易航远智科技有限公司 Pedestrian trajectory prediction method based on space-time attention mechanism
CN113269115B (en) * 2021-06-04 2024-02-09 北京易航远智科技有限公司 Pedestrian track prediction method based on Informar
CN113269114B (en) * 2021-06-04 2024-02-02 北京易航远智科技有限公司 Pedestrian track prediction method based on multiple hidden variable predictors and key points
CN113537297B (en) * 2021-06-22 2023-07-28 同盾科技有限公司 Behavior data prediction method and device
CN113627249B (en) * 2021-07-05 2023-04-28 中山大学·深圳 Navigation system training method and device based on contrast learning and navigation system
CN113538506A (en) * 2021-07-23 2021-10-22 陕西师范大学 Pedestrian trajectory prediction method based on global dynamic scene information depth modeling
CN114581487B (en) * 2021-08-02 2022-11-25 北京易航远智科技有限公司 Pedestrian trajectory prediction method, device, electronic equipment and computer program product
CN113869170B (en) * 2021-09-22 2024-04-23 武汉大学 Pedestrian track prediction method based on graph division convolutional neural network
CN114757975B (en) * 2022-04-29 2024-04-16 华南理工大学 Pedestrian track prediction method based on transformer and graph convolution network
CN116069879B (en) * 2022-11-14 2023-06-20 成都信息工程大学 Method, device, equipment and storage medium for predicting pedestrian track
CN116663753B (en) * 2023-08-01 2023-10-20 江西省供销江南物联网有限公司 Cold chain food distribution prediction method and system
CN117273225B (en) * 2023-09-26 2024-05-03 西安理工大学 Pedestrian path prediction method based on space-time characteristics
CN117332048B (en) * 2023-11-30 2024-03-22 运易通科技有限公司 Logistics information query method, device and system based on machine learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781838A (en) * 2019-10-28 2020-02-11 大连海事大学 Multi-modal trajectory prediction method for pedestrian in complex scene
CN111339867A (en) * 2020-02-18 2020-06-26 广东工业大学 Pedestrian trajectory prediction method based on generation of countermeasure network
CN111428763A (en) * 2020-03-17 2020-07-17 陕西师范大学 Pedestrian trajectory prediction method based on scene constraint GAN
CN111461437A (en) * 2020-04-01 2020-07-28 北京工业大学 Data-driven crowd movement simulation method based on generation of confrontation network
CN111661045A (en) * 2019-03-05 2020-09-15 宝马股份公司 Training a generator unit and a discriminator unit for a trajectory prediction for detecting a collision
CN111930110A (en) * 2020-06-01 2020-11-13 西安理工大学 Intent track prediction method for generating confrontation network by combining society

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
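基于GAN和注意力机制的行人轨迹预测 (Pedestrian trajectory prediction based on GAN and attention mechanism); 欧阳俊等 (Ouyang Jun et al.); 激光与光电子学进展 (Laser & Optoelectronics Progress); 1-12 *
基于社会注意力机制的行人轨迹预测方法研究 (Research on a pedestrian trajectory prediction method based on a social attention mechanism); 李琳辉等 (Li Linhui et al.); 通信学报 (Journal on Communications); 175-183 *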

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant