CN112347923A - Roadside pedestrian trajectory prediction algorithm based on a generative adversarial network - Google Patents
- Publication number: CN112347923A
- Application number: CN202011229272.0A
- Authority: CN (China)
- Legal status: Withdrawn (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Abstract
The invention relates to a roadside pedestrian trajectory generation algorithm based on a generative adversarial network. The algorithm generates multi-modal predicted trajectories using a social attention mechanism and pedestrian trajectory latent variables. The trajectory generator and the discriminator are trained adversarially, which continuously improves both of them and increases the accuracy of the trajectories the generator produces. The invention provides a social attention mechanism based on head orientation. The head orientation of each pedestrian is taken from the direction of the pedestrian's last velocity, and the cosine of the included angle between pedestrians is calculated from this head orientation information. Soft and hard attention mechanisms use the calculated angle information to refine the output of the social attention mechanism, and the result is aggregated through a max-pooling layer. The invention also proposes a new latent variable generation method. Two feedforward neural networks learn latent variables from the pedestrian's historical trajectory and observed trajectory, respectively. The inputs to the latent variable generator are position, velocity, and acceleration, and one latent variable distribution is generated from each of these three input types.
Description
Technical Field
The invention relates to the technical field of autonomous driving, in particular to pedestrian trajectory prediction. It provides a roadside pedestrian trajectory prediction algorithm based on a generative adversarial network.
Background
Robot autonomous navigation and automobile autonomous driving technology continue to develop, so driverless technology has attracted wide attention and has broad application prospects. Driverless vehicles can make people's lives more convenient. While driving, however, they must monitor the trajectories of pedestrians on the road and predict their future trajectories in order to avoid collisions. To predict pedestrian trajectories well, a driverless vehicle needs to process the observed pedestrian trajectory data, learn the regularities of pedestrian motion, and use those regularities to predict each pedestrian's next motion state. Accurate prediction is difficult because human behavior is complex, because pedestrians have their own intentions, and because external stimuli vary widely. Pedestrian motion may be driven by the pedestrian's own goal, by interactions with surrounding objects, by social relationships, by social rules and norms, or by the topological, geometric, and semantic environment. Most of these factors are not directly observable; they must be inferred from complex motion patterns or modeled from contextual information. Enabling the driverless vehicle to learn these latent motion patterns is therefore the key to accurate pedestrian trajectory prediction;
because pedestrian behavior is random, neither machines nor humans can predict a pedestrian's future trajectory exactly. A pedestrian's trajectory is influenced by the surrounding environment, for example through person-to-person and person-to-object interactions, which are often hard to describe explicitly. However, a pedestrian's future trajectory is always influenced by the motion of the people and objects in front of that pedestrian. Exploiting this common-sense knowledge helps model pedestrians' social interaction behavior and thus predict their future trajectories well;
pedestrian motion patterns are complex and diverse, and complex pedestrian motion is difficult to describe with a single dynamic model. A common method for modeling the general motion of a maneuvering target is to define several typical motion modes and fuse them, with each mode described by different dynamics. The modes may be linear motion, turning maneuvers, or sudden accelerations, and over time they form a sequence that can describe complex motion behavior. Pedestrian trajectory prediction must likewise take the diversity of pedestrian motion patterns into account;
Disclosure of Invention
The technical problem to be solved by the invention is as follows. In the prior art, pedestrian motion patterns are complex and diverse, and complex pedestrian motion is difficult to describe with a dynamic model. To solve this problem, the invention provides a roadside pedestrian trajectory prediction algorithm based on a generative adversarial network.
The technical solution adopted by the invention is a roadside pedestrian trajectory prediction algorithm based on a generative adversarial network, comprising the following steps:
S10: encoding the input trajectory with an encoder;
S20: calculating the social attention of each pedestrian from the pedestrians' head orientations;
S30: applying a latent variable predictor to generate a predictable latent variable distribution;
S40: generating the predicted future trajectory of the pedestrian;
S50: optimizing the pedestrian trajectories produced by the generator with a discriminator;
wherein step S30 comprises the following steps:
S31: designing the latent variable predictor;
S32: predicting the latent variable distribution of the pedestrian with the latent variable predictor.
Further, in step S31, the latent variable predictor consists of two feedforward neural networks: Ψ(·) and a second feedforward network. Each network has its own parameters, and each takes as input the k-th type of input to the latent variable predictor.
Further, in step S32, k = 1, 2, 3 denotes the position, velocity, and acceleration of the pedestrian, respectively. The position reveals the layout of the underlying scene, the velocity reflects the motion patterns of different pedestrians, and the acceleration indicates how intense the pedestrian's motion is. The latent variable predictor estimates a latent distribution from each of the three inputs. Gaussian random noise is used to produce multi-modal output. In the training stage, the three latent variable distributions and the Gaussian random noise are fused to form the final latent variable distribution parameters;
in the testing stage, the latent variable predictor predicts the latent variable distribution from the pedestrian's observed trajectory. It takes the pedestrian's position, velocity, and acceleration as input and predicts the latent variable distributions of position, velocity, and acceleration from these three inputs. It then combines the three latent variables with Gaussian random noise to form the final latent variable, which is passed to the trajectory generator;
during training, a latent variable loss measures the difference between the latent variable distribution of the observed trajectory and that of the real trajectory, and the error is calculated as the KL divergence between these two distributions.
Further, step S10 specifically comprises the following steps:
S11: processing the input trajectory data: the input trajectory is a time series of trajectory points, each of which is the position coordinate of target i at time t; the position coordinates of each trajectory at successive time steps are fed into the encoding network;
S12: converting each two-dimensional position into a fixed-length multi-dimensional embedding vector with a single-layer multilayer perceptron φ(·), which uses the ReLU nonlinear activation function and has parameters W_ee;
S13: feeding the embedding vector into an encoder based on a long short-term memory (LSTM) network to generate the hidden state of the pedestrian's motion; W_encoder denotes the parameters of the encoder LSTM, which can be shared among all pedestrians in the same scene.
Further, step S20 specifically comprises the following steps:
S21: calculating the azimuth angles between pedestrians: the velocity at the pedestrian's last observed position is taken as the pedestrian's future velocity, and its direction is taken as both the head orientation and the direction of trajectory motion; the cosine of the azimuth angle between pedestrians, cos(β_ij), is then calculated from the head orientations of all pedestrians, where n is the number of pedestrians in the same scene and β_ij is the included angle between pedestrian i and pedestrian j;
S22: designing the attention mechanisms: a soft attention mechanism and a hard attention mechanism are designed from the cosines of the azimuth angles between pedestrians. The influence of one pedestrian on another decreases as the azimuth angle between them increases, that is, as its cosine decreases. The hard attention mechanism is represented by a matrix H_A with the same shape as cos(β). Each element h_ij of H_A is set to 0 or 1: when the cosine of the azimuth angle between two pedestrians is greater than the preset threshold 0.2, the corresponding attention weight h_ij is 1, and when it is less than 0.2, h_ij is 0. The hard attention mechanism computes its weights by thresholding, whereas the soft attention mechanism adaptively computes the correlation between pedestrians to obtain the soft attention weights S_A;
the soft and hard attention weights are applied to refine the output of the second multilayer perceptron, and the result is aggregated through a max-pooling layer to obtain the output of the social attention module.
Further, step S40 specifically comprises the following steps:
S41: concatenating the output of the social attention module, the output of the latent variable predictor, and the hidden state of the pedestrian's motion;
S42: feeding the concatenated result into a decoder based on an LSTM network to obtain a new trajectory hidden state that fuses these different kinds of information; W_decoder denotes the parameters of the decoder LSTM, which can be shared among all pedestrians in the same scene;
S43: decoding the new hidden state with a multilayer perceptron γ(·), which uses the ReLU nonlinear activation function, to obtain the pedestrian's future position coordinates. The output prediction is a series of position coordinates, where T_obs is the length of the predicted trajectory. The invention uses multi-modal output: the trajectory generator outputs m trajectories at a time, and an L2-norm loss measures the deviation between these m trajectories and the real trajectory of the pedestrian. The number m of generated predictions of the pedestrian's future trajectory is set to 20 in the present invention.
Further, step S50 specifically comprises the following steps:
S51: inputting the trajectories produced by the generator and the real trajectories of the pedestrians into the discriminator;
S52: the discriminator determines whether an input trajectory was produced by the generator or is a real trajectory. It encodes real and generated trajectories with an LSTM-based encoder and applies a multilayer perceptron to the encoder's output hidden state to obtain a classification score. Ideally, the discriminator learns the social rules of pedestrian trajectories and judges trajectories that violate these rules to be fake;
the adversarial training loss is defined over the discriminator D, the generator G, the latent variable distribution parameters z, the real trajectory data x, and the k-th input (position, velocity, or acceleration) of the latent variable predictor for the observed trajectory. Through this game between the generator and the discriminator, the generator eventually produces samples that resemble the training set and conform to social rules. Because the generator learns a probability distribution similar to that of the training set, each sampling yields a different plausible sample, so the model can predict multiple possible futures;
the total loss function consists of three parts: the adversarial training loss, the KL divergence of the latent variable distributions, and the deviation between the predicted and real trajectories. These terms are combined with weights α and β, each set to a value between 1 and 10, and the specific values are obtained by cross-validation on a benchmark dataset. During training, the generator and the discriminator are trained iteratively with a batch size of 64 for 600 epochs at a learning rate of 0.001, and the parameters are optimized with the Adam optimizer.
The beneficial effects of the invention are as follows. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network (1) provides a social attention module that exploits the correlation between head orientation and trajectory prediction, and this attention mechanism improves the ability of the social pooling layer to capture social interactions in different scenes;
(2) provides a new latent variable predictor that can estimate knowledge-rich latent variables for better trajectory prediction; because the predictor's inputs are extracted only from the trajectory data, it adds little computational overhead;
(3) embeds the social attention module and the latent variable predictor into a generative adversarial network framework to produce multi-modal output that is acceptable under social rules.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic diagram of the adversarial training strategy proposed in the present invention;
FIG. 2 is a schematic diagram of the generator proposed in the present invention;
FIG. 3 is a schematic diagram of the latent variable prediction proposed in the present invention;
FIG. 4 is a schematic diagram of the discriminator proposed in the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views that show only the basic structure of the invention and therefore include only the components relevant to it.
As shown in FIGS. 1-4, the roadside pedestrian trajectory prediction algorithm based on a generative adversarial network comprises the following steps:
S10: encoding the input trajectory with an encoder;
S20: calculating the social attention of each pedestrian from the pedestrians' head orientations;
S30: applying a latent variable predictor to generate a predictable latent variable distribution;
S40: generating the predicted future trajectory of the pedestrian;
S50: optimizing the pedestrian trajectories produced by the generator with a discriminator;
step S10 specifically comprises the following steps:
S11: processing the input trajectory data: the input trajectory is a time series of trajectory points, each of which is the position coordinate of target i at time t; the position coordinates of each trajectory at successive time steps are fed into the encoding network;
S12: converting each two-dimensional position into a fixed-length multi-dimensional embedding vector with a single-layer multilayer perceptron φ(·), which uses the ReLU nonlinear activation function and has parameters W_ee;
S13: feeding the embedding vector into an encoder based on a long short-term memory (LSTM) network to generate the hidden state of the pedestrian's motion; W_encoder denotes the parameters of the encoder LSTM, which can be shared among all pedestrians in the same scene.
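Steps S11 to S13 can be sketched as follows. This is a minimal, dependency-free Python illustration and not the patented implementation. The embedding size (16), hidden size (32), uniform weight initialization, and the names `TrajectoryEncoder`, `linear`, and `init` are all hypothetical choices. `embed` plays the role of the single-layer perceptron φ(·) with parameters W_ee, and `encode` runs a standard LSTM cell over the embedded positions and returns the final hidden state.

```python
import math
import random


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def linear(W, b, x):
    """Affine map W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init(out_dim, in_dim, rng):
    """Small uniform weights and zero biases (hypothetical initialization)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim


class TrajectoryEncoder:
    """Sketch of S12-S13: phi(.) embedding followed by an LSTM encoder."""

    def __init__(self, emb_dim=16, hid_dim=32, seed=0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        # phi(.): single-layer perceptron with parameters W_ee
        self.W_ee, self.b_ee = init(emb_dim, 2, rng)
        # W_encoder: one affine map per LSTM gate, applied to [e_t, h_{t-1}]
        self.gates = {g: init(hid_dim, emb_dim + hid_dim, rng) for g in "ifog"}

    def embed(self, xy):
        return relu(linear(self.W_ee, self.b_ee, xy))

    def step(self, e, h, c):
        z = e + h  # list concatenation [e_t, h_{t-1}]
        i = [sigmoid(a) for a in linear(*self.gates["i"], z)]  # input gate
        f = [sigmoid(a) for a in linear(*self.gates["f"], z)]  # forget gate
        o = [sigmoid(a) for a in linear(*self.gates["o"], z)]  # output gate
        g = [math.tanh(a) for a in linear(*self.gates["g"], z)]  # candidate cell
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

    def encode(self, track):
        """Run the LSTM over a track of (x, y) points; return the final hidden state."""
        h, c = [0.0] * self.hid_dim, [0.0] * self.hid_dim
        for xy in track:
            h, c = self.step(self.embed(xy), h, c)
        return h
```

In this sketch, reusing one `TrajectoryEncoder` instance for every pedestrian in a scene corresponds to sharing W_encoder among all pedestrians, as S13 describes.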
Further, step S20 specifically comprises the following steps:
S21: calculating the azimuth angles between pedestrians: the invention rests on the premise that a pedestrian's future trajectory is always influenced by the crowd in front of the pedestrian and not by the crowd behind. The velocity at the pedestrian's last observed position is taken as the pedestrian's future velocity, and its direction is taken as both the head orientation and the direction of trajectory motion; the cosine of the azimuth angle between pedestrians, cos(β_ij), is then calculated from the head orientations of all pedestrians, where n is the number of pedestrians in the same scene and β_ij is the included angle between pedestrian i and pedestrian j;
S22: designing the attention mechanisms: a soft attention mechanism and a hard attention mechanism are designed from the cosines of the azimuth angles between pedestrians. The influence of one pedestrian on another decreases as the azimuth angle between them increases, that is, as its cosine decreases. The hard attention mechanism is represented by a matrix H_A with the same shape as cos(β). Each element h_ij of H_A is set to 0 or 1: when the cosine of the azimuth angle between two pedestrians is greater than the preset threshold 0.2, the corresponding attention weight h_ij is 1, and when it is less than 0.2, h_ij is 0. The hard attention mechanism computes its weights by thresholding, whereas the soft attention mechanism adaptively computes the correlation between pedestrians to obtain the soft attention weights S_A;
the soft and hard attention weights are applied to refine the output of the second multilayer perceptron, and the result is aggregated through a max-pooling layer to obtain the output of the social attention module.
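Steps S21 and S22 can be illustrated with the sketch below. Several details are assumptions and do not come from the text. First, β_ij is taken to be the angle between pedestrian i's head orientation and the vector from i to j, so pedestrians ahead get positive cosines, which matches the front-crowd premise. Second, the soft attention weights S_A, whose formula is not reproduced in the text, are replaced by a hypothetical row-wise softmax over the cosines. Third, the hard and soft weights are combined by element-wise multiplication before max pooling over neighbours. The function names are illustrative.

```python
import math

THRESHOLD = 0.2  # preset hard-attention threshold stated in the text


def heading(track):
    """Head orientation: direction of the last velocity (last two positions)."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return x1 - x0, y1 - y0


def cos_matrix(tracks):
    """cos(beta_ij); beta_ij assumed = angle between i's heading and the vector i -> j."""
    n = len(tracks)
    cos = [[0.0] * n for _ in range(n)]
    for i in range(n):
        hx, hy = heading(tracks[i])
        xi, yi = tracks[i][-1]
        for j in range(n):
            if i == j:
                continue
            dx, dy = tracks[j][-1][0] - xi, tracks[j][-1][1] - yi
            norm = math.hypot(hx, hy) * math.hypot(dx, dy)
            cos[i][j] = (hx * dx + hy * dy) / norm if norm > 0 else 0.0
    return cos


def hard_attention(cos, threshold=THRESHOLD):
    """H_A: h_ij = 1 if cos(beta_ij) > threshold, else 0."""
    return [[1.0 if c > threshold else 0.0 for c in row] for row in cos]


def soft_attention(cos):
    """Hypothetical S_A: softmax of cos(beta_ij) over the other pedestrians j != i."""
    weights = []
    for i, row in enumerate(cos):
        exps = [0.0 if j == i else math.exp(c) for j, c in enumerate(row)]
        total = sum(exps)
        weights.append([e / total if total > 0 else 0.0 for e in exps])
    return weights


def social_pool(features, H, S, i):
    """Weight neighbour features by h_ij * s_ij, then max-pool over j.

    features[j] stands in for the second MLP's output for pedestrian j. The
    self term (j == i) has weight 0, so every pooled value is at least 0.
    """
    dim = len(features[0])
    return [max(H[i][j] * S[i][j] * f[d] for j, f in enumerate(features)) for d in range(dim)]
```

Take a pedestrian walking along +x. Under these assumptions, a neighbour directly ahead gets cos(β_ij) = 1 and h_ij = 1, while a neighbour directly behind gets cos(β_ij) = -1 and h_ij = 0. Only the neighbour ahead therefore contributes to the pooled feature.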
Step S30 comprises the following steps:
S31: designing the latent variable predictor;
S32: predicting the latent variable distribution of the pedestrian with the latent variable predictor.
In step S31, the invention applies a latent variable predictor to generate a predictable latent variable distribution, which is a data-driven method for predicting latent variable distribution parameters. In the training stage, the latent variable generator can predict latent variable distribution parameters from both the observed trajectory and the real trajectory of the pedestrian, so that the latent motion patterns can be learned. The latent variable predictor consists of two feedforward neural networks: Ψ(·) and a second feedforward network. Each network has its own parameters, and each takes as input the k-th type of input to the latent variable predictor.
Further, in step S32, k = 1, 2, 3 denotes the position, velocity, and acceleration of the pedestrian, respectively. The position reveals the layout of the underlying scene, the velocity reflects the motion patterns of different pedestrians, and the acceleration indicates how intense the pedestrian's motion is. The latent variable predictor estimates a latent distribution from each of the three inputs. Gaussian random noise is used to produce multi-modal output. In the training stage, the three latent variable distributions and the Gaussian random noise are fused to form the final latent variable distribution parameters;
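The three input types k = 1, 2, 3 can be derived from the observed positions by finite differences, as the short sketch below shows. The function name `kinematic_inputs` and the frame interval `dt` are illustrative assumptions, because the text does not say how velocity and acceleration are computed.

```python
def kinematic_inputs(track, dt=1.0):
    """Return the k = 1, 2, 3 inputs (positions, velocities, accelerations).

    Velocities and accelerations are first-order finite differences, so a
    track of T points gives T velocities minus 1 and T accelerations minus 2.
    dt is a hypothetical frame interval in seconds.
    """
    pos = [tuple(p) for p in track]
    vel = [((x1 - x0) / dt, (y1 - y0) / dt) for (x0, y0), (x1, y1) in zip(pos, pos[1:])]
    acc = [((u1 - u0) / dt, (w1 - w0) / dt) for (u0, w0), (u1, w1) in zip(vel, vel[1:])]
    return pos, vel, acc
```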
in the testing stage, the latent variable predictor predicts the latent variable distribution from the pedestrian's observed trajectory. It takes the pedestrian's position, velocity, and acceleration as input and predicts the latent variable distributions of position, velocity, and acceleration from these three inputs. It then combines the three latent variables with Gaussian random noise to form the final latent variable, which is passed to the trajectory generator;
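The fusion described above can be realized in several ways; the sketch below shows one, under stated assumptions. Each hypothetical `GaussianHead` stands in for a feedforward network such as Ψ(·) and outputs a mean and a log-variance. One sample per input type is drawn with the reparameterization trick, and the three samples are concatenated with a Gaussian noise vector. The single linear layer, the latent size, and concatenation as the fusion operator are assumptions, since the text only says that the distributions and the noise are fused.

```python
import math
import random


class GaussianHead:
    """Hypothetical stand-in for a feedforward net such as Psi(.): input -> (mean, log-variance)."""

    def __init__(self, in_dim, z_dim, rng):
        self.z_dim = z_dim
        # One linear layer producing 2 * z_dim outputs (mean and log-variance)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(2 * z_dim)]

    def __call__(self, x):
        out = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        return out[: self.z_dim], out[self.z_dim :]


def flatten(points):
    """[(x0, y0), (x1, y1), ...] -> [x0, y0, x1, y1, ...]"""
    return [c for p in points for c in p]


def sample_latent(inputs, heads, noise_dim, rng):
    """Draw z_k = mu_k + sigma_k * eps for each input type k, then append Gaussian noise."""
    z = []
    for x, head in zip(inputs, heads):
        mu, logvar = head(flatten(x))
        sigma = [math.exp(0.5 * lv) for lv in logvar]
        z.extend(m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma))
    # Extra Gaussian random noise that drives the multi-modal outputs
    z.extend(rng.gauss(0.0, 1.0) for _ in range(noise_dim))
    return z
```

In this sketch, the heads are fed only the observed trajectory at test time. During training, a second set of heads applied to the real trajectory would provide the distribution used in the KL term.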
during training, a latent variable loss measures the difference between the latent variable distribution of the observed trajectory and that of the real trajectory, and the error is calculated as the KL divergence between these two distributions.
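If the latent distributions are diagonal Gaussians, the KL divergence has a closed form, sketched below. Two things here are assumptions: that both distributions are diagonal Gaussians parameterized by mean and log-variance, and the argument order. The text does not say which of the two distributions serves as the reference, so `kl_diag_gaussian` simply computes KL(p || q) for whichever pair it is given.

```python
import math


def kl_diag_gaussian(mu_p, logvar_p, mu_q, logvar_q):
    """KL(N(mu_p, diag(exp(logvar_p))) || N(mu_q, diag(exp(logvar_q)))).

    Per dimension:
        0.5 * (logvar_q - logvar_p + (exp(logvar_p) + (mu_p - mu_q)**2) / exp(logvar_q) - 1)
    """
    return 0.5 * sum(
        lq - lp + (math.exp(lp) + (mp - mq) ** 2) / math.exp(lq) - 1.0
        for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q)
    )
```

The divergence is zero only when the two distributions coincide, and it is not symmetric. The choice of which distribution goes in the first argument position therefore changes the training signal.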
Further, step S40 specifically comprises the following steps:
S41: the social attention module captures the interaction between pedestrians, and the latent variable predictor provides the latent variable distribution of pedestrian motion; the output of the social attention module, the output of the latent variable predictor, and the hidden state of the pedestrian's motion are concatenated;
S42: feeding the concatenated result into a decoder based on an LSTM network to obtain a new trajectory hidden state that fuses these different kinds of information; W_decoder denotes the parameters of the decoder LSTM, which can be shared among all pedestrians in the same scene;
S43: decoding the new hidden state with a multilayer perceptron γ(·), which uses the ReLU nonlinear activation function, to obtain the pedestrian's future position coordinates. The output prediction is a series of position coordinates, where T_obs is the length of the predicted trajectory. The invention uses multi-modal output: the trajectory generator outputs m trajectories at a time, and an L2-norm loss measures the deviation between these m trajectories and the real trajectory of the pedestrian. The number m of generated predictions of the pedestrian's future trajectory is set to 20 in the present invention.
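The text states that an L2-norm loss compares the m = 20 generated trajectories with the real one, but it does not reproduce the formula. A common choice in multi-modal trajectory GANs is the best-of-m (variety) loss, which penalizes only the sample closest to the ground truth. The sketch below assumes that form, which the source does not confirm; the function names are illustrative.

```python
import math


def trajectory_l2(pred, truth):
    """2-norm of the stacked (x, y) errors over the predicted horizon."""
    return math.sqrt(
        sum((px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, truth))
    )


def variety_loss(samples, truth):
    """Assumed best-of-m loss: only the generated trajectory closest to the truth counts."""
    return min(trajectory_l2(s, truth) for s in samples)
```

Taking the minimum lets the other samples spread out over different plausible futures instead of collapsing onto the mean trajectory, which fits the multi-modal goal stated above.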
Further, step S50 specifically comprises the following steps:
S51: inputting the trajectories produced by the generator and the real trajectories of the pedestrians into the discriminator;
S52: the discriminator determines whether an input trajectory was produced by the generator or is a real trajectory. It encodes real and generated trajectories with an LSTM-based encoder and applies a multilayer perceptron to the encoder's output hidden state to obtain a classification score. Ideally, the discriminator learns the social rules of pedestrian trajectories and judges trajectories that violate these rules to be fake;
the adversarial training loss is defined over the discriminator D, the generator G, the latent variable distribution parameters z, the real trajectory data x, and the k-th input (position, velocity, or acceleration) of the latent variable predictor for the observed trajectory. Through this game between the generator and the discriminator, the generator eventually produces samples that resemble the training set and conform to social rules. Because the generator learns a probability distribution similar to that of the training set, each sampling yields a different plausible sample, so the model can predict multiple possible futures;
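The adversarial objective can be sketched with the standard GAN cross-entropy losses computed from the discriminator's outputs. Some parts of this sketch are assumptions and do not come from the text: converting the MLP classification score into a probability with a sigmoid, using a small `EPS` for numerical safety, and using the non-saturating form of the generator loss. The non-saturating loss is a widely used substitute for the minimax generator term.

```python
import math

EPS = 1e-12  # numerical guard against log(0)


def to_probability(score):
    """Sigmoid: map the discriminator MLP's classification score to P(real)."""
    return 1.0 / (1.0 + math.exp(-score))


def discriminator_loss(d_real, d_fake):
    """-E[log D(x)] - E[log(1 - D(G(z, x_obs)))]; the discriminator minimizes this."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating form -E[log D(G(z, x_obs))]; the generator minimizes this."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
```

When the discriminator cannot tell real from generated trajectories (D = 0.5 everywhere), its loss equals 2 ln 2, the equilibrium value of this objective.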
the total loss function consists of three parts: the adversarial training loss, the KL divergence of the latent variable distribution, and the deviation between the predicted and true values. The weighted total loss function is defined as follows:
where α and β are each set to a value between 1 and 10, with specific values obtained by cross-validation on a benchmark dataset. During training, the generator and the discriminator are trained iteratively with a batch size of 64 for 600 epochs at a learning rate of 0.001, and the parameters are optimized with the Adam optimizer.
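A sketch of the weighted total loss together with the stated training settings. Which term α and β multiply is not recoverable from the text; this sketch assumes α weights the KL term and β the 2-norm term, and uses α = β = 1.0 as hypothetical defaults inside the stated range. The Adam optimizer itself is not implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 64
    epochs: int = 600
    learning_rate: float = 1e-3  # used by an Adam optimizer (not implemented here)
    num_samples: int = 20        # m, trajectories generated per forward pass
    alpha: float = 1.0           # hypothetical default; tuned in [1, 10] by cross-validation
    beta: float = 1.0            # hypothetical default; tuned in [1, 10] by cross-validation

    def __post_init__(self):
        if not (1.0 <= self.alpha <= 10.0 and 1.0 <= self.beta <= 10.0):
            raise ValueError("alpha and beta must lie between 1 and 10")


def total_loss(adv_loss, kl_loss, l2_loss, cfg):
    """ASSUMPTION: alpha scales the KL term and beta scales the 2-norm term."""
    return adv_loss + cfg.alpha * kl_loss + cfg.beta * l2_loss
```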
The invention provides a roadside pedestrian trajectory generation algorithm based on a generative adversarial network, which generates multimodal predicted trajectories using a social attention mechanism and pedestrian-trajectory latent variables. Through adversarial training of the trajectory generator and the discriminator, the capabilities of both are continuously improved, increasing the accuracy of the trajectories produced by the generator. The invention proposes a social attention mechanism based on head orientation: it takes the direction of each pedestrian's last velocity as the head orientation, computes the cosine of the angle between pedestrians from this orientation, uses the resulting angle information to optimize the output of the social attention mechanism, and aggregates that output through a max-pooling layer. The invention also proposes a new latent variable generation method in which two feedforward neural networks learn latent variables from the pedestrian's historical trajectory and observed trajectory, respectively; the inputs to the latent variable generator are position, velocity, and acceleration, from which three latent variable distributions are generated. These three latent variable distributions are combined with Gaussian random noise to produce multimodal output while retaining the ability to handle uncertain future inputs.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.
Claims (7)
1. A roadside pedestrian trajectory prediction algorithm based on a generative adversarial network, characterized in that the method comprises the following steps:
S10: encoding the input trajectory using an encoder;
S20: calculating the social attention of pedestrians using their head orientations;
S30: applying a latent variable predictor to generate a predictable latent variable distribution;
S40: generating the predicted future trajectory of the pedestrian;
S50: optimizing the pedestrian trajectory generated by the generator using a discriminator;
wherein step S30 comprises the following steps:
S31: designing a latent variable predictor;
S32: predicting the latent variable distribution of the pedestrian using the latent variable predictor.
2. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that, in step S31, the latent variable predictor consists of two feedforward neural networks defined as follows:
3. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 2, characterized in that, in step S32, k = 1, 2, 3 denotes the position, velocity, and acceleration of the pedestrian, respectively: position reveals the layout of the underlying scene, velocity reflects the motion patterns of different pedestrians, and acceleration indicates the intensity of the pedestrian's motion. The latent variable predictor estimates the latent distributions of the three variables from the three inputs. Gaussian random noise is used to generate multimodal output, and in the training stage the three latent variable distributions and the Gaussian random noise are fused to form the final latent variable distribution parameters;
in the testing stage, the latent variable predictor predicts the latent variable distribution from the pedestrian's observed trajectory: it takes the pedestrian's position, velocity, and acceleration as input, predicts the latent variable distribution of each from the corresponding input, and combines the three latent variables with Gaussian random noise to form the final latent variable, which is output to the trajectory generator;
during training, a latent variable loss function measures the difference between the latent variable distribution of the observed trajectory and that of the real trajectory, with the error computed as the KL divergence, as follows:
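A dependency-free sketch of claims 2 and 3, under several assumptions: each of the two feedforward networks (one for observed-trajectory features, one for real-trajectory features) is modelled as three `GaussianHead`s, one per input type k, that output the mean and log-variance of a diagonal Gaussian; fusion is plain concatenation of the three reparameterized samples with a Gaussian noise vector; and the KL term is taken as KL(real ‖ observed), the usual conditional-VAE direction. The class and function names, layer sizes, input features, and random weights are hypothetical.

```python
import math
import random


def dense(x, weights, bias, relu=True):
    """Fully connected layer with an optional ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def make_layer(n_in, n_out, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


class GaussianHead:
    """Hypothetical feedforward net: feature vector -> (mu, logvar) of a diagonal Gaussian."""

    def __init__(self, n_in, n_hidden, n_latent, rng):
        self.hidden = make_layer(n_in, n_hidden, rng)
        self.mu = make_layer(n_hidden, n_latent, rng)
        self.logvar = make_layer(n_hidden, n_latent, rng)

    def __call__(self, x):
        h = dense(x, *self.hidden)
        return dense(h, *self.mu, relu=False), dense(h, *self.logvar, relu=False)


def fused_latent(features, heads, noise_dim, rng):
    """features and heads are ordered k = 1, 2, 3 (position, velocity, acceleration).

    ASSUMPTION: fusion is concatenation of the three samples and the Gaussian noise.
    """
    z = []
    for x, head in zip(features, heads):
        mu, logvar = head(x)
        # Reparameterization: z_k = mu + sigma * eps, with eps ~ N(0, 1)
        z.extend(m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar))
    z.extend(rng.gauss(0.0, 1.0) for _ in range(noise_dim))
    return z


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) between two diagonal Gaussians."""
    return sum(
        0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
        for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    obs_heads = [GaussianHead(16, 32, 8, rng) for _ in range(3)]   # network on observed trajectory
    real_heads = [GaussianHead(16, 32, 8, rng) for _ in range(3)]  # network on real trajectory
    obs_feats = [[rng.random() for _ in range(16)] for _ in range(3)]  # hypothetical features
    real_feats = [[rng.random() for _ in range(16)] for _ in range(3)]
    z = fused_latent(obs_feats, obs_heads, noise_dim=8, rng=rng)
    kl = sum(
        kl_diag_gaussians(*real_heads[k](real_feats[k]), *obs_heads[k](obs_feats[k]))
        for k in range(3)
    )
    print(len(z), kl)
```

Drawing a fresh noise vector on every call is what makes repeated sampling yield different latent codes, and hence different predicted trajectories.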
4. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step 1 specifically comprises the following steps:
S11: processing the input trajectory data: the input trajectory is a time series of trajectory points, each being the position coordinate of target i at time t; the position coordinates of each trajectory at different times are fed into the encoding network;
S12: using a single-layer multilayer perceptron (MLP) to convert the two-dimensional position information into a fixed-length multi-dimensional vector; the multilayer perceptron is defined as follows:
where φ(·) is a multilayer perceptron with a ReLU nonlinear activation function and W_ee denotes its parameters;
S13: feeding the multi-dimensional vector into an encoder based on a long short-term memory (LSTM) network to generate the hidden state of the pedestrian's motion; the encoder LSTM is defined as follows:
where LSTM(·) is a long short-term memory network and W_encoder denotes the parameters of the encoder LSTM, which are shared among all pedestrians in the same scene.
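The encoder of claim 4 can be sketched in pure Python as a ReLU embedding φ followed by a single LSTM cell stepped over the observed positions. The random weights stand in for the learned parameters W_ee and W_encoder, the standard gate layout (input, forget, candidate, output) is assumed, and `TrajectoryEncoder` is a hypothetical name. Using one instance for every pedestrian in a scene mirrors the parameter sharing described above.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TrajectoryEncoder:
    """phi: single-layer ReLU embedding of (x, y), then one LSTM cell run over time."""

    def __init__(self, embed_dim, hidden_dim, rng):
        self.h_dim = hidden_dim
        self.w_ee = rand_matrix(embed_dim, 2, rng)  # placeholder for W_ee
        self.b_ee = [0.0] * embed_dim
        # Placeholder for W_encoder: one stacked matrix for the four LSTM gates.
        self.w = rand_matrix(4 * hidden_dim, embed_dim + hidden_dim, rng)
        self.b = [0.0] * (4 * hidden_dim)

    def embed(self, pos):
        return [max(0.0, v) for v in affine(self.w_ee, self.b_ee, pos)]

    def step(self, e, h, c):
        g = affine(self.w, self.b, e + h)  # gates from [embedding, previous hidden]
        n = self.h_dim
        i = [sigmoid(v) for v in g[:n]]
        f = [sigmoid(v) for v in g[n:2 * n]]
        u = [math.tanh(v) for v in g[2 * n:3 * n]]
        o = [sigmoid(v) for v in g[3 * n:]]
        c = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c, i, u)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def encode(self, trajectory):
        """Return the final hidden state after consuming every observed position."""
        h = [0.0] * self.h_dim
        c = [0.0] * self.h_dim
        for pos in trajectory:
            h, c = self.step(self.embed(pos), h, c)
        return h
```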
5. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step 2 specifically comprises the following steps:
S21: calculating the azimuth angle between pedestrians: the velocity at a pedestrian's last position is taken as the pedestrian's future velocity, and its direction is taken as both the head orientation and the direction of trajectory motion; the cosine of the azimuth angle between pedestrians is calculated from the head orientations of all pedestrians as follows:
where n is the number of pedestrians in the same scene and β_ij denotes the angle between pedestrian i and pedestrian j;
S22: designing the attention mechanism: a soft attention mechanism and a hard attention mechanism are designed according to the cosine of the azimuth angle between pedestrians; the influence of one pedestrian on another decreases as the azimuth cosine between them increases. The hard attention mechanism is represented by a matrix H_A with the same shape as cos(β), each element h_ij of which is set to 0 or 1: when the cosine of the azimuth angle between two pedestrians is greater than the preset threshold of 0.2, the corresponding attention weight h_ij is 1, and when it is less than 0.2, h_ij is 0. The hard attention mechanism thus computes its attention weights through a threshold, whereas the soft attention mechanism computes the correlation between pedestrians adaptively; the soft attention weight S_A is calculated as follows:
the soft and hard attention mechanisms are applied to optimize the output of the second multilayer perceptron, and the result is aggregated through a max-pooling layer to obtain the output.
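A sketch of claim 5's head-orientation attention, assuming the last velocity is approximated by the difference of the last two positions and that a pedestrian does not attend to itself. Only the hard-attention matrix H_A is implemented, since the formula for the soft weight S_A is not reproduced here; `attend_and_max_pool` accepts any n × n weight matrix, so a soft matrix could be passed in its place. The per-pair feature vectors are hypothetical stand-ins for the output of the second multilayer perceptron and are assumed non-negative (ReLU outputs), so masked-out neighbours contribute 0 to the max.

```python
import math


def headings(trajectories):
    """Head orientation = last velocity, approximated as the last position difference."""
    return [(t[-1][0] - t[-2][0], t[-1][1] - t[-2][1]) for t in trajectories]


def cosine_matrix(vectors):
    """n x n matrix of cos(beta_ij); a stationary pedestrian gets 0.0 (assumption)."""
    n = len(vectors)
    cos = [[0.0] * n for _ in range(n)]
    for i, (ax, ay) in enumerate(vectors):
        for j, (bx, by) in enumerate(vectors):
            norm = math.hypot(ax, ay) * math.hypot(bx, by)
            if norm > 0.0:
                cos[i][j] = (ax * bx + ay * by) / norm
    return cos


def hard_attention(cos, threshold=0.2):
    """h_ij = 1 if cos(beta_ij) > threshold, else 0 (cos == threshold also maps to 0 here).

    ASSUMPTION: the diagonal is zeroed so a pedestrian does not attend to itself.
    """
    n = len(cos)
    return [[1.0 if i != j and cos[i][j] > threshold else 0.0 for j in range(n)] for i in range(n)]


def attend_and_max_pool(pair_features, weights):
    """Scale each neighbour vector pair_features[i][j] by weights[i][j], then max-pool over j."""
    pooled = []
    for i, row in enumerate(pair_features):
        dim = len(row[0])
        pooled.append([max(weights[i][j] * row[j][d] for j in range(len(row))) for d in range(dim)])
    return pooled
```

With four pedestrians heading +x, −x, +y, and up-right, pedestrian 0 keeps only the up-right neighbour (cosine ≈ 0.89 > 0.2), while the head-on and perpendicular neighbours are masked out.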
6. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step 4 specifically comprises the following steps:
S41: concatenating the output of the social attention module, the output of the latent variable predictor, and the hidden state of the pedestrian's motion;
S42: inputting the concatenated result into a decoder based on a long short-term memory network to obtain a new trajectory hidden state that fuses these kinds of information; the decoder LSTM is defined as follows:
where LSTM(·) is a long short-term memory network and W_decoder denotes the parameters of the decoder LSTM, which are shared among all pedestrians in the same scene;
S43: decoding the new hidden state with a multilayer perceptron to obtain the future trajectory coordinates of the pedestrian; the multilayer perceptron is defined as follows:
where γ(·) is a multilayer perceptron with the ReLU nonlinear activation function that outputs the pedestrian's future position coordinate, so the prediction result is a sequence of position coordinates, T_obs being the length of the predicted trajectory. The invention uses multimodal output: the trajectory generator outputs m trajectories at a time, and a 2-norm loss function measures the deviation between the m trajectories and the ground truth, expressed as follows:
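Steps S41 and S43 can be sketched as a concatenation followed by an output MLP. The S42 decoder LSTM is omitted here (its cell has the same structure as an encoder LSTM cell), and the output MLP assumes a ReLU hidden layer followed by a linear layer so that predicted coordinates can be negative; `splice` and `OutputMLP` are hypothetical names and the weights are random placeholders.

```python
import random


def splice(social, latent, hidden):
    """S41: concatenate the social-attention output, latent variable, and motion hidden state."""
    return [*social, *latent, *hidden]


class OutputMLP:
    """Stand-in for gamma in S43: decoder hidden state -> (x, y).

    ASSUMPTION: ReLU hidden layer + linear output layer, so coordinates can be negative.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(2)]

    def __call__(self, h):
        a = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in self.w1]
        x, y = (sum(w * v for w, v in zip(row, a)) for row in self.w2)
        return x, y
```

In a full generator, the spliced vector would feed the decoder LSTM at each step, and `OutputMLP` would be applied to the decoder's hidden state once per predicted time step.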
7. The roadside pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step 5 specifically comprises the following steps:
S51: inputting the trajectory generated by the generator and the pedestrian's real trajectory into the discriminator;
S52: the discriminator determines whether an input trajectory was produced by the generator or is a real trajectory. It encodes both real and generated trajectories with an encoder based on a long short-term memory (LSTM) network and applies a multilayer perceptron to the hidden state output by the encoder to obtain a classification score. Ideally, the discriminator learns the social rules of pedestrian trajectories, and trajectories that violate these rules are classified as fake;
the adversarial training loss function is expressed as follows:
where D is the discriminator, G is the generator, z is the latent variable distribution parameter, x is the trajectory data, and the remaining term is the k-th input (position, velocity, or acceleration) of the latent variable predictor for the observed trajectory. Through the adversarial game between the generator and the discriminator, the generator eventually produces samples that resemble the training set and conform to social rules. Because the generator learns a probability distribution similar to that of the training set, each sample drawn from it yields a different plausible trajectory, so the model can predict multiple possible futures;
the total loss function consists of three parts: the adversarial training loss, the KL divergence of the latent variable distribution, and the deviation between the predicted and true values. The weighted total loss function is defined as follows:
where α and β are each set to a value between 1 and 10, with specific values obtained by cross-validation on a benchmark dataset. During training, the generator and the discriminator are trained iteratively with a batch size of 64 for 600 epochs at a learning rate of 0.001, and the parameters are optimized with the Adam optimizer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011229272.0A CN112347923A (en) | 2020-11-06 | 2020-11-06 | Roadside end pedestrian track prediction algorithm based on confrontation generation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112347923A (en) | 2021-02-09 |
Family ID: 74428364
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011229272.0A Withdrawn CN112347923A (en) | 2020-11-06 | 2020-11-06 | Roadside end pedestrian track prediction algorithm based on confrontation generation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112347923A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113156957A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Autonomous mobile robot self-supervision learning and navigation method based on confrontation generation network |
CN113473682A (en) * | 2021-09-01 | 2021-10-01 | 启东晶尧光电科技有限公司 | Artificial intelligence-based smart city lighting street lamp adjusting method and system |
CN113538506A (en) * | 2021-07-23 | 2021-10-22 | 陕西师范大学 | Pedestrian trajectory prediction method based on global dynamic scene information depth modeling |
CN113689470A (en) * | 2021-09-02 | 2021-11-23 | 重庆大学 | Pedestrian motion trajectory prediction method under multi-scene fusion |
CN114757975A (en) * | 2022-04-29 | 2022-07-15 | 华南理工大学 | Pedestrian trajectory prediction method based on transformer and graph convolution network |
CN114898550A (en) * | 2022-03-16 | 2022-08-12 | 清华大学 | Pedestrian trajectory prediction method and system |
2020-11-06: CN application CN202011229272.0A (CN112347923A), not active (withdrawn)
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113156957A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Autonomous mobile robot self-supervision learning and navigation method based on confrontation generation network |
CN113538506A (en) * | 2021-07-23 | 2021-10-22 | 陕西师范大学 | Pedestrian trajectory prediction method based on global dynamic scene information depth modeling |
CN113473682A (en) * | 2021-09-01 | 2021-10-01 | 启东晶尧光电科技有限公司 | Artificial intelligence-based smart city lighting street lamp adjusting method and system |
CN113473682B (en) * | 2021-09-01 | 2021-11-26 | 启东晶尧光电科技有限公司 | Artificial intelligence-based smart city lighting street lamp adjusting method and system |
CN113689470A (en) * | 2021-09-02 | 2021-11-23 | 重庆大学 | Pedestrian motion trajectory prediction method under multi-scene fusion |
CN113689470B (en) * | 2021-09-02 | 2023-08-11 | 重庆大学 | Pedestrian motion trail prediction method under multi-scene fusion |
CN114898550A (en) * | 2022-03-16 | 2022-08-12 | 清华大学 | Pedestrian trajectory prediction method and system |
CN114898550B (en) * | 2022-03-16 | 2024-03-19 | 清华大学 | Pedestrian track prediction method and system |
CN114757975A (en) * | 2022-04-29 | 2022-07-15 | 华南理工大学 | Pedestrian trajectory prediction method based on transformer and graph convolution network |
CN114757975B (en) * | 2022-04-29 | 2024-04-16 | 华南理工大学 | Pedestrian track prediction method based on transformer and graph convolution network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112347923A (en) | | Roadside end pedestrian track prediction algorithm based on confrontation generation network |
CN112119409B (en) | | Neural network with relational memory |
Zhao et al. | | A spatial-temporal attention model for human trajectory prediction. |
Kim et al. | | Multi-head attention based probabilistic vehicle trajectory prediction |
CN112734808B (en) | | Trajectory prediction method for vulnerable road users in vehicle driving environment |
Cho et al. | | Deep predictive autonomous driving using multi-agent joint trajectory prediction and traffic rules |
CN111339867A (en) | | Pedestrian trajectory prediction method based on generation of countermeasure network |
Yang et al. | | A novel graph-based trajectory predictor with pseudo-oracle |
US20230419113A1 (en) | | Attention-based deep reinforcement learning for autonomous agents |
Ye et al. | | GSAN: Graph self-attention network for learning spatial–temporal interaction representation in autonomous driving |
CN117077727B (en) | | Track prediction method based on space-time attention mechanism and neural ordinary differential equation |
CN115829171A (en) | | Pedestrian trajectory prediction method combining space information and social interaction characteristics |
Kuo et al. | | Trajectory prediction with linguistic representations |
CN115659275A (en) | | Real-time accurate trajectory prediction method and system in unstructured human-computer interaction environment |
Yu et al. | | Hybrid attention-oriented experience replay for deep reinforcement learning and its application to a multi-robot cooperative hunting problem |
CN112418421B (en) | | Road side end pedestrian track prediction algorithm based on graph attention self-coding model |
Mirus et al. | | An investigation of vehicle behavior prediction using a vector power representation to encode spatial positions of multiple objects and neural networks |
Takano et al. | | Prediction of human behaviors in the future through symbolic inference |
KR102234917B1 (en) | | Data processing apparatus through neural network learning, data processing method through the neural network learning, and recording medium recording the method |
Shukla et al. | | UBOL: User-Behavior-aware one-shot learning for safe autonomous driving |
Zhang et al. | | Learning to discover task-relevant features for interpretable reinforcement learning |
Dan | | Spatial-temporal block and LSTM network for pedestrian trajectories prediction |
Karle et al. | | Mixnet: Physics constrained deep neural motion prediction for autonomous racing |
Wu et al. | | A novel trajectory generator based on a constrained GAN and a latent variables predictor |
Takano et al. | | What do you expect from a robot that tells your future? The crystal ball |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210209 |