CN112347923A - Roadside end pedestrian track prediction algorithm based on confrontation generation network - Google Patents

Roadside end pedestrian track prediction algorithm based on confrontation generation network Download PDF

Info

Publication number
CN112347923A
CN112347923A (application CN202011229272.0A)
Authority
CN
China
Prior art keywords
pedestrian
track
latent variable
generator
trajectory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011229272.0A
Other languages
Chinese (zh)
Inventor
杨彪
何才臻
徐黎明
闫国成
吕继东
陈阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu China Israel Industrial Technology Research Institute
Changzhou University
Original Assignee
Jiangsu China Israel Industrial Technology Research Institute
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu China Israel Industrial Technology Research Institute, Changzhou University filed Critical Jiangsu China Israel Industrial Technology Research Institute
Priority to CN202011229272.0A priority Critical patent/CN112347923A/en
Publication of CN112347923A publication Critical patent/CN112347923A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network (GAN), which generates multi-modal predicted trajectories by using a social attention mechanism and pedestrian trajectory latent variables. Through adversarial training of the trajectory generator and the discriminator, the capabilities of both are continuously improved, raising the accuracy of the trajectories produced by the generator. A social attention mechanism based on head orientation is proposed: the head orientation of each pedestrian is obtained from the direction of the pedestrian's last velocity, the cosine of the azimuth angle between pedestrians is calculated from this head-orientation information, soft and hard attention mechanisms use the calculated angle information to refine the output of the social attention module, and the result is aggregated by a max-pooling layer. A new latent variable generation method is also proposed: two feedforward neural networks learn latent variables from the pedestrian's real (historical) trajectory and from the observed trajectory respectively; the inputs of the latent variable generator comprise position, velocity and acceleration, and a latent variable distribution is generated from each of the three types of input.

Description

Roadside end pedestrian track prediction algorithm based on confrontation generation network
Technical Field
The invention relates to the technical field of automatic driving, in particular to pedestrian trajectory prediction, and provides a roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network (GAN).
Background
With the continuous development of robot autonomous navigation and automated driving technology, driverless technology has attracted wide attention and has a promising application prospect. Driverless vehicles can bring convenience to people's lives, but while driving they must monitor the motion of pedestrians on the road and predict their future trajectories in order to avoid collisions. To predict pedestrian motion well, a driverless vehicle needs to process the observed pedestrian trajectory data, learn the regularities of pedestrian motion, and predict the pedestrian's next motion state according to those regularities. The difficulty of accurately predicting pedestrian motion comes from the complexity of human behavior, from pedestrians' own intentions, and from the variety of external stimuli. Pedestrian motion may be driven by the pedestrian's own goal, by interactions with surrounding objects, by social relationships, social rules and norms, or by the topological, geometric and semantic environment; most of these factors are not directly observable and must be inferred from complex motion patterns or modeled from contextual information. Enabling the driverless vehicle to learn these latent motion laws is therefore the key to accurate pedestrian trajectory prediction.
Because pedestrian behavior is stochastic, neither a machine nor a human can predict a pedestrian's future trajectory with perfect accuracy. A pedestrian's trajectory is influenced by the surrounding environment, for example by person-to-person and person-to-object interactions, which are difficult to describe explicitly. Nevertheless, the future trajectory of a pedestrian is mainly influenced by the motion of the people and objects in front of the pedestrian, and exploiting this common-sense knowledge helps to model pedestrians' social interaction behavior and thus to predict their future motion.
Pedestrian motion patterns are complex and diverse, and such complex motion is difficult to describe with a single dynamic model. A common approach to modeling the general motion of a maneuvering target is to define and fuse several typical motion modes, each described by a different dynamic state; the modes may be linear movement, turning maneuvers or sudden accelerations, and over time they form a sequence capable of describing complex motion behavior. The diversity of pedestrian motion patterns must therefore also be considered in pedestrian trajectory prediction.
disclosure of Invention
The technical problem to be solved by the invention is as follows: in order to address the problems in the prior art that pedestrian motion patterns are complex and diverse and that complex pedestrian motion is difficult to describe with a dynamic model, a roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network is provided.
The technical scheme adopted by the invention to solve this technical problem is as follows: a roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network comprises the following steps:
S10: encoding the input trajectory with an encoder;
S20: calculating the social attention of each pedestrian from the pedestrian's head orientation;
S30: applying a latent variable predictor to generate a predicted latent variable distribution;
S40: generating the predicted future trajectory of the pedestrian;
S50: optimizing the pedestrian trajectory produced by the generator using a discriminator;
Step S30 comprises the following steps:
S31: designing a latent variable predictor;
S32: predicting the pedestrian's latent variable distribution with the latent variable predictor.
Further, in step S31: the latent variable predictor consists of two feedforward neural networks, defined as follows:
z_k^o = Ψ(c_k^o; W_Ψ)
z_k^r = Ψ′(c_k^r; W_Ψ′)
where Ψ(·) and Ψ′(·) are feedforward neural networks, W_Ψ and W_Ψ′ are the parameters of the two feedforward neural networks respectively, and c_k^o and c_k^r are the k-th type of input to the latent variable predictor, taken from the observed trajectory and from the real trajectory respectively.
Further, in step S32: k = 1, 2, 3 denotes the position, velocity and acceleration of the pedestrian respectively; the position reveals the layout of the underlying scene, the velocity reflects the motion patterns of different pedestrians, and the acceleration indicates the intensity of the pedestrian's motion. The latent variable predictor estimates a latent distribution for each of the three variables from the three inputs. Gaussian random noise is used to produce multi-modal output; in the training stage the three latent variable distributions are fused with the Gaussian random noise to form the final latent variable distribution parameters.
In the testing stage, the latent variable predictor predicts the latent variable distribution from the pedestrian's observed trajectory only: it takes the pedestrian's position, velocity and acceleration as input, predicts a latent variable distribution for each of the three, and combines the three latent variables with Gaussian random noise to form the final latent variable, which is passed to the trajectory generator.
During training, a latent variable loss function measures the difference between the latent variable distribution of the observed trajectory and that of the real trajectory, and the error is computed with the KL divergence:
L_KL = D_KL(Z^o || Z^r)
where Z^o and Z^r denote the latent variable distribution of the observed trajectory and the latent variable distribution of the real trajectory respectively.
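For illustration of steps S31-S32, the sketch below shows one way the two feedforward networks, the KL loss, and the fusion with Gaussian random noise could be realized in PyTorch. The class and function names, the layer sizes, and the diagonal-Gaussian parameterization of the latent distributions are assumptions made for the sketch; the patent does not fix these implementation details.

```python
import torch
import torch.nn as nn


class LatentVariablePredictor(nn.Module):
    """Two feedforward networks: one maps features of the observed trajectory,
    the other features of the real (ground-truth) trajectory, to the mean and
    log-variance of a Gaussian latent distribution for the k-th input type
    (position, velocity or acceleration)."""

    def __init__(self, in_dim=16, latent_dim=8):
        super().__init__()
        self.psi_obs = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 2 * latent_dim))
        self.psi_real = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 2 * latent_dim))

    def forward(self, c_obs, c_real=None):
        mu_o, logvar_o = self.psi_obs(c_obs).chunk(2, dim=-1)
        if c_real is None:                       # test stage: only the observed trajectory is available
            return (mu_o, logvar_o), None
        mu_r, logvar_r = self.psi_real(c_real).chunk(2, dim=-1)
        return (mu_o, logvar_o), (mu_r, logvar_r)


def kl_between_gaussians(mu_o, logvar_o, mu_r, logvar_r):
    """KL divergence between the observed-trajectory latent distribution and the
    real-trajectory latent distribution (both treated as diagonal Gaussians)."""
    return 0.5 * (logvar_r - logvar_o
                  + (logvar_o.exp() + (mu_o - mu_r) ** 2) / logvar_r.exp()
                  - 1.0).sum(dim=-1).mean()


def sample_latent(mu, logvar, noise_dim=8):
    """Sample from the predicted distribution and fuse with Gaussian random noise."""
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    return torch.cat([z, torch.randn(mu.size(0), noise_dim)], dim=-1)
```

During training both branches are evaluated and the KL term ties the observed-trajectory distribution to the real-trajectory distribution; at test time only the observed branch is used, matching the two stages described above.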
Further, step S10 specifically comprises the following steps:
S11: processing the input trajectory data: the input trajectory is a series of time-ordered trajectory points X_i = {(x_i^t, y_i^t), t = 1, ..., T_obs}, where (x_i^t, y_i^t) is the position coordinate of target i at time t; the position coordinates of each trajectory at the different times are fed into the encoding network;
S12: a single-layer multilayer perceptron converts the two-dimensional position information into a fixed-length multi-dimensional vector e_i^t; the multilayer perceptron is defined as follows:
e_i^t = φ(x_i^t, y_i^t; W_ee)
where φ(·) is a multilayer perceptron with a ReLU nonlinear activation function and W_ee is its parameter;
S13: the multi-dimensional vector is fed into an encoder based on a long short-term memory network to generate the hidden state of the pedestrian's motion h_ei^t; the encoder long short-term memory network (LSTM) is defined as follows:
h_ei^t = LSTM(h_ei^(t-1), e_i^t; W_encoder)
where LSTM(·) is the long short-term memory network and W_encoder is its parameter, which is shared among all pedestrians in the same scene.
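A minimal PyTorch sketch of steps S11-S13 follows. The embedding size, hidden size and class name are assumed for illustration; the structure (a single-layer ReLU perceptron followed by an LSTM encoder whose parameters are shared across pedestrians) follows the description above.

```python
import torch
import torch.nn as nn


class TrajectoryEncoder(nn.Module):
    """Embed each 2-D position with a single-layer MLP (ReLU), then run an LSTM
    over the embedded sequence to obtain the pedestrian motion hidden state."""

    def __init__(self, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(2, embed_dim), nn.ReLU())  # phi(.) with parameters W_ee
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)    # encoder LSTM, shared in a scene

    def forward(self, traj):
        # traj: (num_pedestrians, T_obs, 2) position coordinates (x_i^t, y_i^t)
        e = self.embed(traj)          # fixed-length embeddings e_i^t
        _, (h, _) = self.lstm(e)      # final hidden state per pedestrian
        return h.squeeze(0)           # (num_pedestrians, hidden_dim)


# usage sketch: 8 observed frames for 5 pedestrians in one scene
h_enc = TrajectoryEncoder()(torch.randn(5, 8, 2))
```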
Further, step S20 specifically comprises the following steps:
S21: calculating the azimuth angle between pedestrians: the velocity at the pedestrian's last observed position is taken as the pedestrian's future velocity, its direction is taken as the head orientation and the direction of trajectory motion, and the cosine of the azimuth angle between pedestrians is calculated from the head orientations of all pedestrians:
cos(β) = [cos(b_ij)], i, j = 1, ..., n
where n is the number of pedestrians in the same scene and b_ij is the azimuth angle between pedestrian i and pedestrian j;
S22: designing the attention mechanisms: a soft attention mechanism and a hard attention mechanism are designed from the cosines of the azimuth angles between pedestrians; the influence of one pedestrian on another decreases as the azimuth angle between them increases. The hard attention mechanism is represented by a matrix H_A with the same shape as cos(β); each element h_ij of H_A is set to 0 or 1: when the cosine of the azimuth angle between the two pedestrians is greater than the preset threshold 0.2, the corresponding attention weight h_ij is 1, and when the cosine is less than the preset threshold 0.2, the corresponding attention weight h_ij is 0. The hard attention mechanism thus computes attention weights through a threshold, whereas the soft attention mechanism adaptively computes the correlation between pedestrians; the soft attention weight S_A is calculated as follows:
S_A = δ(Conv_1×1(cos(β)))
where δ(·) denotes the sigmoid activation function and Conv_1×1 denotes a 1 × 1 convolutional layer;
the soft and hard attention weights are applied to the output of a second multilayer perceptron to refine it, and the result is aggregated through a max-pooling layer to obtain the social attention output a_i.
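The sketch below illustrates one possible realization of steps S21-S22. The tensor shapes, the exact bearing computation (heading of pedestrian i dotted with the unit displacement towards pedestrian j), and the way the weights are applied to the pairwise features from the second multilayer perceptron are assumptions; the patent only specifies that cosines of azimuth angles are computed from head orientations, thresholded at 0.2 for hard attention, passed through a sigmoid-activated 1×1 convolution for soft attention, and aggregated by max pooling.

```python
import torch
import torch.nn as nn


class SocialAttention(nn.Module):
    def __init__(self, threshold=0.2):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=1)   # 1x1 convolution for the soft weights
        self.threshold = threshold

    def forward(self, traj, feat):
        # traj: (n, T_obs, 2) observed positions of n pedestrians in one scene
        # feat: (n, n, d) pairwise features produced by a second MLP
        heading = traj[:, -1] - traj[:, -2]                       # last velocity -> head orientation
        heading = heading / (heading.norm(dim=-1, keepdim=True) + 1e-8)
        rel = traj[None, :, -1] - traj[:, None, -1]               # displacement from i to j, (n, n, 2)
        rel = rel / (rel.norm(dim=-1, keepdim=True) + 1e-8)
        cos_b = (heading[:, None, :] * rel).sum(-1)               # cos(b_ij), (n, n)
        hard = (cos_b > self.threshold).float()                   # hard attention H_A
        soft = torch.sigmoid(self.conv(cos_b[None, None]))[0, 0]  # soft attention S_A
        refined = feat * (hard * soft).unsqueeze(-1)              # weight the second MLP's output
        return refined.max(dim=1).values                          # max-pool over neighbours -> a_i, (n, d)


# usage sketch: a = SocialAttention()(torch.randn(5, 8, 2), torch.randn(5, 5, 16))
```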
Further, step S40 specifically comprises the following steps:
S41: the output a_i of the social attention module and the output z_i of the latent variable predictor are concatenated with the pedestrian motion hidden state h_ei^t;
S42: the concatenation result is input to a decoder based on a long short-term memory network to obtain a new trajectory hidden state h_di^t that fuses the different kinds of information; the decoder long short-term memory network is defined as follows:
h_di^t = LSTM(h_di^(t-1), [h_ei^t, a_i, z_i]; W_decoder)
where LSTM(·) is the long short-term memory network and W_decoder is the parameter of the decoder LSTM, which is shared among all pedestrians in the same scene;
S43: a multilayer perceptron decodes the new hidden state to obtain the pedestrian's future trajectory coordinates; the multilayer perceptron is defined as follows:
(x̂_i^t, ŷ_i^t) = γ(h_di^t; W_γ)
where γ(·) is a multilayer perceptron with a ReLU nonlinear activation function and (x̂_i^t, ŷ_i^t) is the pedestrian's future position coordinate; the output prediction is a series of position coordinates Ŷ_i = {(x̂_i^t, ŷ_i^t)} over the length of the predicted trajectory. The invention adopts multi-modal output: the trajectory generator outputs m trajectories at a time, and a 2-norm loss function measures the deviation between the m trajectories and the ground truth:
L_2 = min_m ||Y_i - Ŷ_i^(m)||_2
where Y_i is the pedestrian's real trajectory and Ŷ_i^(m) is the m-th predicted future trajectory generated by the generator; m is set to 20 in the invention.
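A sketch of the decoder of steps S41-S43 and of one common form of the m-sample 2-norm loss is given below. The dimensions and names are assumptions, and the min-over-m form of the loss (penalizing only the closest of the m generated trajectories) is one standard realization; since the patent gives the loss only as an image, the exact functional form is an assumption.

```python
import torch
import torch.nn as nn


class TrajectoryDecoder(nn.Module):
    """Concatenate encoder hidden state, social attention output and latent
    variable, run an LSTM decoder, and decode each hidden state into (x, y)."""

    def __init__(self, h_dim=32, a_dim=16, z_dim=16, dec_dim=64, pred_len=12):
        super().__init__()
        self.lstm = nn.LSTMCell(h_dim + a_dim + z_dim, dec_dim)   # decoder LSTM, shared across pedestrians
        self.to_xy = nn.Sequential(nn.Linear(dec_dim, 32), nn.ReLU(), nn.Linear(32, 2))  # gamma(.)
        self.pred_len = pred_len

    def forward(self, h_enc, a_social, z_latent):
        inp = torch.cat([h_enc, a_social, z_latent], dim=-1)      # splice the three information sources
        h = torch.zeros(inp.size(0), self.lstm.hidden_size)
        c = torch.zeros_like(h)
        outputs = []
        for _ in range(self.pred_len):
            h, c = self.lstm(inp, (h, c))                         # fused trajectory hidden state
            outputs.append(self.to_xy(h))                         # predicted future position
        return torch.stack(outputs, dim=1)                        # (n, pred_len, 2)


def variety_l2_loss(y_true, y_samples):
    """2-norm deviation between the m generated trajectories and the ground
    truth, keeping only the best of the m samples (m = 20 in the patent)."""
    # y_true: (n, pred_len, 2); y_samples: (m, n, pred_len, 2)
    err = ((y_samples - y_true[None]) ** 2).sum(-1).sqrt().mean(-1)  # mean L2 error, (m, n)
    return err.min(dim=0).values.mean()
```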
Further, step S50 specifically comprises the following steps:
S51: the trajectory produced by the generator and the real trajectory of the pedestrian are input to the discriminator;
S52: the discriminator judges whether the input trajectory was generated by the generator or is a real trajectory: the discriminator encodes the real and generated trajectories with an encoder based on a long short-term memory network, and a multilayer perceptron is applied to the hidden state output by the encoder to obtain a classification score; ideally, the discriminator learns the social rules of pedestrian trajectories and judges trajectories that do not conform to these rules to be fake.
The loss function of the adversarial training is expressed as follows:
min_G max_D  E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p(z|c_k^o)}[log(1 - D(G(z)))]
where D is the discriminator, G is the generator, z is the latent variable distribution parameter, x is the trajectory data, and c_k^o is the k-th input (position, velocity, acceleration) of the observed-trajectory latent variable predictor. Through the game between the generator and the discriminator, the generator eventually produces samples that resemble the training set and conform to social rules; because the generator learns a probability distribution similar to that of the training set, each sampling yields a different but plausible sample, so the distribution can be used to predict multiple possibilities.
The total loss function consists of three parts: the adversarial training loss, the KL divergence of the latent variable distributions, and the deviation between the predicted and real trajectories; the total loss function is defined as follows:
L = L_GAN + α · L_KL + β · L_2
where α and β are each set to a number between 1 and 10, with the specific values obtained by cross-validation on the benchmark datasets. During training, the generator and the discriminator are trained iteratively, with batch size 64, 600 epochs, a learning rate of 0.001, and the Adam optimizer used to optimize the parameters.
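The sketch below shows an LSTM-plus-MLP discriminator and the loss terms described above, using the binary cross-entropy form commonly used to implement the minimax objective. Network sizes and names are assumptions; the weighting L = L_GAN + α·L_KL + β·L_2 and the stated training configuration follow the text.

```python
import torch
import torch.nn as nn


class TrajectoryDiscriminator(nn.Module):
    """LSTM encoder over a (real or generated) trajectory followed by an MLP
    that outputs a real/fake classification score."""

    def __init__(self, embed_dim=16, hidden_dim=48):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(2, embed_dim), nn.ReLU())
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, traj):                      # traj: (n, T, 2)
        _, (h, _) = self.lstm(self.embed(traj))
        return self.classifier(h.squeeze(0))      # one score per trajectory


def generator_total_loss(d_fake_scores, kl_loss, l2_loss, alpha=1.0, beta=1.0):
    """Total loss = adversarial term + alpha * KL divergence + beta * L2 deviation
    (alpha and beta between 1 and 10, chosen by cross-validation per the patent)."""
    bce = nn.BCEWithLogitsLoss()
    adv = bce(d_fake_scores, torch.ones_like(d_fake_scores))  # generator wants fakes judged real
    return adv + alpha * kl_loss + beta * l2_loss


def discriminator_loss(d_real_scores, d_fake_scores):
    bce = nn.BCEWithLogitsLoss()
    return (bce(d_real_scores, torch.ones_like(d_real_scores))
            + bce(d_fake_scores, torch.zeros_like(d_fake_scores)))


# training configuration stated in the patent: generator and discriminator trained
# in alternation, batch size 64, 600 epochs, learning rate 0.001, Adam optimizer, e.g.
# g_opt = torch.optim.Adam(generator.parameters(), lr=0.001)
# d_opt = torch.optim.Adam(discriminator.parameters(), lr=0.001)
```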
The beneficial effects of the invention, a roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network, are as follows: (1) a social attention module is provided that exploits the correlation between head orientation and trajectory prediction, the attention mechanism improving the ability of the social pooling layer to capture social interaction in different scenes;
(2) a new latent variable predictor is provided that can estimate knowledge-rich latent variables for better trajectory prediction; the inputs of the predictor are extracted from the trajectory data alone, so only a small computational overhead is added;
(3) the social attention module and the latent variable predictor are embedded into a generative adversarial network framework to generate socially acceptable, multi-modal output.
drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic diagram of a challenge generation training strategy proposed in the present invention;
FIG. 2 is a schematic diagram of a generator proposed in the present invention
FIG. 3 is a schematic representation of latent variable prediction proposed in the present invention;
FIG. 4 is a schematic diagram of the discriminator proposed in the present invention;
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
As shown in FIGS. 1-4, the roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network includes the following steps:
S10: encoding the input trajectory with an encoder;
S20: calculating the social attention of each pedestrian from the pedestrian's head orientation;
S30: applying a latent variable predictor to generate a predicted latent variable distribution;
S40: generating the predicted future trajectory of the pedestrian;
S50: optimizing the pedestrian trajectory produced by the generator using a discriminator;
Step S10 specifically comprises the following steps:
S11: processing the input trajectory data: the input trajectory is a series of time-ordered trajectory points X_i = {(x_i^t, y_i^t), t = 1, ..., T_obs}, where (x_i^t, y_i^t) is the position coordinate of target i at time t; the position coordinates of each trajectory at the different times are fed into the encoding network;
S12: a single-layer multilayer perceptron converts the two-dimensional position information into a fixed-length multi-dimensional vector e_i^t; the multilayer perceptron is defined as follows:
e_i^t = φ(x_i^t, y_i^t; W_ee)
where φ(·) is a multilayer perceptron with a ReLU nonlinear activation function and W_ee is its parameter;
S13: the multi-dimensional vector is fed into an encoder based on a long short-term memory network to generate the hidden state of the pedestrian's motion h_ei^t; the encoder long short-term memory network (LSTM) is defined as follows:
h_ei^t = LSTM(h_ei^(t-1), e_i^t; W_encoder)
where LSTM(·) is the long short-term memory network and W_encoder is its parameter, which is shared among all pedestrians in the same scene.
Further, step S20 specifically comprises the following steps:
S21: calculating the azimuth angle between pedestrians: the invention is based on the observation that a pedestrian's future trajectory is influenced by the crowd in front of the pedestrian rather than by the crowd behind; the velocity at the pedestrian's last observed position is taken as the pedestrian's future velocity, its direction is taken as the head orientation and the direction of trajectory motion, and the cosine of the azimuth angle between pedestrians is calculated from the head orientations of all pedestrians:
cos(β) = [cos(b_ij)], i, j = 1, ..., n
where n is the number of pedestrians in the same scene and b_ij is the azimuth angle between pedestrian i and pedestrian j;
S22: designing the attention mechanisms: a soft attention mechanism and a hard attention mechanism are designed from the cosines of the azimuth angles between pedestrians; the influence of one pedestrian on another decreases as the azimuth angle between them increases. The hard attention mechanism is represented by a matrix H_A with the same shape as cos(β); each element h_ij of H_A is set to 0 or 1: when the cosine of the azimuth angle between the two pedestrians is greater than the preset threshold 0.2, the corresponding attention weight h_ij is 1, and when the cosine is less than the preset threshold 0.2, the corresponding attention weight h_ij is 0. The hard attention mechanism thus computes attention weights through a threshold, whereas the soft attention mechanism adaptively computes the correlation between pedestrians; the soft attention weight S_A is calculated as follows:
S_A = δ(Conv_1×1(cos(β)))
where δ(·) denotes the sigmoid activation function and Conv_1×1 denotes a 1 × 1 convolutional layer;
the soft and hard attention weights are applied to the output of a second multilayer perceptron to refine it, and the result is aggregated through a max-pooling layer to obtain the social attention output a_i.
Step S30 comprises the following steps:
S31: designing a latent variable predictor;
S32: predicting the pedestrian's latent variable distribution with the latent variable predictor.
In step S31: the invention applies a latent variable predictor to generate a predictable latent variable distribution, which is a method for predicting latent variable distribution parameters in a data-driven manner; potential variable distribution parameters can be predicted from the observation track and the real track of the pedestrian in a training stage by a potential variable generator, so that a potential motion rule can be learned; the latent variable predictor consists of two feedforward neural networks defined as follows:
Figure BDA0002764619690000094
Figure BDA0002764619690000095
wherein Ψ (-) and
Figure BDA0002764619690000096
is a feed-forward neural network that is,
Figure BDA0002764619690000097
and
Figure BDA0002764619690000098
are the parameters of the two feedforward neural networks respectively,
Figure BDA0002764619690000099
and
Figure BDA00027646196900000910
is the k-th type input of the latent variable predictor.
Further, in step S32: k = 1, 2, 3 denotes the position, velocity and acceleration of the pedestrian respectively; the position reveals the layout of the underlying scene, the velocity reflects the motion patterns of different pedestrians, and the acceleration indicates the intensity of the pedestrian's motion. The latent variable predictor estimates a latent distribution for each of the three variables from the three inputs. Gaussian random noise is used to produce multi-modal output; in the training stage the three latent variable distributions are fused with the Gaussian random noise to form the final latent variable distribution parameters.
In the testing stage, the latent variable predictor predicts the latent variable distribution from the pedestrian's observed trajectory only: it takes the pedestrian's position, velocity and acceleration as input, predicts a latent variable distribution for each of the three, and combines the three latent variables with Gaussian random noise to form the final latent variable, which is passed to the trajectory generator.
During training, a latent variable loss function measures the difference between the latent variable distribution of the observed trajectory and that of the real trajectory, and the error is computed with the KL divergence:
L_KL = D_KL(Z^o || Z^r)
where Z^o and Z^r denote the latent variable distribution of the observed trajectory and the latent variable distribution of the real trajectory respectively.
Further, step S40 specifically comprises the following steps:
S41: the interaction between pedestrians is captured by the social attention module and the pedestrian motion latent variable distribution is produced by the latent variable predictor; the output a_i of the social attention module and the output z_i of the latent variable predictor are concatenated with the pedestrian motion hidden state h_ei^t;
S42: the concatenation result is input to a decoder based on a long short-term memory network to obtain a new trajectory hidden state h_di^t that fuses the different kinds of information; the decoder long short-term memory network is defined as follows:
h_di^t = LSTM(h_di^(t-1), [h_ei^t, a_i, z_i]; W_decoder)
where LSTM(·) is the long short-term memory network and W_decoder is the parameter of the decoder LSTM, which is shared among all pedestrians in the same scene;
S43: a multilayer perceptron decodes the new hidden state to obtain the pedestrian's future trajectory coordinates; the multilayer perceptron is defined as follows:
(x̂_i^t, ŷ_i^t) = γ(h_di^t; W_γ)
where γ(·) is a multilayer perceptron with a ReLU nonlinear activation function and (x̂_i^t, ŷ_i^t) is the pedestrian's future position coordinate; the output prediction is a series of position coordinates Ŷ_i = {(x̂_i^t, ŷ_i^t)} over the length of the predicted trajectory. The invention adopts multi-modal output: the trajectory generator outputs m trajectories at a time, and a 2-norm loss function measures the deviation between the m trajectories and the ground truth:
L_2 = min_m ||Y_i - Ŷ_i^(m)||_2
where Y_i is the pedestrian's real trajectory and Ŷ_i^(m) is the m-th predicted future trajectory generated by the generator; m is set to 20 in the invention.
Further, step S50 specifically comprises the following steps:
S51: the trajectory produced by the generator and the real trajectory of the pedestrian are input to the discriminator;
S52: the discriminator judges whether the input trajectory was generated by the generator or is a real trajectory: the discriminator encodes the real and generated trajectories with an encoder based on a long short-term memory network, and a multilayer perceptron is applied to the hidden state output by the encoder to obtain a classification score; ideally, the discriminator learns the social rules of pedestrian trajectories and judges trajectories that do not conform to these rules to be fake.
The loss function of the adversarial training is expressed as follows:
min_G max_D  E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p(z|c_k^o)}[log(1 - D(G(z)))]
where D is the discriminator, G is the generator, z is the latent variable distribution parameter, x is the trajectory data, and c_k^o is the k-th input (position, velocity, acceleration) of the observed-trajectory latent variable predictor. Through the game between the generator and the discriminator, the generator eventually produces samples that resemble the training set and conform to social rules; because the generator learns a probability distribution similar to that of the training set, each sampling yields a different but plausible sample, so the distribution can be used to predict multiple possibilities.
The total loss function consists of three parts: the adversarial training loss, the KL divergence of the latent variable distributions, and the deviation between the predicted and real trajectories; the total loss function is defined as follows:
L = L_GAN + α · L_KL + β · L_2
where α and β are each set to a number between 1 and 10, with the specific values obtained by cross-validation on the benchmark datasets. During training, the generator and the discriminator are trained iteratively, with batch size 64, 600 epochs, a learning rate of 0.001, and the Adam optimizer used to optimize the parameters.
The invention provides a roadside-end pedestrian trajectory prediction algorithm based on a generative adversarial network, which generates multi-modal predicted trajectories by using a social attention mechanism and pedestrian trajectory latent variables. Through adversarial training of the trajectory generator and the discriminator, the invention continuously improves the capabilities of both and raises the accuracy of the trajectories produced by the generator. The invention proposes a social attention mechanism based on head orientation, which obtains the head orientation of a pedestrian from the direction of the pedestrian's last velocity, calculates the cosine of the angle between pedestrians from the head-orientation information, uses the calculated angle information to refine the output of the social attention mechanism, and aggregates the output through a max-pooling layer. The invention also proposes a new latent variable generation method in which two feedforward neural networks learn latent variables from the pedestrian's real (historical) trajectory and from the observed trajectory respectively; the inputs of the latent variable generator comprise position, velocity and acceleration, and a latent variable distribution is generated from each of the three types of input. The three latent variable distributions are combined with Gaussian random noise to generate multi-modal output and to retain the ability to handle uncertainty in future inputs.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (7)

1. A roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network, characterized in that the method comprises the following steps:
S10: encoding the input trajectory with an encoder;
S20: calculating the social attention of each pedestrian from the pedestrian's head orientation;
S30: applying a latent variable predictor to generate a predicted latent variable distribution;
S40: generating the predicted future trajectory of the pedestrian;
S50: optimizing the pedestrian trajectory produced by the generator using a discriminator;
wherein step S30 comprises the following steps:
S31: designing a latent variable predictor;
S32: predicting the pedestrian's latent variable distribution with the latent variable predictor.
2. The roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that in step S31 the latent variable predictor consists of two feedforward neural networks, defined as follows:
z_k^o = Ψ(c_k^o; W_Ψ)
z_k^r = Ψ′(c_k^r; W_Ψ′)
where Ψ(·) and Ψ′(·) are feedforward neural networks, W_Ψ and W_Ψ′ are the parameters of the two feedforward neural networks respectively, and c_k^o and c_k^r are the k-th type of input to the latent variable predictor, taken from the observed trajectory and from the real trajectory respectively.
3. The roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 2, characterized in that in step S32: k = 1, 2, 3 denotes the position, velocity and acceleration of the pedestrian respectively; the position reveals the layout of the underlying scene, the velocity reflects the motion patterns of different pedestrians, and the acceleration indicates the intensity of the pedestrian's motion; the latent variable predictor estimates a latent distribution for each of the three variables from the three inputs; Gaussian random noise is used to produce multi-modal output, and in the training stage the three latent variable distributions are fused with the Gaussian random noise to form the final latent variable distribution parameters;
in the testing stage, the latent variable predictor predicts the latent variable distribution from the pedestrian's observed trajectory only: it takes the pedestrian's position, velocity and acceleration as input, predicts a latent variable distribution for each of the three, and combines the three latent variables with Gaussian random noise to form the final latent variable, which is passed to the trajectory generator;
during training, a latent variable loss function measures the difference between the latent variable distribution of the observed trajectory and that of the real trajectory, and the error is computed with the KL divergence:
L_KL = D_KL(Z^o || Z^r)
where Z^o and Z^r denote the latent variable distribution of the observed trajectory and the latent variable distribution of the real trajectory respectively.
4. The roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step S10 specifically comprises the following steps:
S11: processing the input trajectory data: the input trajectory is a series of time-ordered trajectory points X_i = {(x_i^t, y_i^t), t = 1, ..., T_obs}, where (x_i^t, y_i^t) is the position coordinate of target i at time t; the position coordinates of each trajectory at the different times are fed into the encoding network;
S12: a single-layer multilayer perceptron converts the two-dimensional position information into a fixed-length multi-dimensional vector e_i^t; the multilayer perceptron is defined as follows:
e_i^t = φ(x_i^t, y_i^t; W_ee)
where φ(·) is a multilayer perceptron with a ReLU nonlinear activation function and W_ee is its parameter;
S13: the multi-dimensional vector is fed into an encoder based on a long short-term memory network to generate the hidden state of the pedestrian's motion h_ei^t; the encoder long short-term memory network (LSTM) is defined as follows:
h_ei^t = LSTM(h_ei^(t-1), e_i^t; W_encoder)
where LSTM(·) is the long short-term memory network and W_encoder is its parameter, which is shared among all pedestrians in the same scene.
5. The roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step S20 specifically comprises the following steps:
S21: calculating the azimuth angle between pedestrians: the velocity at the pedestrian's last observed position is taken as the pedestrian's future velocity, its direction is taken as the head orientation and the direction of trajectory motion, and the cosine of the azimuth angle between pedestrians is calculated from the head orientations of all pedestrians:
cos(β) = [cos(b_ij)], i, j = 1, ..., n
where n is the number of pedestrians in the same scene and b_ij is the azimuth angle between pedestrian i and pedestrian j;
S22: designing the attention mechanisms: a soft attention mechanism and a hard attention mechanism are designed from the cosines of the azimuth angles between pedestrians; the influence of one pedestrian on another decreases as the azimuth angle between them increases; the hard attention mechanism is represented by a matrix H_A with the same shape as cos(β), each element h_ij of which is set to 0 or 1: when the cosine of the azimuth angle between the two pedestrians is greater than the preset threshold 0.2, the corresponding attention weight h_ij is 1, and when the cosine is less than the preset threshold 0.2, the corresponding attention weight h_ij is 0; the hard attention mechanism thus computes attention weights through a threshold, whereas the soft attention mechanism adaptively computes the correlation between pedestrians, the soft attention weight S_A being calculated as follows:
S_A = δ(Conv_1×1(cos(β)))
where δ(·) denotes the sigmoid activation function and Conv_1×1 denotes a 1 × 1 convolutional layer;
the soft and hard attention weights are applied to the output of a second multilayer perceptron to refine it, and the result is aggregated through a max-pooling layer to obtain the social attention output a_i.
6. The roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step S40 specifically comprises the following steps:
S41: the output a_i of the social attention module and the output z_i of the latent variable predictor are concatenated with the pedestrian motion hidden state h_ei^t;
S42: the concatenation result is input to a decoder based on a long short-term memory network to obtain a new trajectory hidden state h_di^t that fuses the different kinds of information; the decoder long short-term memory network is defined as follows:
h_di^t = LSTM(h_di^(t-1), [h_ei^t, a_i, z_i]; W_decoder)
where LSTM(·) is the long short-term memory network and W_decoder is the parameter of the decoder LSTM, which is shared among all pedestrians in the same scene;
S43: a multilayer perceptron decodes the new hidden state to obtain the pedestrian's future trajectory coordinates; the multilayer perceptron is defined as follows:
(x̂_i^t, ŷ_i^t) = γ(h_di^t; W_γ)
where γ(·) is a multilayer perceptron with a ReLU nonlinear activation function and (x̂_i^t, ŷ_i^t) is the pedestrian's future position coordinate; the output prediction is a series of position coordinates Ŷ_i = {(x̂_i^t, ŷ_i^t)} over the length of the predicted trajectory; multi-modal output is adopted, the trajectory generator outputs m trajectories at a time, and a 2-norm loss function measures the deviation between the m trajectories and the ground truth:
L_2 = min_m ||Y_i - Ŷ_i^(m)||_2
where Y_i is the pedestrian's real trajectory, Ŷ_i^(m) is the m-th predicted future trajectory generated by the generator, and m is set to 20.
7. The roadside end pedestrian trajectory prediction algorithm based on a generative adversarial network according to claim 1, characterized in that step S50 specifically comprises the following steps:
S51: the trajectory produced by the generator and the real trajectory of the pedestrian are input to the discriminator;
S52: the discriminator judges whether the input trajectory was generated by the generator or is a real trajectory: the discriminator encodes the real and generated trajectories with an encoder based on a long short-term memory network, and a multilayer perceptron is applied to the hidden state output by the encoder to obtain a classification score; ideally, the discriminator learns the social rules of pedestrian trajectories and judges trajectories that do not conform to these rules to be fake;
the loss function of the adversarial training is expressed as follows:
min_G max_D  E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p(z|c_k^o)}[log(1 - D(G(z)))]
where D is the discriminator, G is the generator, z is the latent variable distribution parameter, x is the trajectory data, and c_k^o is the k-th input (position, velocity, acceleration) of the observed-trajectory latent variable predictor; through the game between the generator and the discriminator, the generator eventually produces samples that resemble the training set and conform to social rules; because the generator learns a probability distribution similar to that of the training set, each sampling yields a different but plausible sample, so the distribution can be used to predict multiple possibilities;
the total loss function consists of three parts: the adversarial training loss, the KL divergence of the latent variable distributions, and the deviation between the predicted and real trajectories; the total loss function is defined as follows:
L = L_GAN + α · L_KL + β · L_2
where α and β are each set to a number between 1 and 10, with the specific values obtained by cross-validation on the benchmark datasets; during training, the generator and the discriminator are trained iteratively, with batch size 64, 600 epochs, a learning rate of 0.001, and the Adam optimizer used to optimize the parameters.
CN202011229272.0A 2020-11-06 2020-11-06 Roadside end pedestrian track prediction algorithm based on confrontation generation network Withdrawn CN112347923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011229272.0A CN112347923A (en) 2020-11-06 2020-11-06 Roadside end pedestrian track prediction algorithm based on confrontation generation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011229272.0A CN112347923A (en) 2020-11-06 2020-11-06 Roadside end pedestrian track prediction algorithm based on confrontation generation network

Publications (1)

Publication Number Publication Date
CN112347923A 2021-02-09

Family

ID=74428364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011229272.0A Withdrawn CN112347923A (en) 2020-11-06 2020-11-06 Roadside end pedestrian track prediction algorithm based on confrontation generation network

Country Status (1)

Country Link
CN (1) CN112347923A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113156957A (en) * 2021-04-27 2021-07-23 东莞理工学院 Autonomous mobile robot self-supervision learning and navigation method based on confrontation generation network
CN113538506A (en) * 2021-07-23 2021-10-22 陕西师范大学 Pedestrian trajectory prediction method based on global dynamic scene information depth modeling
CN113473682A (en) * 2021-09-01 2021-10-01 启东晶尧光电科技有限公司 Artificial intelligence-based smart city lighting street lamp adjusting method and system
CN113473682B (en) * 2021-09-01 2021-11-26 启东晶尧光电科技有限公司 Artificial intelligence-based smart city lighting street lamp adjusting method and system
CN113689470A (en) * 2021-09-02 2021-11-23 重庆大学 Pedestrian motion trajectory prediction method under multi-scene fusion
CN113689470B (en) * 2021-09-02 2023-08-11 重庆大学 Pedestrian motion trail prediction method under multi-scene fusion
CN114898550A (en) * 2022-03-16 2022-08-12 清华大学 Pedestrian trajectory prediction method and system
CN114898550B (en) * 2022-03-16 2024-03-19 清华大学 Pedestrian track prediction method and system
CN114757975A (en) * 2022-04-29 2022-07-15 华南理工大学 Pedestrian trajectory prediction method based on transformer and graph convolution network
CN114757975B (en) * 2022-04-29 2024-04-16 华南理工大学 Pedestrian track prediction method based on transformer and graph convolution network

Similar Documents

Publication Publication Date Title
CN112347923A (en) Roadside end pedestrian track prediction algorithm based on confrontation generation network
CN112119409B (en) Neural network with relational memory
Zhao et al. A spatial-temporal attention model for human trajectory prediction.
Kim et al. Multi-head attention based probabilistic vehicle trajectory prediction
CN112734808B (en) Trajectory prediction method for vulnerable road users in vehicle driving environment
Cho et al. Deep predictive autonomous driving using multi-agent joint trajectory prediction and traffic rules
CN111339867A (en) Pedestrian trajectory prediction method based on generation of countermeasure network
Yang et al. A novel graph-based trajectory predictor with pseudo-oracle
US20230419113A1 (en) Attention-based deep reinforcement learning for autonomous agents
Ye et al. GSAN: Graph self-attention network for learning spatial–temporal interaction representation in autonomous driving
CN117077727B (en) Track prediction method based on space-time attention mechanism and neural ordinary differential equation
CN115829171A (en) Pedestrian trajectory prediction method combining space information and social interaction characteristics
Kuo et al. Trajectory prediction with linguistic representations
CN115659275A (en) Real-time accurate trajectory prediction method and system in unstructured human-computer interaction environment
Yu et al. Hybrid attention-oriented experience replay for deep reinforcement learning and its application to a multi-robot cooperative hunting problem
CN112418421B (en) Road side end pedestrian track prediction algorithm based on graph attention self-coding model
Mirus et al. An investigation of vehicle behavior prediction using a vector power representation to encode spatial positions of multiple objects and neural networks
Takano et al. Prediction of human behaviors in the future through symbolic inference
KR102234917B1 (en) Data processing apparatus through neural network learning, data processing method through the neural network learning, and recording medium recording the method
Shukla et al. UBOL: User-Behavior-aware one-shot learning for safe autonomous driving
Zhang et al. Learning to discover task-relevant features for interpretable reinforcement learning
Dan Spatial-temporal block and LSTM network for pedestrian trajectories prediction
Karle et al. Mixnet: Physics constrained deep neural motion prediction for autonomous racing
Wu et al. A novel trajectory generator based on a constrained GAN and a latent variables predictor
Takano et al. What do you expect from a robot that tells your future? The crystal ball

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210209