CN114580715A - Pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model - Google Patents


Info

Publication number
CN114580715A
CN114580715A
Authority
CN
China
Prior art keywords
layer
data set
network
pedestrian
eth
Prior art date
Legal status
Pending
Application number
CN202210149868.2A
Other languages
Chinese (zh)
Inventor
王翔辰
杨欣
樊江锋
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202210149868.2A
Publication of CN114580715A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a pedestrian trajectory prediction method based on a generative adversarial network (GAN) and a long short-term memory (LSTM) model. The method selects the ETH pedestrian trajectory data set as the data source, adopts the Social GAN approach, and models each pedestrian's historical trajectory and current position with an LSTM network to predict future trajectories. The generator analyses the features of each pedestrian's historical trajectory with an LSTM; the discriminator extracts input features through several fully connected layers and likewise memorises historical-trajectory features through an LSTM. Because the official ETH data set does not contain person ID labels, the model is trained on the ETH data set together with its supplementary data set, and the common trajectory prediction metrics ADE and FDE are selected as performance evaluation indices.

Description

Pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model
Technical Field
The invention relates to the technical field of pedestrian trajectory prediction, and in particular to a pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model.
Background
Among moving objects in traffic scenes such as cars and cyclists, pedestrians are the most challenging to predict because their motion is largely unconstrained: they neither travel within prescribed lanes like motor vehicles nor stay within lane boundaries like non-motorised vehicles. Cars and bicycles follow a set of well-defined "road rules", so their motion states are constrained as long as their operators obey those rules. Pedestrians, by contrast, are subject to no comparable laws or regulations governing what trajectory they should follow, and their motion becomes even more complex at intersections without traffic lights or in dense crowds. An effective pedestrian trajectory prediction algorithm is therefore needed to address these challenges.
Predicting a pedestrian's trajectory requires accounting for the many factors that may influence it. In recent years most research has started from the perspective of pedestrian behaviour: for example, studying the interaction mechanism between vehicles and pedestrians by considering how pedestrians react to oncoming cars at unsignalised intersections, or predicting when a pedestrian will cross the street. In addition, online prediction of pedestrian behaviour requires acquiring data from sensors and extracting various cues, for example using machine vision techniques to obtain different types of contextual cues.
(1) Prediction based on static environmental cues:
Some researchers proposed Behavior-CNN, using a neural network to model pedestrian behaviour in crowded scenes and demonstrating its effectiveness. Others learned a weighted sum of ordinary differential equations from historical trajectory information in a fixed scene, proposing a new pedestrian position prediction method and verifying its good performance.
(2) Prediction based on dynamic environmental cues:
Pedestrian behaviour is also affected by other dynamic objects in the scene. Some researchers built a microscopic simulation model to analyse cyclist behaviour at unsignalised intersections. Beyond merely noting the presence of other road users, traffic participants must negotiate right of way to coordinate their behaviour; some researchers provided a new data set for studying traffic-participant behaviour and examined the ways in which drivers communicate with pedestrians to avoid collisions.
(3) Target cue based prediction:
Pedestrians are not attentive to their surroundings at all times, and pedestrian inattention is a frequent cause of traffic accidents. Whether a pedestrian has noticed an approaching vehicle can be judged from the pedestrian's head orientation; a typical method combines the outputs of several discrete orientation classifiers and adds physical constraints and temporal filtering for robustness, yielding a continuous head-orientation estimate. Neural networks can also perform real-time 2D estimation of the full-body skeleton and thereby detect multiple human poses in an image. A pedestrian's whole-body appearance can likewise be used for trajectory prediction, e.g., classifying objects and predicting the trajectory or pose of a particular object class; dense optical-flow features around the pedestrian bounding box can also be used to estimate whether a crossing pedestrian will stop at the roadside.
Disclosure of Invention
In view of the above, the present invention provides a pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model, so as to train a new neural network model and improve prediction performance.
In order to achieve the purpose, the invention adopts the following technical scheme:
A pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model, comprising the following steps:
step S1, acquiring the ETH data set and the ETH supplementary data set, and matching the two to obtain an explicit trajectory for each person object in the ETH data set, the explicit trajectories serving as the data set for training and testing;
step S2, dividing the training data set obtained in step S1 into a training set and a test set according to a certain proportion, and preprocessing;
step S3, constructing a pedestrian trajectory prediction network comprising a generator network and a discriminator network, the two together forming a generative adversarial network;
step S4, inputting the training set obtained in step S2 into the pedestrian trajectory prediction network constructed in step S3, training the model through multiple rounds of iteration until the loss function converges, and fixing the network parameters;
step S5, performing pedestrian trajectory prediction: inputting the test set obtained in step S2 into the pedestrian trajectory prediction model obtained in step S4 to obtain the prediction result.
Further, the step S1 specifically includes:
step S101, first acquiring the ETH data set, whose annotations consist of a time label, a person ID label and person position coordinates (x, y); then reading the idl label files in the ETH data set as character strings; then extracting the effective information from the idl label files through Python's standard re module; and finally exporting the effective information as a csv table;
step S102, first acquiring the ETH supplementary data set, which contains no additional label data file but only segmented person pictures; then naming each picture in the ETH supplementary data set according to the label information in the ETH data set and sorting all pictures into different folders by person; and finally obtaining each folder's file directory through Python's standard os module;
step S103, matching the csv table obtained in the step S101 and each file directory obtained in the step S102 in a data list in Python one by one to obtain an explicit track of each person object in the ETH data set;
and step S104, dividing the track data into observation track data, prediction track data, a time list and an ID list, and constructing a data set for model training and testing.
Further, the step S2 includes:
step S201, splitting the data set constructed in step S104 into a training set and a test set at a ratio of 4:1;
step S202, normalising all data to between 0 and 1.
Further, the generator network further comprises: the device comprises an encoder, a decoder, a social characteristic embedding layer and a social pooling layer, wherein the encoder and the decoder both comprise a full connection layer and an LSTM layer;
in the generator network, the encoder and the social feature embedding layer are arranged at the front and take the trajectory data as input; the encoder output is passed to the social feature embedding layer, whose output is combined with noise and fed to the decoder, and the decoder output is passed to the discriminator network. Within the encoder, a fully connected layer comes first, followed by several LSTM layers; correspondingly, within the decoder, several LSTM layers come first, followed by a fully connected layer.
Further, in the discriminator network, one fully connected layer is arranged at the front and another at the rear, with several LSTM layers between the two fully connected layers.
Further, the calculation formula of the discriminator network is as follows:
e_i^t = δ(X_i^t, Y_i^t; W_δ)    (1)

h_di^t = LSTM(h_di^{t-1}, e_i^t; W_di)    (2)

h_di = h_di^{t_obs+t_pred}    (3)

s_i = ρ(h_di; W_ρ)    (4)

in formulas (1) to (4), t = 1, …, t_obs, …, t_obs+t_pred; T_i denotes the union of real and fake trajectories; t_obs is the time length of the past trajectory and t_pred the time length of the future trajectory; (X_i, Y_i) are position coordinates; δ is a fully connected layer converting the two-dimensional coordinates into a feature vector, with parameters W_δ; the LSTM layer encodes the feature vector at each time step until t = t_obs+t_pred, yielding the sequence encoding vector h_di, where W_di are the LSTM layer parameters; ρ is a multilayer perceptron with parameters W_ρ, producing the score s_i for the trajectory.
Further, the calculation formula of the encoder is as follows:
e_i^t = μ(x_i^t, y_i^t; W_μe)    (5)

h_ei^t = LSTM(h_ei^{t-1}, e_i^t; W_e)    (6)

in formulas (5) and (6), t = 1, …, t_obs, where t_obs is the time length of the past trajectory; (x_i^t, y_i^t) is the position of pedestrian i at time t; μ is a fully connected layer converting the two-dimensional coordinates into the position feature vector e_i^t, with parameters W_μe; the LSTM layer encodes the feature vector at each time step until t = t_obs, yielding the sequence context vector h_ei^{t_obs}, where W_e are the LSTM layer parameters.
Further, the calculation formula of the decoder is as follows:
c_i = Attention(h_e1^{t_obs}, …, h_eN^{t_obs})    (7)

a_i = φ(c_i; W_φ)    (8)

h_di^{t_obs} = [a_i; z]    (9)

e_i^t = μ(x̂_i^{t-1}, ŷ_i^{t-1}; W_μd)    (10)

(x̂_i^t, ŷ_i^t) = γ(LSTM(h_di^{t-1}, e_i^t; W_d); W_γ)    (11)

in formulas (7) to (11), t = t_obs+1, …, t_obs+t_pred; c_i is the tensor for pedestrian i obtained from the attention module, representing the influence of the other pedestrians in the scene on pedestrian i; φ is a multilayer perceptron with parameters W_φ, and a_i is the resulting influence of the other pedestrians on pedestrian i at time t; concatenating this result with the noise z yields the initial hidden state h_di^{t_obs} of the decoder LSTM; μ is a fully connected layer converting the two-dimensional coordinates (x̂_i^{t-1}, ŷ_i^{t-1}) of the previous time step into a feature vector, with parameters W_μd; the LSTM layer encodes each time step, and the hidden state h_di^t at each step is passed through the multilayer perceptron γ to obtain the predicted coordinates (x̂_i^t, ŷ_i^t) at time t, where W_d are the LSTM decoder parameters and W_γ the parameters of γ.
Further, the total loss function used is:
L = L_GAN(G, D) + λ·L_L2(G)    (12)

in formula (12), L_GAN(G, D) denotes the adversarial loss and L_L2(G) the L2 loss, where

L_GAN(G, D) = E[log D(T_i)] + E[log(1 − D(G(X_i, z)))]    (13)

L_L2(G) = min_k ‖Y_i − Ŷ_i^{(k)}‖_2    (14)

in formulas (13) and (14), λ is a hyperparameter balancing the adversarial loss and the L2 loss; k is a hyperparameter giving the number of samples drawn from the generator's output, with Ŷ_i^{(k)} the k-th sampled predicted trajectory; T_i denotes the union of real and fake trajectories; (X_i, Y_i) are position coordinates; z denotes the noise; and p denotes the number of training rounds.
The invention has the beneficial effects that:
in the invention, the situation that the official ETH data set does not contain the person ID label information is considered, the model is successfully trained by using the ETH data set and the supplementary data set thereof, and common track prediction indexes ADE and FDE are selected as performance evaluation indexes. Experimental results show that the model provided by the invention has better performance in pedestrian trajectory prediction.
Drawings
Fig. 1 is a schematic structural diagram of a pedestrian trajectory prediction network provided in embodiment 1;
FIG. 2 is a schematic flow chart of a pedestrian trajectory prediction process;
fig. 3 is a schematic structural diagram of the generation countermeasure network provided in embodiment 1.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1-3, the present embodiment provides a pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model, which specifically includes the following steps:
step S1, an ETH data set and an ETH supplementary data set are obtained and are matched to obtain the clear track of each person object in the ETH data set.
Specifically, in this embodiment, step S1 is implemented in Python: the parameter import module is first called; then, according to the obtained parameters, the data file path, the save directory and the flag bits of the execution code are determined, and the training hyperparameters are initialised. The imported parameters consist of a time label, a person ID label and person position coordinates (x, y). The data-set processing file contains three functions. The first reads the idl label file as a character string, extracts the useful information with the regular expression module, and exports it as a csv table. The second reads the csv file generated by the first function, obtains the label information in the ETH supplementary data set, matches the two lists entry by entry, and arranges the data into the required format, yielding the explicit trajectory of each person object in the ETH data set. The third stores the training data, dividing the trajectory data into observed trajectory data, predicted trajectory data, a time list and an ID list.
More specifically, in this embodiment, the step S1 specifically includes the following sub-steps:
Step S101: first, the ETH data set is acquired; its annotations consist of a time label, a person ID label and person position coordinates (x, y). The idl label files in the ETH data set are then read as character strings, the effective information in the idl files is extracted through Python's standard re module, and finally the effective information is exported as a csv table.
Specifically, the re module mentioned above is Python's regular expression module, which checks whether a character string matches a given pattern; in this embodiment it is used to read the irregular data in the ETH data set in the required form.
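As an illustration of how the re module can extract the effective information from such a label line, a minimal sketch follows. The line format shown is hypothetical (the patent does not reproduce the exact idl layout), so the patterns would need adapting to the real file:

```python
import re

# Hypothetical idl-style annotation line; the real ETH file layout may differ.
line = '"frame_00001.png": (10.5, 3.2), (12.0, 4.1);'

# Pull out the frame name and every (x, y) coordinate pair.
frame = re.search(r'"([^"]+)"', line).group(1)
points = [(float(x), float(y))
          for x, y in re.findall(r'\(([-\d.]+),\s*([-\d.]+)\)', line)]
```

Each parsed row could then be appended to a csv table with Python's csv module, as step S101 describes.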
Step S102, firstly, an ETH supplementary data set is obtained, wherein no additional tag data file exists in the data set, and only all the figure pictures are segmented;
then, naming each picture in the ETH supplementary data set by referring to label information in the ETH data set, and dividing all the pictures into different folders according to the classification of people;
and finally, acquiring the file directory of each folder through an os module.
Specifically, in this embodiment, the os module is part of Python's standard library.
Step S103, the csv table obtained in step S101 and each file directory obtained in step S102 are matched one by one in a data list in Python to obtain an explicit track of each person object in the ETH data set.
And step S104, dividing the track data into observation track data, prediction track data, a time list and an ID list, and constructing a data set for model training and testing.
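Step S104's windowing of each trajectory into observed and predicted segments, plus time and ID lists, can be sketched as follows. The window lengths (8 observed and 12 predicted frames) are common choices in the trajectory-prediction literature and are assumptions here, not values fixed by the patent:

```python
def split_trajectory(track, t_obs=8, t_pred=12):
    """Split one pedestrian's track into observed/predicted windows.

    `track` is a list of (time, person_id, x, y) tuples sorted by time.
    The defaults t_obs=8, t_pred=12 are illustrative only.
    """
    windows = []
    for start in range(0, len(track) - t_obs - t_pred + 1):
        chunk = track[start:start + t_obs + t_pred]
        obs = [(x, y) for (_, _, x, y) in chunk[:t_obs]]    # observed part
        pred = [(x, y) for (_, _, x, y) in chunk[t_obs:]]   # ground-truth future
        times = [t for (t, _, _, _) in chunk]
        ids = [pid for (_, pid, _, _) in chunk]
        windows.append({"obs": obs, "pred": pred, "times": times, "ids": ids})
    return windows
```

A sliding window like this turns one long track into many training samples, which is why the third function stores four parallel lists.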
Step S2, dividing and preprocessing the data set, which specifically includes the following sub-steps:
Step S201, splitting the data set constructed in step S104 into a training set and a test set at a ratio of 4:1;
Step S202, normalising all data. The normalisation is implemented as a class that reads the maximum and minimum values in the data and, using them as the normalisation basis, scales all data to between 0 and 1. To use the class, the data boundaries are defined and the maximum and minimum identified; the data to be processed can then be normalised by calling the internal normalisation function. When inspecting the final output, denormalisation can be invoked to restore the data to its original scale.
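A minimal sketch of such a min-max normalisation class, with the inverse operation described above (method names are illustrative, not taken from the patent's code):

```python
class MinMaxScaler:
    """Min-max normalisation to [0, 1], with an inverse for final output."""

    def fit(self, data):
        # Record the data boundaries used as the normalisation basis.
        self.lo = min(data)
        self.hi = max(data)
        return self

    def normalize(self, data):
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in data]

    def denormalize(self, data):
        # Restore normalised values to the original scale.
        return [v * (self.hi - self.lo) + self.lo for v in data]
```

In practice the scaler would be fitted on the training set only, so that test-set statistics do not leak into training.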
Step S3, constructing a pedestrian trajectory prediction network, which specifically comprises the following steps:
the pedestrian trajectory prediction network comprises a generator network and a discriminator network, wherein the generator network and the discriminator network form a generation countermeasure network. The generator network further comprises: the device comprises an encoder, a decoder, a social characteristic embedding layer and a social pooling layer, wherein the encoder and the decoder both comprise a full connection layer and an LSTM layer;
specifically, in the generator network, a decoder and a social characteristic embedding layer are arranged in front, the input of the decoder and the output of the social characteristic embedding layer are track data, the output of the decoder and the output of the social characteristic embedding layer are transmitted to the social characteristic embedding layer, the output of the social characteristic embedding layer is combined with noise and input to the decoder, the output of the decoder is transmitted to a discriminator network, wherein in the encoder, a fully connected layer is arranged in front, and then a plurality of LSTM networks are connected, correspondingly, in the decoder, a plurality of LSTM networks are arranged in front, and then a fully connected layer is connected.
Specifically, in the discriminator network, one fully connected layer is arranged at the front and another at the rear, with several LSTM layers between the two.
The above is the general structure of the pedestrian trajectory prediction network provided in the present embodiment.
More specifically, in this embodiment the model is built in the Python main function. The first network model is the encoding LSTM layer, composed of a fully connected layer and an LSTM layer; the fully connected layer takes a four-dimensional input, so that a person object's coordinates and velocity can be fed in conveniently, and converts it to the hidden-layer size before passing it to the LSTM layer. Besides the forward function, an internal LSTM initialisation function is needed to initialise the memory cells and output their initial values; every subsequent LSTM layer carries such an internal function.
The second network model is the social feature embedding layer, composed of three fully connected layers joined by activation functions to prevent vanishing or exploding gradients. The activation function chosen here is ReLU().
The third network is the attention layer. It starts with a fully connected layer, then determines the number of data items in the current data block, and processes the acquired information by matrix multiplication to obtain the attention result.
The fourth network is the decoding layer, used to interpret the velocity prediction produced by the network so that it can then conveniently be used for trajectory prediction.
The fifth network is the decoding LSTM layer, which is not actually used in the code; it is an alternative to the fully connected decoding layer, and if selected it replaces the decoding network.
The sixth network is the discriminator network, the most complex one, serving as the key network that completes the model. It consists of an LSTM layer, an observation-path fully connected layer, a prediction-path fully connected layer, an implicit encoder and a classifier. After entering the forward network, the generator output first passes through the observation LSTM layer and then through the observation fully connected layer, while the input predicted trajectory passes through the prediction fully connected layer. The two parts are then concatenated and fed to the classifier, which must distinguish the generator's data from the validation set's prediction data. The implicit parameters are back-propagated, and a numerical value is output.
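The FC → LSTM → FC discriminator layout described above can be sketched as a plain NumPy forward pass. This is an illustrative sketch only, not the patent's code: the embedding size, hidden size, random initialisation, and single LSTM cell are all assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gates stacked as [i, f, g, o] in W, U, b."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))         # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))    # forget gate
    g = np.tanh(z[2 * H:3 * H])              # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))     # output gate
    c = f * c + i * g
    return o * np.tanh(c), c

def init_params(rng, embed=8, hidden=16):
    """Random small weights; all sizes are illustrative assumptions."""
    s = 0.1
    return (s * rng.standard_normal((embed, 2)), np.zeros(embed),   # front FC
            s * rng.standard_normal((4 * hidden, embed)),           # LSTM W
            s * rng.standard_normal((4 * hidden, hidden)),          # LSTM U
            np.zeros(4 * hidden),                                   # LSTM b
            s * rng.standard_normal((1, hidden)), np.zeros(1))      # rear FC

def discriminator_score(traj, params):
    """Front FC -> LSTM over time -> rear FC, then a sigmoid score."""
    W_in, b_in, W, U, b, W_out, b_out = params
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for xy in traj:                      # traj: array of shape (T, 2)
        e = W_in @ xy + b_in             # embed the 2-D point
        h, c = lstm_step(e, h, c, W, U, b)
    s = W_out @ h + b_out                # scalar real/fake logit
    return 1.0 / (1.0 + np.exp(-s))      # probability-like score
```

A real implementation would use a deep-learning framework with trainable parameters; this sketch only mirrors the order of the layers.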
More specifically, in the present embodiment, the calculation formula of the discriminator network is as follows:
e_i^t = δ(X_i^t, Y_i^t; W_δ)    (1)

h_di^t = LSTM(h_di^{t-1}, e_i^t; W_di)    (2)

h_di = h_di^{t_obs+t_pred}    (3)

s_i = ρ(h_di; W_ρ)    (4)

In formulas (1) to (4), t = 1, …, t_obs, …, t_obs+t_pred; T_i denotes the union of real and fake trajectories; t_obs is the time length of the past trajectory and t_pred the time length of the future trajectory; (X_i, Y_i) are position coordinates; δ is a fully connected layer converting the two-dimensional coordinates into a feature vector, with parameters W_δ. The LSTM layer encodes the feature vector at each time step until t = t_obs+t_pred, yielding the sequence encoding vector h_di, where W_di are the LSTM layer parameters. ρ is a multilayer perceptron with parameters W_ρ, producing the score s_i for the trajectory.
More specifically, in the present embodiment, the calculation formula of the encoder is as follows:
e_i^t = μ(x_i^t, y_i^t; W_μe)    (5)

h_ei^t = LSTM(h_ei^{t-1}, e_i^t; W_e)    (6)

In formulas (5) and (6), t = 1, …, t_obs, where t_obs is the time length of the past trajectory; (x_i^t, y_i^t) is the position of pedestrian i at time t; μ is a fully connected layer converting the two-dimensional coordinates into the position feature vector e_i^t, with parameters W_μe. The LSTM layer encodes the feature vector at each time step until t = t_obs, yielding the sequence context vector h_ei^{t_obs}, where W_e are the LSTM layer parameters.
More specifically, in this embodiment, the calculation formula of the decoder is as follows:
c_i = Attention(h_e1^{t_obs}, …, h_eN^{t_obs})    (7)

a_i = φ(c_i; W_φ)    (8)

h_di^{t_obs} = [a_i; z]    (9)

e_i^t = μ(x̂_i^{t-1}, ŷ_i^{t-1}; W_μd)    (10)

(x̂_i^t, ŷ_i^t) = γ(LSTM(h_di^{t-1}, e_i^t; W_d); W_γ)    (11)

In formulas (7) to (11), t = t_obs+1, …, t_obs+t_pred; c_i is the tensor for pedestrian i obtained from the attention module, representing the influence of the other pedestrians in the scene on pedestrian i. φ is a multilayer perceptron with parameters W_φ, and a_i is the resulting influence of the other pedestrians on pedestrian i at time t. Concatenating this result with the noise z yields the initial hidden state h_di^{t_obs} of the decoder LSTM. μ is a fully connected layer converting the two-dimensional coordinates (x̂_i^{t-1}, ŷ_i^{t-1}) of the previous time step into a feature vector, with parameters W_μd. The LSTM layer encodes each time step, and the hidden state h_di^t at each step is passed through the multilayer perceptron γ to obtain the predicted coordinates (x̂_i^t, ŷ_i^t) at time t, where W_d are the LSTM decoder parameters and W_γ the parameters of γ.
Step S4, inputting the training set obtained in step S2 into the pedestrian trajectory prediction network constructed in step S3 and training the model through multiple rounds of iteration until the loss function converges, then fixing the network parameters.
Specifically, in this embodiment the pedestrian trajectory prediction network is trained in the conventional GAN manner, i.e. iteratively with the back-propagation algorithm. More specifically, after the discriminator outputs its result, the discriminator back-propagates first to update its own network parameters; the generator then back-propagates with the discriminator's parameters locked, correcting the generator's network parameters.
Multiple rounds of iterative training are performed until the total loss function converges, and the network parameters are fixed to obtain the pedestrian trajectory prediction model.
Specifically, in this embodiment, the total loss function is used as follows:
L=LGAN(G,D)+λLL2(G) (12)
In formula (12), L_GAN(G, D) denotes the adversarial loss and L_L2(G) denotes the L2 loss, where

L_GAN(G, D) = E_{T_i ~ p_data}[log D(T_i)] + E_{z ~ p(z)}[log(1 − D(G(X_i, z)))]    (13)

L_L2(G) = min_k || Y_i − G(X_i, z)^(k) ||_2    (14)

In formula (13) and formula (14), λ is a hyperparameter that balances the adversarial loss against the L2 loss, and k is a hyperparameter giving the number of samples drawn from the generator. T_i denotes the union of the true and generated trajectories, X_i and Y_i are the position coordinates, z denotes the noise, and p denotes the number of training iterations. L_GAN is in fact a cross-entropy: the discriminator should drive D(T_i) as close to 1 and D(G(X_i, z)) as close to 0 as possible, so the discriminator maximizes L_GAN while the generator, conversely, minimizes it. L_L2 measures the difference between the predicted value and the true value: for each scene, k trajectories are sampled from the generator and the one with the smallest L_L2 is selected as the prediction Ŷ_i. Since the loss is back-propagated only for the prediction that differs least from the ground truth, the L_L2 loss encourages the model to generate as many satisfactory predicted trajectories as possible.
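The k-sample selection of formula (14) can be sketched as follows in NumPy; the random stand-ins for the generator's k samples are illustrative assumptions (in the real model each sample would use a fresh noise z):

```python
import numpy as np

rng = np.random.default_rng(0)
t_prep, k = 12, 20

# Ground-truth future trajectory Y_i and k sampled predictions (random stand-ins).
Y = rng.normal(size=(t_prep, 2))
samples = Y + rng.normal(0, 0.5, size=(k, t_prep, 2))

# L2 distance of every sample to the ground truth, then keep only the best one:
# the loss is back-propagated solely through this minimum-error trajectory.
l2 = np.linalg.norm(samples - Y, axis=-1).sum(axis=-1)   # shape (k,)
best = np.argmin(l2)
variety_loss = l2[best]

assert variety_loss <= l2.mean()   # the selected sample is never worse than average
```

Penalizing only the best of k samples leaves the remaining samples unconstrained, which is what lets the generator keep producing diverse plausible futures.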
Specifically, in prediction mode, the model parameters are first loaded; the previous observation data and prediction data are then combined as the raw data and fed into the prediction code for trajectory prediction. After prediction, the results are restored to the original coordinate range and written to a text file for convenient viewing.
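The restore-and-save step can be sketched as below, assuming min-max normalization statistics were saved during preprocessing (the statistic values, file name, and scene extent are illustrative assumptions):

```python
import numpy as np
import tempfile, os

# Min-max statistics saved during preprocessing (illustrative values).
xy_min, xy_max = np.array([0.0, 0.0]), np.array([640.0, 480.0])

def denormalize(pred):
    """Map predictions from the [0, 1] training range back to the original range."""
    return pred * (xy_max - xy_min) + xy_min

pred = np.array([[0.5, 0.5], [0.6, 0.55]])      # normalized predicted coordinates
restored = denormalize(pred)

# Write the restored trajectory to a text file for convenient viewing.
out_path = os.path.join(tempfile.gettempdir(), "predicted_trajectory.txt")
np.savetxt(out_path, restored, fmt="%.2f", header="x y")
print(restored[0])   # first restored point
```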
Detection indices:
Average displacement error (ADE):

ADE = (1 / (N · t_prep)) · Σ_{i=1}^{N} Σ_{t=t_obs+1}^{t_obs+t_prep} || (x̂_i^t, ŷ_i^t) − (x_i^t, y_i^t) ||_2

Final displacement error (FDE):

FDE = (1 / N) · Σ_{i=1}^{N} || (x̂_i^{t_obs+t_prep}, ŷ_i^{t_obs+t_prep}) − (x_i^{t_obs+t_prep}, y_i^{t_obs+t_prep}) ||_2

where N is the number of pedestrians.
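These two indices follow their standard definitions and are straightforward to compute; a minimal NumPy sketch (the array shapes are assumptions):

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE: mean L2 error over all pedestrians and all predicted time steps.
    FDE: mean L2 error at the final predicted time step only.
    pred, gt: arrays of shape (num_peds, t_prep, 2)."""
    err = np.linalg.norm(pred - gt, axis=-1)     # (num_peds, t_prep)
    return err.mean(), err[:, -1].mean()

gt = np.zeros((3, 12, 2))
pred = np.ones((3, 12, 2))                       # every point off by (1, 1)
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # both equal sqrt(2) here, since the error is constant
```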
In summary, this patent discloses a pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model. The method builds a model consisting of a generator and a discriminator, selects the ETH pedestrian trajectory data set as the data source, adopts the Social GAN approach, and models the historical trajectories and current positions of pedestrians with a long short-term memory network (LSTM) to realize pedestrian trajectory prediction. The generator analyzes the features of each pedestrian's historical trajectory using an LSTM; the discriminator extracts features from its input through several fully connected layers and likewise memorizes the features of the historical trajectory through an LSTM network. Because the official ETH data set does not contain person ID tag information, the model is trained using the ETH data set together with its supplementary data set, and the common trajectory prediction indices ADE and FDE are selected as performance evaluation indices. Experimental results show that the model performs well in pedestrian trajectory prediction.
Details not described in the present invention are well known to those skilled in the art.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations can be devised by those skilled in the art in light of the above teachings. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (9)

1. A pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model, characterized by comprising the following steps:
step S1, acquiring an ETH data set and an ETH supplementary data set, and matching the two to obtain a definite trajectory of each person object in the ETH data set, the definite trajectories serving as the data set for training and testing;
step S2, dividing the data set obtained in step S1 into a training set and a test set according to a certain proportion, and preprocessing the data;
step S3, constructing a pedestrian trajectory prediction network comprising a generator network and a discriminator network, wherein the generator network and the discriminator network form a generative adversarial network;
step S4, inputting the training set obtained in step S2 into the pedestrian trajectory prediction network constructed in step S3 and performing model training, with multiple rounds of iterative training until the loss function converges, and then fixing the network parameters;
step S5, performing pedestrian trajectory prediction: inputting the test set obtained in step S2 into the pedestrian trajectory prediction model obtained in step S4 for prediction, thereby obtaining the prediction result.
2. The method for predicting pedestrian trajectories according to claim 1, wherein the step S1 specifically comprises:
step S101, first acquiring the ETH data set, in which each record consists of a time tag, a person ID tag, and person position point coordinates (x, y); then reading the idl tag file in the ETH data set as character strings, extracting the effective information in the idl tag file with the re (regular expression) module in Python, and finally exporting the effective information as a csv table;
step S102, first acquiring the ETH supplementary data set, which carries no additional tag data file and contains only the segmented pictures of all persons; then naming each picture in the ETH supplementary data set with reference to the tag information in the ETH data set, and sorting all pictures into different folders according to person; finally obtaining the file directory of each folder with the os module in Python;
step S103, matching the csv table obtained in step S101 with each file directory obtained in step S102 one by one in a data list in Python to obtain the definite trajectory of each person object in the ETH data set;
step S104, dividing the trajectory data into observation trajectory data, prediction trajectory data, a time list, and an ID list, and constructing the data set for model training and testing.
3. The method for predicting pedestrian trajectories according to claim 2, wherein the step S2 comprises:
step S201, dividing the data set constructed in step S104 into a training set and a test set according to a 4:1 ratio;
step S202, normalizing all data to values between 0 and 1.
4. The pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model according to claim 3, wherein the generator network comprises an encoder, a decoder, a social feature embedding layer, and a social pooling layer, the encoder and the decoder each comprising a fully connected layer and an LSTM layer;
in the generator network, the encoder and the social feature embedding layer are arranged at the front and take the trajectory data as input; the output of the encoder and the output of the social feature embedding layer are transmitted to the social pooling layer, the output of the social pooling layer is combined with noise and input to the decoder, and the output of the decoder is transmitted to the discriminator network; within the encoder, a fully connected layer is arranged first, followed by a plurality of LSTM networks; correspondingly, within the decoder, a plurality of LSTM networks are arranged first, followed by a fully connected layer.
5. The pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model according to claim 4, wherein in the discriminator network one fully connected layer is arranged at the front and another at the rear, with a plurality of LSTM networks arranged between the two fully connected layers.
6. The pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model according to claim 5, wherein the discriminator network is computed as follows:

T_i = [X_i; Y_i]    (1)

e_i^t = δ(x_i^t, y_i^t; W_δ)    (2)

h_di^t = LSTM(h_di^{t-1}, e_i^t; W_di)    (3)

s_i = ρ(h_di^{t_obs+t_prep}; W_ρ)    (4)

In formula (1) through formula (4), t = 1, ..., t_obs + t_prep; T_i denotes the union of the true and generated trajectories, t_obs is the time length of the past trajectory, t_prep is the time length of the future trajectory, and X_i, Y_i are position coordinates; δ is a fully connected layer that converts the two-dimensional coordinates into a feature vector, and W_δ are the fully connected layer parameters; the LSTM layer encodes the feature vector at each moment until t = t_obs + t_prep, yielding the sequence encoding vector h_di^{t_obs+t_prep}, where W_di are the parameters of the LSTM layer; ρ is a multilayer perceptron with parameters W_ρ, from which the score s_i of the trajectory is obtained.
7. The pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model according to claim 5, wherein the encoder is computed as follows:

e_i^t = μ(x_i^t, y_i^t; W_μe)    (5)

h_ei^t = LSTM(h_ei^{t-1}, e_i^t; W_e)    (6)

In formula (5) and formula (6), t = 1, ..., t_obs, where t_obs is the time length of the past trajectory and (x_i^t, y_i^t) is the position of pedestrian i at time t; μ is a fully connected layer that converts the two-dimensional coordinates into a feature vector; e_i^t is the position feature vector and W_μe are the fully connected layer parameters; the LSTM layer encodes the feature vector at each moment until t = t_obs, yielding the sequence context vector h_ei^{t_obs}, where W_e are the parameters of the LSTM layer.
8. The pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model according to claim 5, wherein the decoder is computed as follows:

s_i = φ(c_i, h_ei^{t_obs}; W_φ)    (7)

h_di^{t_obs} = [s_i; z]    (8)

e_i^t = μ(x_i^{t-1}, y_i^{t-1}; W_μd)    (9)

h_di^t = LSTM(h_di^{t-1}, e_i^t; W_d)    (10)

(x̂_i^t, ŷ_i^t) = γ(h_di^t; W_γ)    (11)

In formula (7) through formula (11), t = t_obs+1, ..., t_prep; c_i is the tensor for pedestrian i obtained from the attention module, representing the influence of the other pedestrians in the scene on pedestrian i; φ is a multilayer perceptron and W_φ are its parameters; s_i represents the influence of the other pedestrians on pedestrian i at time t, and concatenating this result with the noise z yields the initial hidden state h_di^{t_obs} of the decoder LSTM; μ is a fully connected layer that converts the two-dimensional coordinates (x_i^{t-1}, y_i^{t-1}) of the previous moment into a feature vector, and W_μd are the fully connected layer parameters; the LSTM layer encodes each time step, and the hidden state h_di^t of each step is passed through the multilayer perceptron γ to obtain the predicted coordinates (x̂_i^t, ŷ_i^t) at time t; W_d are the LSTM decoder parameters and W_γ are the parameters of the multilayer perceptron γ.
9. The pedestrian trajectory prediction method based on a generative adversarial network and a long short-term memory model according to claim 5, wherein the total loss function adopted is:

L = L_GAN(G, D) + λ·L_L2(G)    (12)

In formula (12), L_GAN(G, D) denotes the adversarial loss and L_L2(G) denotes the L2 loss, where

L_GAN(G, D) = E_{T_i ~ p_data}[log D(T_i)] + E_{z ~ p(z)}[log(1 − D(G(X_i, z)))]    (13)

L_L2(G) = min_k || Y_i − G(X_i, z)^(k) ||_2    (14)

In formulas (13) and (14), λ is a hyperparameter balancing the adversarial loss against the L2 loss, k is a hyperparameter giving the number of samples drawn from the generator, T_i denotes the union of the true and generated trajectories, X_i and Y_i are the position coordinates, z denotes the noise, and p denotes the number of training iterations.
CN202210149868.2A 2022-02-18 2022-02-18 Pedestrian trajectory prediction method based on generation of confrontation network and long-short term memory model Pending CN114580715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210149868.2A CN114580715A (en) 2022-02-18 2022-02-18 Pedestrian trajectory prediction method based on generation of confrontation network and long-short term memory model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210149868.2A CN114580715A (en) 2022-02-18 2022-02-18 Pedestrian trajectory prediction method based on generation of confrontation network and long-short term memory model

Publications (1)

Publication Number Publication Date
CN114580715A true CN114580715A (en) 2022-06-03

Family

ID=81773194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210149868.2A Pending CN114580715A (en) 2022-02-18 2022-02-18 Pedestrian trajectory prediction method based on generation of confrontation network and long-short term memory model

Country Status (1)

Country Link
CN (1) CN114580715A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115456073A (en) * 2022-09-14 2022-12-09 杭州电子科技大学 Generation type confrontation network model modeling analysis method based on long-term and short-term memory


Similar Documents

Publication Publication Date Title
Cai et al. YOLOv4-5D: An effective and efficient object detector for autonomous driving
Li et al. Humanlike driving: Empirical decision-making system for autonomous vehicles
CN106874597B (en) highway overtaking behavior decision method applied to automatic driving vehicle
Ni et al. An improved deep network-based scene classification method for self-driving cars
Ohn-Bar et al. Learning to detect vehicles by clustering appearance patterns
US20190026917A1 (en) Learning geometric differentials for matching 3d models to objects in a 2d image
Yang et al. Crossing or not? Context-based recognition of pedestrian crossing intention in the urban environment
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
JP2022537636A (en) Methods, systems and computer program products for media processing and display
CN110991523A (en) Interpretability evaluation method for unmanned vehicle detection algorithm performance
Li et al. The traffic scene understanding and prediction based on image captioning
Abdi et al. In-vehicle augmented reality system to provide driving safety information
CN114030485A (en) Automatic driving automobile man lane change decision planning method considering attachment coefficient
Cai et al. A driving fingerprint map method of driving characteristic representation for driver identification
Oeljeklaus An integrated approach for traffic scene understanding from monocular cameras
Wang et al. FPT: Fine-grained detection of driver distraction based on the feature pyramid vision transformer
CN114580715A (en) Pedestrian trajectory prediction method based on generation of confrontation network and long-short term memory model
Yuan et al. Category-level adversaries for outdoor lidar point clouds cross-domain semantic segmentation
Mirus et al. The importance of balanced data sets: Analyzing a vehicle trajectory prediction model based on neural networks and distributed representations
CN112668662A (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN117217314A (en) Driving situation reasoning method based on metadata driving and causal analysis theory
Foszner et al. Development of a realistic crowd simulation environment for fine-grained validation of people tracking methods
CN115544232A (en) Vehicle-mounted intelligent question answering and information recommending method and device
CN113688864B (en) Human-object interaction relation classification method based on split attention
CN116863260A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination