CN114580718A - Pedestrian trajectory prediction method based on a conditional variational generative adversarial network - Google Patents

Pedestrian trajectory prediction method based on a conditional variational generative adversarial network

Info

Publication number
CN114580718A
Authority
CN
China
Prior art keywords
pedestrian
information
track
feature
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210162399.8A
Other languages
Chinese (zh)
Inventor
曾繁虎
杨欣
王翔辰
樊江锋
李恒锐
朱义天
周大可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210162399.8A
Publication of CN114580718A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a pedestrian trajectory prediction method based on a conditional variational generative adversarial network. The method first extracts features from the environmental information in a semantic map, then extracts feature information from the historical trajectory sequence of the pedestrian to be predicted. Based on the constraints and influence that the physical environment of the scene exerts on pedestrian trajectories, a conditional variational autoencoder CVAE and a conditional generative adversarial network CGAN are fused to design a conditional variational generative adversarial network model, Context-CVGN, which realizes pedestrian trajectory prediction. Training and testing in different scenes show that the proposed Context-CVGN integrates the advantages of the Context-CVAE and Context-CGAN architectures and achieves the best overall performance. Overall, the proposed algorithm can generate high-quality predictions of future pedestrian trajectories that satisfy the environmental constraints, even when the environmental semantic information is relatively complex.

Description

Pedestrian trajectory prediction method based on a conditional variational generative adversarial network
Technical Field
The invention relates to the technical field of pedestrian trajectory prediction, in particular to a pedestrian trajectory prediction method based on a conditional variational generative adversarial network.
Background
With the rapid development of industries such as autonomous driving, video surveillance and intelligent robots, the pedestrian trajectory prediction problem has attracted widespread attention. On the one hand, whether a pedestrian's trajectory can be predicted accurately, quickly and in real time plays a positive role in giving timely warnings and avoiding danger, and is of great and far-reaching significance to in-depth research in fields such as unmanned vehicles and intelligent transportation; on the other hand, since pedestrian motion trajectories are influenced and perturbed by many factors, how to judge the accuracy and robustness of pedestrian trajectory prediction is also a key problem. In recent years, many studies and methods for pedestrian trajectory prediction have been proposed, and many results have been obtained, promoting the progress and development of the field.
In general, the traffic participants in a scene (pedestrians, cars, cyclists, etc.) influence one another. The motion trajectory of a pedestrian in a given scene is determined not only by the pedestrian's own historical trajectory information, but to a greater extent also by the surrounding traffic participants and the environment information; this interaction with other pedestrians and the surrounding environment is the most complicated and difficult-to-predict part of pedestrian trajectory prediction. According to the data in the literature (The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE, 5(4): e10047, 2010), 70% of pedestrians tend to walk in groups; it follows that a pedestrian's trajectory is largely influenced by the interaction between pedestrians. Interaction between pedestrians is mainly driven by common knowledge, habits and social customs (such as obeying traffic rules). Pedestrian trajectory prediction typically involves several interaction scenarios: walking parallel to other people (in the same or opposite direction), walking in a group formation, collision avoidance, and so on. Beyond this, another complexity of pedestrian trajectory prediction arises from the randomness of motion, since the pedestrian's destination and intended walking path are not known. Based on the above considerations, existing trajectory prediction methods not only model a specific participant, but also model the interactions among participants in consideration of the influence of prior information and interaction information.
The models established for trajectory prediction mainly aim at feature extraction for an interaction model, and the mainstream modeling methods are of two kinds: manual interaction modeling and structured interaction modeling. Prediction models established with the manual interaction modeling method (Hand-crafted Interaction Model) hold that the basis of the interaction model is the spatial features of scene semantics (such as the relative Euclidean distance between pedestrians); on this basis, spatial features are constructed manually from historical trajectory data, and the interaction model is then established in different ways. The literature (Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 1995, 51(5): 4282-4286) proposed the classic social force model along these lines, describing pedestrian motion in terms of attractive and repulsive forces. Such a model has the advantages of being simple, intuitive and of low complexity, and its physical meaning is very intuitive; however, because environmental semantic information is not incorporated into the modeling process, social behaviours of pedestrians such as accompanying cannot be described, and the model suffers from over-sensitivity to its parameters and weak generalization ability, so a good effect cannot be obtained. The Social-LSTM model was proposed in the literature (Social LSTM: Human trajectory prediction in crowded spaces. CVPR, 2016).
This method combines interactive spatial features with time series for deep learning to establish an interactive prediction model. Social-LSTM allocates a long short-term memory (LSTM) network to each pedestrian to predict that pedestrian's future path, and adds a Social Pooling module to the network to compute the spatial features of the other surrounding pedestrians relative to the target pedestrian and splice them into the hidden vector of the LSTM decoder, thereby solving the interaction problem to a certain extent and achieving a certain effect.
Structured interaction modeling (Structured Interaction Model) adopts a structured modeling method, aiming to overcome the shortcomings of manually constructed feature spaces. With the proposal of graph representations and the development of graph neural networks, new ideas for interaction modeling based on graph neural networks have gradually attracted attention. The difficulty of pedestrian trajectory prediction is that it is hard to interactively model pedestrians, who have rich and complex social attributes, from simple relative spatial features alone; deep-level feature information among pedestrians is needed. The graph topology of a spatio-temporal graph is naturally suited to representing interaction models among pedestrians, and compared with manually constructed semantic features it is a more intuitive and effective way to model pedestrian interaction. A result of this kind of modeling is the Social-STGCNN model proposed in the article (Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 14412-14420), which describes pedestrian trajectories in a spatio-temporal graph and uses a weighted adjacency matrix to characterize the interpersonal interaction, with kernel functions quantitatively describing that interaction, achieving an improvement in accuracy and also greatly improving parameter efficiency. However, graph neural networks also have problems, such as over-smoothing and the difficulty of modeling environmental information, which need further analysis and resolution in subsequent related research.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a pedestrian trajectory prediction method based on a conditional variational generative adversarial network, which solves the problem of trajectory prediction in a multi-participant dynamic interaction scene under the limitations of the physical environment by combining and fusing the architectural advantages of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, so that the model can generate high-quality future pedestrian trajectories satisfying the environmental constraints.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme:
A pedestrian trajectory prediction method based on a conditional variational generative adversarial network comprises the following steps:
step S1, extracting features from the environmental information in the semantic map;
semantic map information is input into a convolutional neural network, which outputs high-dimensional environmental feature information; the environmental feature information is flattened and then adjusted by a fully connected layer to obtain the environment feature vector;
step S2, extracting feature information from the historical trajectory sequence of the pedestrian to be predicted;
a given observed trajectory of the pedestrian to be predicted is identified, and an LSTM encoder is adopted to extract the relevant feature information of a specific pedestrian along the time axis; at each time t, the encoder first converts the current position X_i^t of the pedestrian from the coordinate space to the feature space through a fully connected layer to obtain an embedded vector E_i^t, which is then input into the LSTM in time order for encoding; this process is repeated until the information at all times of the observation sequence t = 1, 2, ..., T_obs has been encoded, yielding the encoded information of the given pedestrian trajectory; after flattening and fully connected layer adjustment, the historical pedestrian trajectory feature vector is output;
step S3, based on the constraints and influence of the physical environment of the scene on the pedestrian trajectory, a conditional variational autoencoder CVAE and a conditional generative adversarial network CGAN are fused to design the conditional variational generative adversarial network model Context-CVGN, realizing pedestrian trajectory prediction.
Further, the specific steps of acquiring the environment feature vector in step S1 include:
step S1.1, extracting semantic map environment features, specifically expressed as:
V_feature = CNN(I_sem)
where feature extraction uses a CNN convolutional neural network; V_feature represents the feature map obtained in the CNN from the input semantic map I_sem, and the feature information in the feature map abstracts the semantic map into a high-dimensional representation;
step S1.2, the high-dimensional feature information is flattened and then adjusted by a fully connected layer to obtain the environment feature vector, specifically expressed as:
V_flatten = flatten(V_feature)
V_context = FC(V_flatten)
where flatten(·) denotes the flattening operation and FC(·) denotes fully connected layer adjustment; the output V_context is the environment feature vector obtained after flattening and fully connected layer adjustment, and the obtained information can be used for the continued training of the subsequent network.
Further, a pre-trained ResNet18 is adopted as the CNN convolutional neural network; after flattening and fully connected layer adjustment, an environment feature vector V_context of length 256 is obtained.
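The flatten-and-project pipeline of steps S1.1-S1.2 can be sketched as follows. This is a minimal NumPy illustration: a random vector stands in for the flattened ResNet18 feature map (length 512, as stated in the embodiment below), and a random matrix stands in for the trained fully connected layer; the CNN itself and its training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_environment_feature(flattened_feature_map, w_fc, b_fc):
    """Project a flattened CNN feature map to the fixed-length
    environment feature vector V_context (length 256 in the patent)."""
    v_flatten = flattened_feature_map.reshape(-1)  # V_flatten = flatten(V_feature)
    v_context = w_fc @ v_flatten + b_fc            # V_context = FC(V_flatten)
    return v_context

# Stand-in for the ResNet18 output: the patent states the flattened
# feature vector has length 512 before the fully connected layer.
feature_map = rng.standard_normal(512)
w_fc = rng.standard_normal((256, 512)) * 0.01     # illustrative, untrained weights
b_fc = np.zeros(256)

v_context = extract_environment_feature(feature_map, w_fc, b_fc)
print(v_context.shape)  # (256,)
```

In practice the weight matrix would be learned end-to-end with the rest of the network; the fixed random weights here only demonstrate the shapes involved.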
Further, in step S2, extracting feature information from the historical trajectory sequence of the pedestrian to be predicted includes:
step S2.1, first performing fully connected embedded encoding on the 2D coordinate information of the pedestrian at each moment, expressed as:
E_i^t = FC(X_i^t; W_embedding)
where W_embedding denotes the parameters of the fully connected embedding layer, X_i^t denotes the position coordinate information of the i-th pedestrian at time t, FC denotes the fully connected layer, and E_i^t represents the time-encoded feature of the pedestrian;
step S2.2, inputting the time-encoded features of the pedestrian into the LSTM in time order, expressed as:
h_i^t = LSTM(h_i^{t-1}, E_i^t; W_en)
where LSTM denotes the long short-term memory network, h_i^t represents the hidden-state vector feature under the observed trajectory sequence, and W_en denotes the parameters of the LSTM encoder;
the finally output historical pedestrian trajectory feature vector is the hidden-state output of the last layer of the LSTM encoder, expressed as:
V_trajectory = h_i^{T_obs}
The output V_trajectory is processed with the flattening and fully connected layer adjustment operations of step S1.2 to acquire a historical pedestrian trajectory feature vector of length 256.
Further, the method for designing the conditional variational generative adversarial network model Context-CVGN in step S3 is as follows:
step S3.1, the environment feature vector V_context obtained in step S1 and the historical pedestrian trajectory feature vector V_trajectory obtained in step S2 are fused, serving as the input for the inference of the conditional variational network, expressed as:
V_vae = concat(V_trajectory, V_context)
where concat(·) denotes splicing the trajectory feature information and the semantic feature information;
step S3.2, variational inference generates a probability-distribution latent variable space, which is randomly sampled to serve as the predicted value of the output feature, expressed as:
Z = sample(VAE(V_vae))
where VAE(·) is the normal distribution fitted by the variational autoencoder, and sample(·) denotes random sampling from the fitted latent feature space, performing feature prediction by outputting a trajectory with a specific distribution probability; Z represents the feature information of the predicted trajectory sampled from the perspective of variational inference;
step S3.3, the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN are respectively computed and fused to output the pedestrian predicted-trajectory features through fully connected layers, expressed as:
X_pred-CVAE = FC(Z, V_context)
X_pred-CGAN = FC(V_vae, V_context)
Because the CVAE and CGAN architectures each have advantages in decoding and generating predicted trajectories, in designing the conditional variational generative adversarial network model Context-CVGN the two architectures are fused by setting a hyperparameter as the weight coefficient of each model, so that the model can adjust the proportion of trajectory prediction obtained from sampling inference and from adversarial generation; adaptive updating and training that combines the architectural advantages of the CVAE and the CGAN completes the design of the Context-CVGN model.
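Steps S3.1-S3.3 and the hyperparameter-weighted fusion can be sketched as follows. The linear heads, latent size, prediction horizon and the reparameterization-style sampling are illustrative assumptions in NumPy, not the trained Context-CVGN; only the data flow (concat, sample, two decoder branches, weighted blend) follows the steps above.

```python
import numpy as np

rng = np.random.default_rng(2)

def cvgn_fuse(v_trajectory, v_context, heads, alpha=0.5):
    """Sketch of the Context-CVGN fusion: concatenate the two condition
    vectors (V_vae), sample Z with a VAE-style reparameterization, decode
    a CVAE branch from (Z, V_context) and a CGAN branch from
    (V_vae, V_context), and blend them with a hyperparameter alpha."""
    v_vae = np.concatenate([v_trajectory, v_context])   # V_vae = concat(...)
    mu = heads["mu"] @ v_vae
    logvar = heads["logvar"] @ v_vae
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # Z = sample(VAE(V_vae))
    x_cvae = heads["dec_cvae"] @ np.concatenate([z, v_context])    # X_pred-CVAE
    x_cgan = heads["dec_cgan"] @ np.concatenate([v_vae, v_context])  # X_pred-CGAN
    return alpha * x_cvae + (1.0 - alpha) * x_cgan

D, D_Z, T_pred = 256, 32, 12   # feature size per the patent; latent/horizon assumed
heads = {                      # untrained stand-ins for the fully connected layers
    "mu": rng.standard_normal((D_Z, 2 * D)) * 0.01,
    "logvar": rng.standard_normal((D_Z, 2 * D)) * 0.01,
    "dec_cvae": rng.standard_normal((2 * T_pred, D_Z + D)) * 0.01,
    "dec_cgan": rng.standard_normal((2 * T_pred, 3 * D)) * 0.01,
}
x_pred = cvgn_fuse(rng.standard_normal(D), rng.standard_normal(D), heads)
print(x_pred.shape)  # (24,) — T_pred future (x, y) offsets, flattened
```

The single scalar alpha mirrors the hyperparameter weight coefficient described above; during training it would let the model shift between the sampling-inference branch and the adversarial branch.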
Advantageous effects:
The pedestrian trajectory prediction method based on a conditional variational generative adversarial network focuses on how to predict trajectories that simultaneously satisfy the rationality of physical environment constraints, social norms and the like; it combines and fuses the architectural advantages of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, so that the model can generate high-quality future pedestrian trajectories satisfying the environmental constraints. Predicting pedestrian trajectories accurately, quickly and in real time is a positive help for giving timely warnings and avoiding danger, and is of great theoretical value and profound practical significance for in-depth research in fields such as autonomous driving, video surveillance and intelligent transportation.
Drawings
FIG. 1 is an overall schematic diagram of the conditional variational generative adversarial network CVGN model framework provided by the invention;
FIG. 2 is a block diagram of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN in an embodiment of the invention;
FIG. 3 is a comparison diagram of the flows of the two conditional scene representation methods Context-CVAE and Context-CGAN and the fusion framework Context-CVGN in an embodiment of the invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, a method for predicting the future trajectories and conditional probability distribution of pedestrians based on given observed video frame images is provided:
For a given series of observed video frame images with time scale t ∈ {1, 2, ..., T_obs}, a sequence of pedestrian trajectories X = {X_i^t | t = 1, 2, ..., T_obs; i = 1, 2, ..., N} can be observed for a given set of scenes, where X_i^t is the spatial information of the i-th actual traffic participant in the t-th frame, the T_obs-th frame is the current time frame, and i ∈ {1, 2, ..., N} indexes the pedestrians contained in each frame.
Since the predicted trajectory of a pedestrian is related to the other interacting pedestrians and the scene environment, the conditional probability distribution is expressed as:
P(X_pred | X_M, X_i (i = 1, 2, ..., N), R)
where the trajectory information of the currently predicted pedestrian is denoted (·)_M, the trajectory information of the other interacting pedestrians in the scene is denoted X_i (i = 1, 2, ..., N), and the environment information is denoted R.
Based on an algorithm constructed from this known information, by training and optimizing the probability distribution under these conditions, the future predicted trajectory of the pedestrian can be obtained, expressed as:
X_pred = {X_M^t | t = T_obs + 1, ..., T_obs + T_pred}
where T_pred is the predicted trajectory sequence length.
This embodiment provides a pedestrian trajectory prediction method based on a conditional variational generative adversarial network, which combines and fuses the architectural advantages of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN; the designed conditional variational generative adversarial network model CVGN performs latent-space distribution fitting and sampling on the feature vectors while fitting the network, so that the model can generate high-quality future trajectories satisfying the environmental constraints. The specific framework structure is shown in FIG. 1, and the specific design steps are as follows:
Step S1, extracting features from the environmental information in the semantic map.
Semantic map information is input into a convolutional neural network, and the environmental information obtained from the semantic map is extracted to obtain the expected feature information for prediction by the subsequent model framework. The environmental feature extraction process is:
V_feature = CNN(I_sem)
where CNN denotes the convolutional neural network, and V_feature represents the feature map obtained in the CNN from the input semantic map I_sem; the feature information in the feature map abstracts the semantic map into a high-dimensional representation.
Since the input of the final variational encoder is a low-dimensional expression of the actual information, while the information obtained by convolution is abstract and high-dimensional, the high-dimensional feature information must be converted into the expected output format through flattening and full connection, concretely implemented as:
V_flatten = flatten(V_feature)
V_context = FC(V_flatten)
where flatten(·) denotes the flattening operation and FC(·) denotes fully connected layer adjustment; the output V_context is the environment feature vector obtained after flattening and fully connected layer adjustment, and the obtained information can be used for the continued training of the subsequent network.
By comparison, a pre-trained ResNet18 is selected; this network has the advantages of a simple structure and strong generalization ability, and is suitable for processing environmental information features. In step S1, an RGB semantic map with a resolution of 576 × 480 × 3 is input; the feature map output by ResNet18 after multiple convolution and pooling operations is flattened to obtain a feature vector of length 512, which is adjusted by a fully connected layer into a feature vector of length 256, called the environment feature vector, serving as a condition for the subsequent conditional generative model.
And step S2, extracting characteristic information from the history track sequence of the predicted pedestrian.
Because the historical track of a given pedestrian has strong time correlation, the characteristic extraction of the historical track sequence still considers and selects a classic LSTM encoder to extract the correlation characteristics and information of the specific pedestrian on the time axis.
For each time t, the encoder first compares the position X of the current pedestriani tConverting the coordinate space into the feature space through the full connection layer to obtain an embedded vector
Figure BDA0003515355180000071
Then inputting the sequence into LSTM according to time sequence for encoding, and repeating the process until the observation sequence T is equal to 1,2, …, TobsThe information at all times is coded, and then the coded information of the given pedestrian track is obtained; and after leveling and full connection layer adjustment, outputting historical pedestrian trajectory characteristic vectors. In particular, the amount of the solvent to be used,
step S2.1, vector mapping (full-connection embedding) coding is firstly carried out on the 2D coordinates of the pedestrians at each moment, the specification of the input format of the pedestrians is guaranteed, and meanwhile the usability of the pedestrians is guaranteed to a certain extent, and the expression is as follows:
Figure BDA0003515355180000081
wherein WembeddingIn order to fully connect the parameters of the embedded layer,
Figure BDA0003515355180000082
position coordinate information indicating the ith pedestrian at time t, FC indicating the full link layer,
Figure BDA0003515355180000083
representing the time-coded features of the pedestrian. Besides selecting the 2D coordinates of the pedestrians for coding, fine-grained information such as the body postures of the pedestrians can be selected as a coding object.
Step S2.2, inputting the time coding features of the pedestrians into the LSTM according to the time sequence, wherein the representation form is as follows:
Figure BDA0003515355180000084
wherein, LSTM represents long-short term memory network,
Figure BDA0003515355180000085
vector features representing hidden states under a sequence of observation trajectories, WenParameters of an LSTM encoder; from a hidden state
Figure BDA0003515355180000086
Can be seen from the characteristic representation and the expression thereof, the characteristic is not only in time coding with the state information of the pedestrian at the moment i
Figure BDA0003515355180000087
Relating to, and hiding state information from, historical time series
Figure BDA0003515355180000088
In this regard, feature information extraction for a given pedestrian's historical track sequence is accomplished by such methods.
The final output pedestrian history track feature vector is output in a hidden state of the last layer of the LSTM encoder, and is expressed as follows:
Figure BDA0003515355180000089
for output information VtrajectoryThe leveling and full-link layer adjustment operations as in step S1 are used to obtain a 256-length historical pedestrian trajectory feature vector as a condition for the subsequent condition generation model.
And S3, based on the constraint and influence of the scene physical environment on the pedestrian track, fusing a conditional variation autoencoder CVAE and a condition to generate an antagonistic network CGAN, designing a conditional variation to generate an antagonistic network model Context-CVGN, and realizing the pedestrian track prediction.
The method is different from other existing track prediction models, the constraint and influence of the main scene physical environment on the pedestrian track are taken into consideration, a conditional generation model Context-CVGN is designed for a track prediction task based on the characteristics output by the modules, and a variational self-encoder is used for performing hidden space distribution fitting and sampling on the characteristic vector. In order to verify different generation characteristics of the conditional variant autocoder and the generation countermeasure network in the model framework of the invention, a conditional generation model is designed as a condition, the conditional variant autocoder CVAE and the conditional generation countermeasure network CGAN are included, and a specific schematic diagram of the conditional variant autocoder CVAE and the conditional generation countermeasure network CGAN of the invention is shown in FIG. 2.
The CVAE conditional variation self-encoder can map high-dimensional vectors to low-dimensional feature spaces and reconstruct the low-dimensional feature spaces, and has strong implicit space sampling capacity, and the CGAN conditional generation countermeasure network has stronger fitting capacity due to the generation characteristics of the CGAN conditional variation self-encoder; hidden vectors and scene characteristic vectors of a variational self-encoder are respectively considered under two model frameworks, so that the model can predict more reasonable future motion trail in the semantic map. A comparison diagram of the flows of the two conditional scene representation methods Context-CVAE, Context-CGAN and the fusion framework Context-CVGN in the specific embodiment of the invention is shown in FIG. 3.
The invention sets two prediction model architectures of Context-CVAE and Context-CGAN according to the characteristics of different networks, combines the advantages of the two models to complete the architecture generation of the Context-CVGN and obtains good results. The method comprises the following specific steps:
step S3.1, the environment feature vector V obtained in the step S1contextAnd the historical pedestrian trajectory feature vector V acquired in step S2trajectoryAnd (5) performing feature fusion, wherein as the inference of the conditional variation network, the expression is as follows:
Vvae=concat(Vtrajectory,Vcontext)
wherein concat(·) denotes concatenation of the trajectory feature information and the semantic feature information;
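The fusion of step S3.1 is a plain vector concatenation. A minimal numpy sketch follows, assuming both feature vectors have length 256 (as stated in claims 3 and 4); the random values stand in for real learned features:

```python
import numpy as np

rng = np.random.default_rng(0)
v_trajectory = rng.standard_normal(256)  # historical trajectory feature Vtrajectory
v_context = rng.standard_normal(256)     # scene environment feature Vcontext

# Step S3.1: Vvae = concat(Vtrajectory, Vcontext)
v_vae = np.concatenate([v_trajectory, v_context])
print(v_vae.shape)  # (512,)
```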
step S3.2, variational inference generates a probability-distributed latent variable space, which is randomly sampled to serve as the predicted value of the output feature; the expression is as follows:
Z=sample(VAE(Vvae))
wherein VAE(·) is the normal distribution fitted by the variational autoencoder, and sample(·) denotes random sampling from the fitted latent variable feature space, performing feature prediction by outputting trajectories with a specific distribution probability; Z represents the feature information of the predicted trajectory sampled from the perspective of variational inference.
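The sampling Z = sample(VAE(Vvae)) of step S3.2 is commonly realized with the reparameterization trick; the sketch below assumes a 64-dimensional latent space and uses fixed random matrices as placeholders for the learned encoder weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_sample(v_vae, latent_dim=64):
    """Sketch of Z = sample(VAE(Vvae)): an encoder (here a random linear
    map standing in for learned weights) produces the mean and log-variance
    of a normal distribution, which is sampled via reparameterization."""
    enc = np.random.default_rng(42)  # placeholder for learned encoder parameters
    W_mu = enc.standard_normal((latent_dim, v_vae.size)) * 0.01
    W_logvar = enc.standard_normal((latent_dim, v_vae.size)) * 0.01
    mu = W_mu @ v_vae
    logvar = W_logvar @ v_vae
    eps = rng.standard_normal(latent_dim)   # external noise keeps sampling differentiable
    return mu + np.exp(0.5 * logvar) * eps  # z = mu + sigma * eps

z = vae_sample(np.ones(512))
print(z.shape)  # (64,)
```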
step S3.3, the fused conditional variational autoencoder CVAE and the conditional generation adversarial network CGAN respectively output the pedestrian predicted-trajectory features through a fully connected layer.
For Context-CVAE, which uses the sampled input Z during decoding to generate predicted trajectories and compare and optimize them, Z is combined with the scene information Vcontext as the input of the fully connected layer, finally obtaining the predicted trajectory Xpred; the expression is as follows:
Xpred-CVAE=FC(Z,Vcontext)
For Context-CGAN, which emphasizes generating trajectories with the network and comparing and screening the results to obtain the optimal network parameters, the variational input Vvae in the unconditional scene is used instead of the sampled input Z, combined with the map semantic information Vcontext in the conditional scene, as the input of the fully connected layer, finally obtaining the predicted trajectory Xpred; the expression is as follows:
Xpred-CGAN=FC(Vvae,Vcontext)
In the expressions, FC is a fully connected layer used to fit the output trajectory; the probability corresponding to each trajectory in one prediction is also output by the fully connected layer and converted into a discrete distribution by softmax.
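The patent does not detail the FC decoding head, so the sketch below is illustrative only: random placeholder weights, an assumed 20 candidate trajectories of 12 steps each, and a 320-dimensional input standing in for concat(Z, Vcontext) with a 64-dimensional Z and 256-dimensional Vcontext:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fc_head(features, n_modes=20, pred_len=12, seed=0):
    """Sketch of the FC head: one linear map emits n_modes candidate future
    trajectories (pred_len steps of 2D coordinates), a second linear map
    scores them, and softmax turns the scores into a discrete distribution."""
    rng = np.random.default_rng(seed)
    W_traj = rng.standard_normal((n_modes * pred_len * 2, features.size)) * 0.01
    W_score = rng.standard_normal((n_modes, features.size)) * 0.01
    trajs = (W_traj @ features).reshape(n_modes, pred_len, 2)
    probs = softmax(W_score @ features)
    return trajs, probs

trajs, probs = fc_head(np.ones(320))  # 320 = 64 (Z) + 256 (Vcontext), assumed
```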
Because the two architectures, the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, each have advantages in decoding and generating predicted trajectories, the design of the conditional variational generative adversarial network model Context-CVGN fuses the two by setting hyperparameters as weight coefficients for each model, so that the model can adjust the proportion of trajectory prediction obtained from sampling inference and from adversarial generation; adaptive updating and training that combine the architectural advantages of both can then generate high-quality future trajectories satisfying the environmental condition constraints.
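The patent states only that hyperparameter weight coefficients blend the two branches; a minimal convex-combination sketch follows, where both the combination form and the alpha value are assumptions:

```python
import numpy as np

def cvgn_fuse(x_cvae, x_cgan, alpha=0.5):
    """Sketch of the Context-CVGN fusion: a hyperparameter alpha weights the
    trajectory from sampling inference (CVAE branch) against the one from
    adversarial generation (CGAN branch). The convex-combination form and the
    default alpha are illustrative assumptions, not the patent's exact rule."""
    return alpha * x_cvae + (1.0 - alpha) * x_cgan

x_cvae = np.zeros((12, 2))  # 12 predicted steps of 2D coordinates (CVAE branch)
x_cgan = np.ones((12, 2))   # same shape from the CGAN branch
fused = cvgn_fuse(x_cvae, x_cgan, alpha=0.25)
```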
The model evaluation method is similar to common trajectory prediction evaluation: the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are selected as evaluation indices to describe the accuracy of the predicted trajectory, with the following calculation expressions:
ADE = (1/(N·Tpred)) Σ_{i=1..N} Σ_{t=Tobs+1..Tobs+Tpred} ||Ŷ_i^t − Y_i^t||₂

FDE = (1/N) Σ_{i=1..N} ||Ŷ_i^{Tobs+Tpred} − Y_i^{Tobs+Tpred}||₂

wherein Y_i^t represents the actual future trajectory of predicted pedestrian i and Ŷ_i^t represents the predicted trajectory output by the model; for the FDE index, only the error of the end-point coordinates of each pedestrian within the scene is averaged, while for the ADE index the coordinate errors of all observation points are averaged. To ensure trajectory diversity and generalization capability during evaluation, mainstream trajectory prediction methods often add noise in the prediction process, perform multiple predictions (generally 20), and then take the predicted trajectory closest to the ground truth to calculate ADE/FDE. In the pedestrian trajectory prediction model framework provided by the invention, the trajectory and scene data of the past 8 frames (3.2 s) are used as input, and the positions of the future 12 frames (4.8 s) are predicted; for the ETH&UCY data sets, training is performed separately on the five scenarios (ETH/HOTEL/UNIV/ZARA1/ZARA2), giving the performance test results shown in Table 1 below:
TABLE 1 Training and test evaluation results of the three methods in each scene
[Table 1 appears as an image in the original: per-scene ADE/FDE results for Context-CVAE, Context-CGAN, and Context-CVGN]
As can be seen from Table 1, for scenes with relatively few interactive behaviors, Context-CGAN performs best, indicating that the conditional generative adversarial network has stronger fitting ability; correspondingly, Context-CVAE lags behind, possibly because latent-space sampling does not help when the future trajectory distribution is less dispersed, while the opposite holds for scenes with denser pedestrian flow and more frequent interaction; finally, Context-CVGN combines the advantages of both to achieve the optimum in overall performance. On the whole, the algorithm provided by the invention can generate high-quality future pedestrian trajectories satisfying the environmental condition constraints even when the environmental semantic information is relatively complex.
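The best-of-K ADE/FDE evaluation described above can be sketched with numpy; the synthetic trajectories below stand in for real model outputs:

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE: mean L2 error over all predicted steps and pedestrians;
    FDE: mean L2 error at the final step.
    pred, gt: arrays of shape (num_peds, pred_len, 2)."""
    err = np.linalg.norm(pred - gt, axis=-1)  # (num_peds, pred_len)
    return err.mean(), err[:, -1].mean()

def best_of_k(preds, gt):
    """Best-of-K: among K noisy predictions, keep the one closest to the
    ground truth (here judged by ADE) and report its ADE/FDE."""
    scores = [ade_fde(p, gt) for p in preds]
    return min(scores, key=lambda s: s[0])

gt = np.zeros((3, 12, 2))                        # 3 pedestrians, 12 future steps
preds = [gt + 0.1 * (k + 1) for k in range(20)]  # 20 samples; the first is closest
ade, fde = best_of_k(preds, gt)
```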
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (5)

1. A pedestrian trajectory prediction method based on a conditional variational generative adversarial network, characterized by comprising the following steps:
step S1, extracting the characteristics of the environmental information in the semantic map;
inputting the semantic map information into a convolutional neural network and outputting high-dimensional environmental feature information; sequentially flattening the environmental feature information and adjusting it with a fully connected layer to obtain an environment feature vector;
step S2, extracting characteristic information from the history track sequence of the predicted pedestrian;
identifying a given predicted pedestrian observation trajectory; extracting the relevant feature information of the specific pedestrian on the time axis with an LSTM encoder; for each time t, the encoder first converts the current pedestrian position coordinates X_i^t from the coordinate space to the feature space through a fully connected layer to obtain an embedded vector e_i^t, then inputs the embedded vectors into the LSTM in chronological order for encoding, repeating this process until the information at all times of the observation sequence t = 1, 2, …, Tobs has been encoded, obtaining the encoded information of the given pedestrian trajectory; after flattening and fully connected layer adjustment, the historical pedestrian trajectory feature vector is output;
step S3, based on the constraints and influence of the physical scene environment on the pedestrian trajectory, fusing the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, designing the conditional variational generative adversarial network model Context-CVGN, and realizing pedestrian trajectory prediction.
2. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 1, wherein the specific steps of obtaining the environment feature vector in step S1 include:
s1.1, extracting semantic map environment features, specifically expressed as follows:
Vfeature=CNN(Isem)
feature extraction is performed using a CNN convolutional neural network, wherein Vfeature represents the feature map obtained from the input semantic map Isem through the CNN, and the feature information in the feature map abstracts the semantic map into a high-dimensional representation;
step S1.2, sequentially flattening the high-dimensional feature information and adjusting it with a fully connected layer to obtain the environment feature vector, specifically expressed as follows:
Vflatten=flatten(Vfeature)
Vcontext=FC(Vflatten)
wherein flatten(·) denotes the flattening operation and FC(·) denotes the fully connected layer adjustment; the output information Vcontext is the environment feature vector obtained after flattening and fully connected layer adjustment, and can be used for the subsequent training and use of the network.
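Steps S1.1 and S1.2 can be sketched as follows; the feature-map shape (a 512×7×7 map, typical of what a ResNet18 backbone produces) and the random FC weights are illustrative assumptions in place of learned parameters:

```python
import numpy as np

def flatten_and_fc(feature_map, out_dim=256, seed=0):
    """Sketch of step S1.2: a CNN feature map Vfeature is flattened and
    adjusted with a fully connected layer into a length-256 environment
    feature vector Vcontext. The FC weights are random placeholders."""
    v_flatten = feature_map.reshape(-1)                        # Vflatten = flatten(Vfeature)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_dim, v_flatten.size)) * 0.01  # placeholder FC weights
    b = np.zeros(out_dim)
    return W @ v_flatten + b                                   # Vcontext = FC(Vflatten)

# Assumed shape: a 512-channel 7x7 map, as a ResNet18 backbone might emit
feature_map = np.ones((512, 7, 7))
v_context = flatten_and_fc(feature_map)
```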
3. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 2, characterized in that a pre-trained ResNet18 is adopted as the CNN convolutional neural network, and an environment feature vector Vcontext with a length of 256 is obtained after flattening and fully connected layer adjustment.
4. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 2, wherein step S2 extracts feature information from the historical trajectory sequence of the predicted pedestrian, the specific steps including:
step S2.1, first performing fully connected embedded encoding on the 2D coordinate information of the pedestrian at each time, with the expression:
e_i^t = FC(X_i^t; Wembedding)

wherein Wembedding is the parameter of the fully connected embedding layer, X_i^t denotes the position coordinate information of the i-th pedestrian at time t, FC denotes the fully connected layer, and e_i^t denotes the temporally encoded feature of the pedestrian;
step S2.2, inputting the temporal encoding features of the pedestrian into the LSTM in chronological order, expressed as follows:
h_i^t = LSTM(h_i^{t−1}, e_i^t; Wen)

wherein LSTM denotes the long short-term memory network, h_i^t denotes the hidden-state vector feature under the observation trajectory sequence, and Wen denotes the parameters of the LSTM encoder;
the final pedestrian historical trajectory feature vector is output from the hidden state of the last layer of the LSTM encoder, expressed as follows:

Vtrajectory = h_i^{Tobs}
for the output information Vtrajectory, the flattening and fully connected layer adjustment operations of step S1.2 are adopted to obtain a historical pedestrian trajectory feature vector with a length of 256.
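Steps S2.1 and S2.2 can be sketched with a minimal single-layer LSTM in numpy; all weights are random placeholders for the learned parameters Wembedding and Wen, and the gate layout is one common convention rather than the patent's exact implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(coords, hidden=256, embed=64, seed=0):
    """Sketch of steps S2.1-S2.2: embed each 2D coordinate X_i^t with an FC
    layer to get e_i^t, run a single-layer LSTM over the observation
    sequence, and return the last hidden state as Vtrajectory."""
    rng = np.random.default_rng(seed)
    W_emb = rng.standard_normal((embed, 2)) * 0.1             # embedding: e_t = W_emb @ x_t
    W = rng.standard_normal((4 * hidden, embed + hidden)) * 0.1
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in coords:                                        # t = 1 .. Tobs
        e_t = W_emb @ x_t
        gates = W @ np.concatenate([e_t, h]) + b
        i, f, g, o = np.split(gates, 4)                       # input/forget/candidate/output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)          # cell state update
        h = sigmoid(o) * np.tanh(c)                           # hidden state
    return h                                                  # Vtrajectory

coords = np.random.default_rng(1).standard_normal((8, 2))     # 8 observed frames (3.2 s)
v_trajectory = lstm_encode(coords)
```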
5. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 4, wherein the conditional variational generative adversarial network model Context-CVGN in step S3 is designed as follows:
step S3.1, the environment feature vector Vcontext obtained in step S1 and the historical pedestrian trajectory feature vector Vtrajectory obtained in step S2 are fused; as the input to the conditional variational inference network, the expression is as follows:
Vvae=concat(Vtrajectory,Vcontext)
wherein concat(·) denotes concatenation of the trajectory feature information and the semantic feature information;
step S3.2, variational inference generates a probability-distributed latent variable space, which is randomly sampled to serve as the predicted value of the output feature; the expression is as follows:
Z=sample(VAE(Vvae))
wherein VAE(·) is the normal distribution fitted by the variational autoencoder, and sample(·) denotes random sampling from the fitted latent variable feature space, performing feature prediction by outputting trajectories with a specific distribution probability; Z represents the feature information of the predicted trajectory sampled from the perspective of variational inference;
step S3.3, the fused conditional variational autoencoder CVAE and conditional generative adversarial network CGAN respectively output the pedestrian predicted-trajectory features through a fully connected layer, expressed as follows:
Xpred-CVAE=FC(Z,Vcontext)
Xpred-CGAN=FC(Vvae,Vcontext)
because of the respective advantages of the two architectures of the conditional variation self-encoder CVAE and the conditional generation countermeasure network CGAN in the aspect of decoding and generating a predicted track, in the design of the conditional variation generation countermeasure network model Context-CVGN, the two architectures are fused by using a method of setting a super parameter as a weight coefficient of each model, so that the model can adjust the proportion for track prediction obtained from sampling inference and generation countermeasures, and the adaptive updating and training are carried out by combining the architectural advantages of the conditional variation self-encoder CVAE and the conditional generation countermeasure network CGAN to complete the design of the conditional variation generation countermeasure network model Context-CVGN.
CN202210162399.8A 2022-02-22 2022-02-22 Pedestrian trajectory prediction method for generating confrontation network based on condition variation Pending CN114580718A (en)
