CN114580718A - Pedestrian trajectory prediction method based on a conditional variational generative adversarial network - Google Patents

Pedestrian trajectory prediction method based on a conditional variational generative adversarial network

Info

Publication number
CN114580718A
Authority
CN
China
Prior art keywords
pedestrian
information
track
feature
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210162399.8A
Other languages
Chinese (zh)
Inventor
曾繁虎
杨欣
王翔辰
樊江锋
李恒锐
朱义天
周大可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210162399.8A
Publication of CN114580718A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a pedestrian trajectory prediction method based on a conditional variational generative adversarial network. The method first extracts features from the environmental information in a semantic map, then extracts feature information from the historical trajectory sequence of the pedestrian to be predicted. Based on the constraints and influence that the physical environment of the scene exerts on pedestrian trajectories, a conditional variational autoencoder CVAE and a conditional generative adversarial network CGAN are fused to design a conditional variational generative adversarial network model, Context-CVGN, which realizes pedestrian trajectory prediction. Training and testing in different scenes show that the proposed Context-CVGN integrates the advantages of the Context-CVAE and Context-CGAN architectures and achieves the best overall performance. Overall, the proposed algorithm can generate high-quality predictions of future pedestrian trajectories that satisfy the environmental constraints, even when the environmental semantic information is relatively complex.

Description

Pedestrian trajectory prediction method based on a conditional variational generative adversarial network
Technical Field
The invention relates to the technical field of pedestrian trajectory prediction, in particular to a pedestrian trajectory prediction method based on a conditional variational generative adversarial network.
Background
With the rapid development of industries such as autonomous driving, video surveillance and intelligent robots, the pedestrian trajectory prediction problem has attracted widespread attention. On the one hand, whether a pedestrian's trajectory can be predicted accurately, quickly and in real time plays a positive role in giving timely warnings and avoiding danger, and is of great and far-reaching significance to in-depth research in fields such as unmanned vehicles and intelligent transportation; on the other hand, since pedestrian motion trajectories are influenced and perturbed by many factors, how to judge the accuracy and robustness of pedestrian trajectory prediction is also a key problem. In recent years, many studies and methods for pedestrian trajectory prediction have been proposed, and many results have been obtained, promoting the progress and development of the field.
In general, the traffic participants in a scene (pedestrians, cars, cyclists, etc.) influence one another. The motion trajectory of a pedestrian in a given scene is determined not only by the pedestrian's own historical trajectory information, but to a greater extent also by the surrounding traffic participants and the environment information; this interaction with other pedestrians and the surrounding environment is the most complicated and difficult-to-predict part of pedestrian trajectory prediction. According to the data in the literature (The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE, 5(4): e10047, 2010), 70% of pedestrians tend to walk in groups; it follows that a pedestrian's trajectory is largely influenced by the interaction between pedestrians. Interaction between pedestrians is mainly driven by common knowledge, habits and social customs (such as obeying traffic rules). Pedestrian trajectory prediction typically involves several interaction scenarios: walking parallel to other people (in the same or opposite direction), walking in a group formation, collision avoidance, and so on. Beyond this, another complexity of pedestrian trajectory prediction arises from the randomness of motion, since the pedestrian's destination and intended walking path are not known. Based on the above considerations, existing trajectory prediction methods not only model a specific participant, but also model the interactions among participants in consideration of the influence of prior information and interaction information.
The models established for trajectory prediction mainly aim at feature extraction for an interaction model, and the mainstream modeling methods are of two kinds: manual interaction modeling and structured interaction modeling. Prediction models established with the manual interaction modeling method (Hand-crafted Interaction Model) hold that the basis of the interaction model is the spatial features of scene semantics (such as the relative Euclidean distance between pedestrians); on this basis, spatial features are constructed manually from historical trajectory data, and the interaction model is then established in different ways. The literature (Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 1995, 51(5): 4282-4286) proposed the classic social force model along these lines, describing pedestrian motion in terms of attractive and repulsive forces. Such a model has the advantages of being simple, intuitive and of low complexity, and its physical meaning is very intuitive; however, because environmental semantic information is not incorporated into the modeling process, social behaviours of pedestrians such as accompanying cannot be described, and the model suffers from over-sensitivity to its parameters and weak generalization ability, so a good effect cannot be obtained. The Social-LSTM model was proposed in the literature (Social LSTM: Human trajectory prediction in crowded spaces. CVPR, 2016).
This method combines interactive spatial features with time series for deep learning to establish an interactive prediction model. Social-LSTM allocates a long short-term memory (LSTM) network to each pedestrian to predict that pedestrian's future path, and adds a Social Pooling module to the network to compute the spatial features of the other surrounding pedestrians relative to the target pedestrian and splice them into the hidden vector of the LSTM decoder, thereby solving the interaction problem to a certain extent and achieving a certain effect.
Structured interaction modeling (Structured Interaction Model) adopts a structured modeling method, aiming to overcome the shortcomings of manually constructed feature spaces. With the proposal of graph representations and the development of graph neural networks, new ideas for interaction modeling based on graph neural networks have gradually attracted attention. The difficulty of pedestrian trajectory prediction is that it is hard to interactively model pedestrians, who have rich and complex social attributes, from simple relative spatial features alone; deep-level feature information among pedestrians is needed. The graph topology of a spatio-temporal graph is naturally suited to representing interaction models among pedestrians, and compared with manually constructed semantic features it is a more intuitive and effective way to model pedestrian interaction. A result of this kind of modeling is the Social-STGCNN model proposed in the article (Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 14412-14420), which describes pedestrian trajectories in a spatio-temporal graph and uses a weighted adjacency matrix to characterize the interpersonal interaction, with kernel functions quantitatively describing that interaction, achieving an improvement in accuracy and also greatly improving parameter efficiency. However, graph neural networks also have problems, such as over-smoothing and the difficulty of modeling environmental information, which need further analysis and resolution in subsequent related research.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a pedestrian trajectory prediction method based on a conditional variational generative adversarial network, which solves the problem of trajectory prediction in a multi-participant dynamic interaction scene under the limitations of the physical environment by combining and fusing the architectural advantages of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, so that the model can generate high-quality future pedestrian trajectories satisfying the environmental constraints.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme:
A pedestrian trajectory prediction method based on a conditional variational generative adversarial network comprises the following steps:
step S1, extracting features from the environmental information in the semantic map;
semantic map information is input into a convolutional neural network, which outputs high-dimensional environmental feature information; the environmental feature information is flattened and then adjusted by a fully connected layer to obtain the environment feature vector;
step S2, extracting feature information from the historical trajectory sequence of the pedestrian to be predicted;
a given observed trajectory of the pedestrian to be predicted is identified, and an LSTM encoder is adopted to extract the relevant feature information of a specific pedestrian along the time axis; at each time t, the encoder first converts the current position X_i^t of the pedestrian from the coordinate space to the feature space through a fully connected layer to obtain an embedded vector E_i^t, which is then input into the LSTM in time order for encoding; this process is repeated until the information at all times of the observation sequence t = 1, 2, ..., T_obs has been encoded, yielding the encoded information of the given pedestrian trajectory; after flattening and fully connected layer adjustment, the historical pedestrian trajectory feature vector is output;
step S3, based on the constraints and influence of the physical environment of the scene on the pedestrian trajectory, a conditional variational autoencoder CVAE and a conditional generative adversarial network CGAN are fused to design the conditional variational generative adversarial network model Context-CVGN, realizing pedestrian trajectory prediction.
Further, the specific steps of acquiring the environment feature vector in step S1 include:
step S1.1, extracting semantic map environment features, specifically expressed as:
V_feature = CNN(I_sem)
where feature extraction uses a CNN convolutional neural network; V_feature represents the feature map obtained in the CNN from the input semantic map I_sem, and the feature information in the feature map abstracts the semantic map into a high-dimensional representation;
step S1.2, the high-dimensional feature information is flattened and then adjusted by a fully connected layer to obtain the environment feature vector, specifically expressed as:
V_flatten = flatten(V_feature)
V_context = FC(V_flatten)
where flatten(·) denotes the flattening operation and FC(·) denotes fully connected layer adjustment; the output V_context is the environment feature vector obtained after flattening and fully connected layer adjustment, and the obtained information can be used for the continued training of the subsequent network.
Further, a pre-trained ResNet18 is adopted as the CNN convolutional neural network; after flattening and fully connected layer adjustment, an environment feature vector V_context of length 256 is obtained.
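The flatten-and-project pipeline of steps S1.1-S1.2 can be sketched as follows. This is a minimal NumPy illustration: a random vector stands in for the flattened ResNet18 feature map (length 512, as stated in the embodiment below), and a random matrix stands in for the trained fully connected layer; the CNN itself and its training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_environment_feature(flattened_feature_map, w_fc, b_fc):
    """Project a flattened CNN feature map to the fixed-length
    environment feature vector V_context (length 256 in the patent)."""
    v_flatten = flattened_feature_map.reshape(-1)  # V_flatten = flatten(V_feature)
    v_context = w_fc @ v_flatten + b_fc            # V_context = FC(V_flatten)
    return v_context

# Stand-in for the ResNet18 output: the patent states the flattened
# feature vector has length 512 before the fully connected layer.
feature_map = rng.standard_normal(512)
w_fc = rng.standard_normal((256, 512)) * 0.01     # illustrative, untrained weights
b_fc = np.zeros(256)

v_context = extract_environment_feature(feature_map, w_fc, b_fc)
print(v_context.shape)  # (256,)
```

In practice the weight matrix would be learned end-to-end with the rest of the network; the fixed random weights here only demonstrate the shapes involved.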
Further, in step S2, extracting feature information from the historical trajectory sequence of the pedestrian to be predicted includes:
step S2.1, first performing fully connected embedded encoding on the 2D coordinate information of the pedestrian at each moment, expressed as:
E_i^t = FC(X_i^t; W_embedding)
where W_embedding denotes the parameters of the fully connected embedding layer, X_i^t denotes the position coordinate information of the i-th pedestrian at time t, FC denotes the fully connected layer, and E_i^t represents the time-encoded feature of the pedestrian;
step S2.2, inputting the time-encoded features of the pedestrian into the LSTM in time order, expressed as:
h_i^t = LSTM(h_i^{t-1}, E_i^t; W_en)
where LSTM denotes the long short-term memory network, h_i^t represents the hidden-state vector feature under the observed trajectory sequence, and W_en denotes the parameters of the LSTM encoder;
the finally output historical pedestrian trajectory feature vector is the hidden-state output of the last layer of the LSTM encoder, expressed as:
V_trajectory = h_i^{T_obs}
The output V_trajectory is processed with the flattening and fully connected layer adjustment operations of step S1.2 to acquire a historical pedestrian trajectory feature vector of length 256.
Further, the method for designing the conditional variational generative adversarial network model Context-CVGN in step S3 is as follows:
step S3.1, the environment feature vector V_context obtained in step S1 and the historical pedestrian trajectory feature vector V_trajectory obtained in step S2 are fused, serving as the input for the inference of the conditional variational network, expressed as:
V_vae = concat(V_trajectory, V_context)
where concat(·) denotes splicing the trajectory feature information and the semantic feature information;
step S3.2, variational inference generates a probability-distribution latent variable space, which is randomly sampled to serve as the predicted value of the output feature, expressed as:
Z = sample(VAE(V_vae))
where VAE(·) is the normal distribution fitted by the variational autoencoder, and sample(·) denotes random sampling from the fitted latent feature space, performing feature prediction by outputting a trajectory with a specific distribution probability; Z represents the feature information of the predicted trajectory sampled from the perspective of variational inference;
step S3.3, the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN are respectively computed and fused to output the pedestrian predicted-trajectory features through fully connected layers, expressed as:
X_pred-CVAE = FC(Z, V_context)
X_pred-CGAN = FC(V_vae, V_context)
Because the CVAE and CGAN architectures each have advantages in decoding and generating predicted trajectories, in designing the conditional variational generative adversarial network model Context-CVGN the two architectures are fused by setting a hyperparameter as the weight coefficient of each model, so that the model can adjust the proportion of trajectory prediction obtained from sampling inference and from adversarial generation; adaptive updating and training that combines the architectural advantages of the CVAE and the CGAN completes the design of the Context-CVGN model.
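Steps S3.1-S3.3 and the hyperparameter-weighted fusion can be sketched as follows. The linear heads, latent size, prediction horizon and the reparameterization-style sampling are illustrative assumptions in NumPy, not the trained Context-CVGN; only the data flow (concat, sample, two decoder branches, weighted blend) follows the steps above.

```python
import numpy as np

rng = np.random.default_rng(2)

def cvgn_fuse(v_trajectory, v_context, heads, alpha=0.5):
    """Sketch of the Context-CVGN fusion: concatenate the two condition
    vectors (V_vae), sample Z with a VAE-style reparameterization, decode
    a CVAE branch from (Z, V_context) and a CGAN branch from
    (V_vae, V_context), and blend them with a hyperparameter alpha."""
    v_vae = np.concatenate([v_trajectory, v_context])   # V_vae = concat(...)
    mu = heads["mu"] @ v_vae
    logvar = heads["logvar"] @ v_vae
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # Z = sample(VAE(V_vae))
    x_cvae = heads["dec_cvae"] @ np.concatenate([z, v_context])    # X_pred-CVAE
    x_cgan = heads["dec_cgan"] @ np.concatenate([v_vae, v_context])  # X_pred-CGAN
    return alpha * x_cvae + (1.0 - alpha) * x_cgan

D, D_Z, T_pred = 256, 32, 12   # feature size per the patent; latent/horizon assumed
heads = {                      # untrained stand-ins for the fully connected layers
    "mu": rng.standard_normal((D_Z, 2 * D)) * 0.01,
    "logvar": rng.standard_normal((D_Z, 2 * D)) * 0.01,
    "dec_cvae": rng.standard_normal((2 * T_pred, D_Z + D)) * 0.01,
    "dec_cgan": rng.standard_normal((2 * T_pred, 3 * D)) * 0.01,
}
x_pred = cvgn_fuse(rng.standard_normal(D), rng.standard_normal(D), heads)
print(x_pred.shape)  # (24,) — T_pred future (x, y) offsets, flattened
```

The single scalar alpha mirrors the hyperparameter weight coefficient described above; during training it would let the model shift between the sampling-inference branch and the adversarial branch.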
Advantageous effects:
The pedestrian trajectory prediction method based on a conditional variational generative adversarial network focuses on how to predict trajectories that simultaneously satisfy the rationality of physical environment constraints, social norms and the like; it combines and fuses the architectural advantages of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, so that the model can generate high-quality future pedestrian trajectories satisfying the environmental constraints. Predicting pedestrian trajectories accurately, quickly and in real time is a positive help for giving timely warnings and avoiding danger, and is of great theoretical value and profound practical significance for in-depth research in fields such as autonomous driving, video surveillance and intelligent transportation.
Drawings
FIG. 1 is an overall schematic diagram of the conditional variational generative adversarial network CVGN model framework provided by the invention;
FIG. 2 is a block diagram of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN in an embodiment of the invention;
FIG. 3 is a comparison diagram of the flows of the two conditional scene representation methods Context-CVAE and Context-CGAN and the fusion framework Context-CVGN in an embodiment of the invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, a method for predicting the future trajectories and conditional probability distribution of pedestrians based on given observed video frame images is provided:
For a given series of observed video frame images with time scale t ∈ {1, 2, ..., T_obs}, a sequence of pedestrian trajectories X = {X_i^t | t = 1, 2, ..., T_obs; i = 1, 2, ..., N} can be observed for a given set of scenes, where X_i^t is the spatial information of the i-th actual traffic participant in the t-th frame, the T_obs-th frame is the current time frame, and i ∈ {1, 2, ..., N} indexes the pedestrians contained in each frame.
Since the predicted trajectory of a pedestrian is related to the other interacting pedestrians and the scene environment, the conditional probability distribution is expressed as:
P(X_pred | X_M, X_i (i = 1, 2, ..., N), R)
where the trajectory information of the currently predicted pedestrian is denoted (·)_M, the trajectory information of the other interacting pedestrians in the scene is denoted X_i (i = 1, 2, ..., N), and the environment information is denoted R.
Based on an algorithm constructed from this known information, by training and optimizing the probability distribution under these conditions, the future predicted trajectory of the pedestrian can be obtained, expressed as:
X_pred = {X_M^t | t = T_obs + 1, ..., T_obs + T_pred}
where T_pred is the predicted trajectory sequence length.
This embodiment provides a pedestrian trajectory prediction method based on a conditional variational generative adversarial network, which combines and fuses the architectural advantages of the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN; the designed conditional variational generative adversarial network model CVGN performs latent-space distribution fitting and sampling on the feature vectors while fitting the network, so that the model can generate high-quality future trajectories satisfying the environmental constraints. The specific framework structure is shown in FIG. 1, and the specific design steps are as follows:
Step S1, extracting features from the environmental information in the semantic map.
Semantic map information is input into a convolutional neural network, and the environmental information obtained from the semantic map is extracted to obtain the expected feature information for prediction by the subsequent model framework. The environmental feature extraction process is:
V_feature = CNN(I_sem)
where CNN denotes the convolutional neural network, and V_feature represents the feature map obtained in the CNN from the input semantic map I_sem; the feature information in the feature map abstracts the semantic map into a high-dimensional representation.
Since the input of the final variational encoder is a low-dimensional expression of the actual information, while the information obtained by convolution is abstract and high-dimensional, the high-dimensional feature information must be converted into the expected output format through flattening and full connection, concretely implemented as:
V_flatten = flatten(V_feature)
V_context = FC(V_flatten)
where flatten(·) denotes the flattening operation and FC(·) denotes fully connected layer adjustment; the output V_context is the environment feature vector obtained after flattening and fully connected layer adjustment, and the obtained information can be used for the continued training of the subsequent network.
By comparison, a pre-trained ResNet18 is selected; this network has the advantages of a simple structure and strong generalization ability, and is suitable for processing environmental information features. In step S1, an RGB semantic map with a resolution of 576 × 480 × 3 is input; the feature map output by ResNet18 after multiple convolution and pooling operations is flattened to obtain a feature vector of length 512, which is adjusted by a fully connected layer into a feature vector of length 256, called the environment feature vector, serving as a condition for the subsequent conditional generative model.
And step S2, extracting characteristic information from the history track sequence of the predicted pedestrian.
Because the historical track of a given pedestrian has strong time correlation, the characteristic extraction of the historical track sequence still considers and selects a classic LSTM encoder to extract the correlation characteristics and information of the specific pedestrian on the time axis.
For each time t, the encoder first compares the position X of the current pedestriani tConverting the coordinate space into the feature space through the full connection layer to obtain an embedded vector
Figure BDA0003515355180000071
Then inputting the sequence into LSTM according to time sequence for encoding, and repeating the process until the observation sequence T is equal to 1,2, …, TobsThe information at all times is coded, and then the coded information of the given pedestrian track is obtained; and after leveling and full connection layer adjustment, outputting historical pedestrian trajectory characteristic vectors. In particular, the amount of the solvent to be used,
step S2.1, vector mapping (full-connection embedding) coding is firstly carried out on the 2D coordinates of the pedestrians at each moment, the specification of the input format of the pedestrians is guaranteed, and meanwhile the usability of the pedestrians is guaranteed to a certain extent, and the expression is as follows:
Figure BDA0003515355180000081
wherein WembeddingIn order to fully connect the parameters of the embedded layer,
Figure BDA0003515355180000082
position coordinate information indicating the ith pedestrian at time t, FC indicating the full link layer,
Figure BDA0003515355180000083
representing the time-coded features of the pedestrian. Besides selecting the 2D coordinates of the pedestrians for coding, fine-grained information such as the body postures of the pedestrians can be selected as a coding object.
Step S2.2, inputting the time coding features of the pedestrians into the LSTM according to the time sequence, wherein the representation form is as follows:
Figure BDA0003515355180000084
wherein, LSTM represents long-short term memory network,
Figure BDA0003515355180000085
vector features representing hidden states under a sequence of observation trajectories, WenParameters of an LSTM encoder; from a hidden state
Figure BDA0003515355180000086
Can be seen from the characteristic representation and the expression thereof, the characteristic is not only in time coding with the state information of the pedestrian at the moment i
Figure BDA0003515355180000087
Relating to, and hiding state information from, historical time series
Figure BDA0003515355180000088
In this regard, feature information extraction for a given pedestrian's historical track sequence is accomplished by such methods.
The final output pedestrian history track feature vector is output in a hidden state of the last layer of the LSTM encoder, and is expressed as follows:
Figure BDA0003515355180000089
for output information VtrajectoryThe leveling and full-link layer adjustment operations as in step S1 are used to obtain a 256-length historical pedestrian trajectory feature vector as a condition for the subsequent condition generation model.
And S3, based on the constraint and influence of the scene physical environment on the pedestrian track, fusing a conditional variation autoencoder CVAE and a condition to generate an antagonistic network CGAN, designing a conditional variation to generate an antagonistic network model Context-CVGN, and realizing the pedestrian track prediction.
The method is different from other existing track prediction models, the constraint and influence of the main scene physical environment on the pedestrian track are taken into consideration, a conditional generation model Context-CVGN is designed for a track prediction task based on the characteristics output by the modules, and a variational self-encoder is used for performing hidden space distribution fitting and sampling on the characteristic vector. In order to verify different generation characteristics of the conditional variant autocoder and the generation countermeasure network in the model framework of the invention, a conditional generation model is designed as a condition, the conditional variant autocoder CVAE and the conditional generation countermeasure network CGAN are included, and a specific schematic diagram of the conditional variant autocoder CVAE and the conditional generation countermeasure network CGAN of the invention is shown in FIG. 2.
The CVAE conditional variation self-encoder can map high-dimensional vectors to low-dimensional feature spaces and reconstruct the low-dimensional feature spaces, and has strong implicit space sampling capacity, and the CGAN conditional generation countermeasure network has stronger fitting capacity due to the generation characteristics of the CGAN conditional variation self-encoder; hidden vectors and scene characteristic vectors of a variational self-encoder are respectively considered under two model frameworks, so that the model can predict more reasonable future motion trail in the semantic map. A comparison diagram of the flows of the two conditional scene representation methods Context-CVAE, Context-CGAN and the fusion framework Context-CVGN in the specific embodiment of the invention is shown in FIG. 3.
The invention sets two prediction model architectures of Context-CVAE and Context-CGAN according to the characteristics of different networks, combines the advantages of the two models to complete the architecture generation of the Context-CVGN and obtains good results. The method comprises the following specific steps:
step S3.1, the environment feature vector V obtained in the step S1contextAnd the historical pedestrian trajectory feature vector V acquired in step S2trajectoryAnd (5) performing feature fusion, wherein as the inference of the conditional variation network, the expression is as follows:
Vvae=concat(Vtrajectory,Vcontext)
wherein concat(·) denotes concatenation of the trajectory feature information and the semantic feature information;
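The fusion of step S3.1 is a plain vector concatenation. A minimal numpy sketch follows, assuming both feature vectors have length 256 (as stated in claims 3 and 4); the random values stand in for real learned features:

```python
import numpy as np

rng = np.random.default_rng(0)
v_trajectory = rng.standard_normal(256)  # historical trajectory feature Vtrajectory
v_context = rng.standard_normal(256)     # scene environment feature Vcontext

# Step S3.1: Vvae = concat(Vtrajectory, Vcontext)
v_vae = np.concatenate([v_trajectory, v_context])
print(v_vae.shape)  # (512,)
```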
step S3.2, variational inference generates a probability-distributed latent variable space, which is randomly sampled to serve as the predicted value of the output feature; the expression is as follows:
Z=sample(VAE(Vvae))
wherein VAE(·) is the normal distribution fitted by the variational autoencoder, and sample(·) denotes random sampling from the fitted latent variable feature space, performing feature prediction by outputting trajectories with a specific distribution probability; Z represents the feature information of the predicted trajectory sampled from the perspective of variational inference.
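The sampling Z = sample(VAE(Vvae)) of step S3.2 is commonly realized with the reparameterization trick; the sketch below assumes a 64-dimensional latent space and uses fixed random matrices as placeholders for the learned encoder weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_sample(v_vae, latent_dim=64):
    """Sketch of Z = sample(VAE(Vvae)): an encoder (here a random linear
    map standing in for learned weights) produces the mean and log-variance
    of a normal distribution, which is sampled via reparameterization."""
    enc = np.random.default_rng(42)  # placeholder for learned encoder parameters
    W_mu = enc.standard_normal((latent_dim, v_vae.size)) * 0.01
    W_logvar = enc.standard_normal((latent_dim, v_vae.size)) * 0.01
    mu = W_mu @ v_vae
    logvar = W_logvar @ v_vae
    eps = rng.standard_normal(latent_dim)   # external noise keeps sampling differentiable
    return mu + np.exp(0.5 * logvar) * eps  # z = mu + sigma * eps

z = vae_sample(np.ones(512))
print(z.shape)  # (64,)
```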
step S3.3, the fused conditional variational autoencoder CVAE and the conditional generation adversarial network CGAN respectively output the pedestrian predicted-trajectory features through a fully connected layer.
For Context-CVAE, which uses the sampled input Z during decoding to generate predicted trajectories and compare and optimize them, Z is combined with the scene information Vcontext as the input of the fully connected layer, finally obtaining the predicted trajectory Xpred; the expression is as follows:
Xpred-CVAE=FC(Z,Vcontext)
For Context-CGAN, which emphasizes generating trajectories with the network and comparing and screening the results to obtain the optimal network parameters, the variational input Vvae in the unconditional scene is used instead of the sampled input Z, combined with the map semantic information Vcontext in the conditional scene, as the input of the fully connected layer, finally obtaining the predicted trajectory Xpred; the expression is as follows:
Xpred-CGAN=FC(Vvae,Vcontext)
In the expressions, FC is a fully connected layer used to fit the output trajectory; the probability corresponding to each trajectory in one prediction is also output by the fully connected layer and converted into a discrete distribution by softmax.
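The patent does not detail the FC decoding head, so the sketch below is illustrative only: random placeholder weights, an assumed 20 candidate trajectories of 12 steps each, and a 320-dimensional input standing in for concat(Z, Vcontext) with a 64-dimensional Z and 256-dimensional Vcontext:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fc_head(features, n_modes=20, pred_len=12, seed=0):
    """Sketch of the FC head: one linear map emits n_modes candidate future
    trajectories (pred_len steps of 2D coordinates), a second linear map
    scores them, and softmax turns the scores into a discrete distribution."""
    rng = np.random.default_rng(seed)
    W_traj = rng.standard_normal((n_modes * pred_len * 2, features.size)) * 0.01
    W_score = rng.standard_normal((n_modes, features.size)) * 0.01
    trajs = (W_traj @ features).reshape(n_modes, pred_len, 2)
    probs = softmax(W_score @ features)
    return trajs, probs

trajs, probs = fc_head(np.ones(320))  # 320 = 64 (Z) + 256 (Vcontext), assumed
```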
Because the two architectures, the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, each have advantages in decoding and generating predicted trajectories, the design of the conditional variational generative adversarial network model Context-CVGN fuses the two by setting hyperparameters as weight coefficients for each model, so that the model can adjust the proportion of trajectory prediction obtained from sampling inference and from adversarial generation; adaptive updating and training that combine the architectural advantages of both can then generate high-quality future trajectories satisfying the environmental condition constraints.
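The patent states only that hyperparameter weight coefficients blend the two branches; a minimal convex-combination sketch follows, where both the combination form and the alpha value are assumptions:

```python
import numpy as np

def cvgn_fuse(x_cvae, x_cgan, alpha=0.5):
    """Sketch of the Context-CVGN fusion: a hyperparameter alpha weights the
    trajectory from sampling inference (CVAE branch) against the one from
    adversarial generation (CGAN branch). The convex-combination form and the
    default alpha are illustrative assumptions, not the patent's exact rule."""
    return alpha * x_cvae + (1.0 - alpha) * x_cgan

x_cvae = np.zeros((12, 2))  # 12 predicted steps of 2D coordinates (CVAE branch)
x_cgan = np.ones((12, 2))   # same shape from the CGAN branch
fused = cvgn_fuse(x_cvae, x_cgan, alpha=0.25)
```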
The model evaluation method is similar to common trajectory prediction evaluation: the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are selected as evaluation indices to describe the accuracy of the predicted trajectory, with the following calculation expressions:
ADE = (1/(N·Tpred)) Σ_{i=1..N} Σ_{t=Tobs+1..Tobs+Tpred} ||Ŷ_i^t − Y_i^t||₂

FDE = (1/N) Σ_{i=1..N} ||Ŷ_i^{Tobs+Tpred} − Y_i^{Tobs+Tpred}||₂

wherein Y_i^t represents the actual future trajectory of predicted pedestrian i and Ŷ_i^t represents the predicted trajectory output by the model; for the FDE index, only the error of the end-point coordinates of each pedestrian within the scene is averaged, while for the ADE index the coordinate errors of all observation points are averaged. To ensure trajectory diversity and generalization capability during evaluation, mainstream trajectory prediction methods often add noise in the prediction process, perform multiple predictions (generally 20), and then take the predicted trajectory closest to the ground truth to calculate ADE/FDE. In the pedestrian trajectory prediction model framework provided by the invention, the trajectory and scene data of the past 8 frames (3.2 s) are used as input, and the positions of the future 12 frames (4.8 s) are predicted; for the ETH&UCY data sets, training is performed separately on the five scenarios (ETH/HOTEL/UNIV/ZARA1/ZARA2), giving the performance test results shown in Table 1 below:
TABLE 1 Training and test evaluation results of the three methods in each scene
[Table 1 appears as an image in the original: per-scene ADE/FDE results for Context-CVAE, Context-CGAN, and Context-CVGN]
As can be seen from Table 1, for scenes with relatively few interactive behaviors, Context-CGAN performs best, indicating that the conditional generative adversarial network has stronger fitting ability; correspondingly, Context-CVAE lags behind, possibly because latent-space sampling does not help when the future trajectory distribution is less dispersed, while the opposite holds for scenes with denser pedestrian flow and more frequent interaction; finally, Context-CVGN combines the advantages of both to achieve the optimum in overall performance. On the whole, the algorithm provided by the invention can generate high-quality future pedestrian trajectories satisfying the environmental condition constraints even when the environmental semantic information is relatively complex.
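The best-of-K ADE/FDE evaluation described above can be sketched with numpy; the synthetic trajectories below stand in for real model outputs:

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE: mean L2 error over all predicted steps and pedestrians;
    FDE: mean L2 error at the final step.
    pred, gt: arrays of shape (num_peds, pred_len, 2)."""
    err = np.linalg.norm(pred - gt, axis=-1)  # (num_peds, pred_len)
    return err.mean(), err[:, -1].mean()

def best_of_k(preds, gt):
    """Best-of-K: among K noisy predictions, keep the one closest to the
    ground truth (here judged by ADE) and report its ADE/FDE."""
    scores = [ade_fde(p, gt) for p in preds]
    return min(scores, key=lambda s: s[0])

gt = np.zeros((3, 12, 2))                        # 3 pedestrians, 12 future steps
preds = [gt + 0.1 * (k + 1) for k in range(20)]  # 20 samples; the first is closest
ade, fde = best_of_k(preds, gt)
```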
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (5)

1. A pedestrian trajectory prediction method based on a conditional variational generative adversarial network, characterized by comprising the following steps:
step S1, extracting the characteristics of the environmental information in the semantic map;
inputting the semantic map information into a convolutional neural network and outputting high-dimensional environmental feature information; sequentially flattening the environmental feature information and adjusting it with a fully connected layer to obtain an environment feature vector;
step S2, extracting characteristic information from the history track sequence of the predicted pedestrian;
identifying a given predicted pedestrian observation trajectory; extracting the relevant feature information of the specific pedestrian on the time axis with an LSTM encoder; for each time t, the encoder first converts the current pedestrian position coordinates X_i^t from the coordinate space to the feature space through a fully connected layer to obtain an embedded vector e_i^t, then inputs the embedded vectors into the LSTM in chronological order for encoding, repeating this process until the information at all times of the observation sequence t = 1, 2, …, Tobs has been encoded, obtaining the encoded information of the given pedestrian trajectory; after flattening and fully connected layer adjustment, the historical pedestrian trajectory feature vector is output;
step S3, based on the constraints and influence of the physical scene environment on the pedestrian trajectory, fusing the conditional variational autoencoder CVAE and the conditional generative adversarial network CGAN, designing the conditional variational generative adversarial network model Context-CVGN, and realizing pedestrian trajectory prediction.
2. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 1, wherein the specific steps of obtaining the environment feature vector in step S1 include:
s1.1, extracting semantic map environment features, specifically expressed as follows:
Vfeature=CNN(Isem)
feature extraction is performed using a CNN convolutional neural network, wherein Vfeature represents the feature map obtained from the input semantic map Isem through the CNN, and the feature information in the feature map abstracts the semantic map into a high-dimensional representation;
step S1.2, sequentially flattening the high-dimensional feature information and adjusting it with a fully connected layer to obtain the environment feature vector, specifically expressed as follows:
Vflatten=flatten(Vfeature)
Vcontext=FC(Vflatten)
wherein flatten(·) denotes the flattening operation and FC(·) denotes the fully connected layer adjustment; the output information Vcontext is the environment feature vector obtained after flattening and fully connected layer adjustment, and can be used for the subsequent training and use of the network.
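Steps S1.1 and S1.2 can be sketched as follows; the feature-map shape (a 512×7×7 map, typical of what a ResNet18 backbone produces) and the random FC weights are illustrative assumptions in place of learned parameters:

```python
import numpy as np

def flatten_and_fc(feature_map, out_dim=256, seed=0):
    """Sketch of step S1.2: a CNN feature map Vfeature is flattened and
    adjusted with a fully connected layer into a length-256 environment
    feature vector Vcontext. The FC weights are random placeholders."""
    v_flatten = feature_map.reshape(-1)                        # Vflatten = flatten(Vfeature)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_dim, v_flatten.size)) * 0.01  # placeholder FC weights
    b = np.zeros(out_dim)
    return W @ v_flatten + b                                   # Vcontext = FC(Vflatten)

# Assumed shape: a 512-channel 7x7 map, as a ResNet18 backbone might emit
feature_map = np.ones((512, 7, 7))
v_context = flatten_and_fc(feature_map)
```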
3. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 2, characterized in that a pre-trained ResNet18 is adopted as the CNN convolutional neural network, and an environment feature vector Vcontext with a length of 256 is obtained after flattening and fully connected layer adjustment.
4. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 2, wherein step S2 extracts feature information from the historical trajectory sequence of the predicted pedestrian, the specific steps including:
step S2.1, first performing fully connected embedded encoding on the 2D coordinate information of the pedestrian at each time, with the expression:
e_i^t = FC(X_i^t; Wembedding)

wherein Wembedding is the parameter of the fully connected embedding layer, X_i^t denotes the position coordinate information of the i-th pedestrian at time t, FC denotes the fully connected layer, and e_i^t denotes the temporally encoded feature of the pedestrian;
step S2.2, inputting the temporal encoding features of the pedestrian into the LSTM in chronological order, expressed as follows:
h_i^t = LSTM(h_i^{t−1}, e_i^t; Wen)

wherein LSTM denotes the long short-term memory network, h_i^t denotes the hidden-state vector feature under the observation trajectory sequence, and Wen denotes the parameters of the LSTM encoder;
the final pedestrian historical trajectory feature vector is output from the hidden state of the last layer of the LSTM encoder, expressed as follows:

Vtrajectory = h_i^{Tobs}
for the output information Vtrajectory, the flattening and fully connected layer adjustment operations of step S1.2 are adopted to obtain a historical pedestrian trajectory feature vector with a length of 256.
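Steps S2.1 and S2.2 can be sketched with a minimal single-layer LSTM in numpy; all weights are random placeholders for the learned parameters Wembedding and Wen, and the gate layout is one common convention rather than the patent's exact implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(coords, hidden=256, embed=64, seed=0):
    """Sketch of steps S2.1-S2.2: embed each 2D coordinate X_i^t with an FC
    layer to get e_i^t, run a single-layer LSTM over the observation
    sequence, and return the last hidden state as Vtrajectory."""
    rng = np.random.default_rng(seed)
    W_emb = rng.standard_normal((embed, 2)) * 0.1             # embedding: e_t = W_emb @ x_t
    W = rng.standard_normal((4 * hidden, embed + hidden)) * 0.1
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in coords:                                        # t = 1 .. Tobs
        e_t = W_emb @ x_t
        gates = W @ np.concatenate([e_t, h]) + b
        i, f, g, o = np.split(gates, 4)                       # input/forget/candidate/output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)          # cell state update
        h = sigmoid(o) * np.tanh(c)                           # hidden state
    return h                                                  # Vtrajectory

coords = np.random.default_rng(1).standard_normal((8, 2))     # 8 observed frames (3.2 s)
v_trajectory = lstm_encode(coords)
```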
5. The pedestrian trajectory prediction method based on a conditional variational generative adversarial network according to claim 4, wherein the conditional variational generative adversarial network model Context-CVGN in step S3 is designed as follows:
step S3.1, the environment feature vector Vcontext obtained in step S1 and the historical pedestrian trajectory feature vector Vtrajectory obtained in step S2 are fused; as the input to the conditional variational inference network, the expression is as follows:
Vvae=concat(Vtrajectory,Vcontext)
wherein concat(·) denotes concatenation of the trajectory feature information and the semantic feature information;
step S3.2, variational inference generates a probability-distributed latent variable space, which is randomly sampled to serve as the predicted value of the output feature; the expression is as follows:
Z=sample(VAE(Vvae))
wherein VAE(·) is the normal distribution fitted by the variational autoencoder, and sample(·) denotes random sampling from the fitted latent variable feature space, performing feature prediction by outputting trajectories with a specific distribution probability; Z represents the feature information of the predicted trajectory sampled from the perspective of variational inference;
step S3.3, the fused conditional variational autoencoder CVAE and conditional generative adversarial network CGAN respectively output the pedestrian predicted-trajectory features through a fully connected layer, expressed as follows:
Xpred-CVAE=FC(Z,Vcontext)
Xpred-CGAN=FC(Vvae,Vcontext)
because of the respective advantages of the two architectures of the conditional variation self-encoder CVAE and the conditional generation countermeasure network CGAN in the aspect of decoding and generating a predicted track, in the design of the conditional variation generation countermeasure network model Context-CVGN, the two architectures are fused by using a method of setting a super parameter as a weight coefficient of each model, so that the model can adjust the proportion for track prediction obtained from sampling inference and generation countermeasures, and the adaptive updating and training are carried out by combining the architectural advantages of the conditional variation self-encoder CVAE and the conditional generation countermeasure network CGAN to complete the design of the conditional variation generation countermeasure network model Context-CVGN.
CN202210162399.8A 2022-02-22 2022-02-22 Pedestrian trajectory prediction method for generating confrontation network based on condition variation Pending CN114580718A (en)
