CN114155270A - Pedestrian trajectory prediction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114155270A
Authority
CN
China
Prior art keywords
pedestrian
interaction type
latent variable
feature vector
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111324745.XA
Other languages
Chinese (zh)
Inventor
余剑峤
高嘉时
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202111324745.XA priority Critical patent/CN114155270A/en
Publication of CN114155270A publication Critical patent/CN114155270A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Abstract

An embodiment of the invention provides a pedestrian trajectory prediction method, device, equipment, and storage medium, relating to the field of data processing. The method comprises: acquiring pedestrian historical trajectory data; inputting the pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector; inputting the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information; and inputting the interaction type latent variable information and the pedestrian trajectory feature vector into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result. The embodiment addresses the problem that interactions among pedestrians are treated as undifferentiated in the related art, which improves trajectory prediction accuracy. Because a multi-modal behavior prediction model outputs a probability distribution over predicted trajectories, multiple plausible trajectories can be predicted for a pedestrian, which improves the generality of trajectory prediction.

Description

Pedestrian trajectory prediction method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a pedestrian trajectory prediction method, a pedestrian trajectory prediction device, pedestrian trajectory prediction equipment and a storage medium.
Background
Predicting pedestrian trajectories is a critical problem with wide application in intelligent systems, from traffic systems to surveillance systems; autonomous driving, autonomous robots, and surveillance systems all depend on it. Predicting pedestrian movement is critical for autonomous vehicles to avoid collisions and reduce accidents. In urban areas, a monitoring system detects and tracks the activity of any suspicious pedestrian based on accurate trajectory predictions from video. However, a pedestrian's trajectory is a multi-modal process influenced both by physical interactions with the scene and by complex social interactions with other pedestrians or road users. In the related art, physical interaction is extracted through a "context-aware" deep learning method, for example a convolutional network, in order to predict the pedestrian's trajectory. In these methods, however, pedestrians adjust their paths only according to adjacent interactions and the presence of other people in the scene is not considered; that is, the type of social interaction is not considered in the prediction, even though different social interaction modes, such as following or collision avoidance, produce different subsequent movement trends. In the social interaction modeling methods of the related art, interaction between pedestrians is represented by a social interaction strength, generally based on functions and rules drawn up in advance, which does not extend well to more complex interaction scenes. In addition, most trajectories predicted in the related art are unimodal trajectories rather than probability distributions over trajectories.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the application provides a pedestrian trajectory prediction method, device, equipment, and storage medium that use the interaction type in prediction to obtain a pedestrian trajectory prediction probability distribution result, improving the accuracy of pedestrian trajectory prediction.
In a first aspect, an embodiment of the present application provides a method for predicting a pedestrian trajectory, including:
acquiring pedestrian historical track data;
inputting the pedestrian historical track data into a pre-trained feature vector coding model to obtain a pedestrian track feature vector;
inputting the pedestrian track characteristic vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information;
and inputting the interaction type latent variable information, the pedestrian track characteristic vector and the pedestrian historical track data into a pre-trained behavior prediction model to obtain a pedestrian track prediction probability distribution result.
In an optional implementation manner, the feature vector coding model includes a self-coding module, a social coding module, and a feature vector time module, and the inputting the pedestrian history trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector includes:
inputting the pedestrian historical track data into the pre-trained self-coding module to obtain self-coding feature vectors;
inputting the pedestrian historical track data into the pre-trained social coding module and the feature vector time module to obtain a social coding feature vector;
and inputting the self-coding feature vector and the social coding feature vector into the feature vector time module to obtain the pedestrian trajectory feature vector.
In an optional implementation manner, the interaction type estimation model includes a social interaction classification model and a latent variable estimation model, and the inputting the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information includes:
inputting the pedestrian track characteristic vector into the social interaction classification model to obtain an interaction type distribution result;
and inputting the interaction type distribution result into the latent variable estimation model to obtain the interaction type latent variable information.
In an optional implementation manner, the social interaction classification model is a sensor classification model, and the inputting the pedestrian trajectory feature vector into the social interaction classification model to obtain an interaction type distribution result includes:
inputting the pedestrian trajectory feature vector into the sensor classification model to obtain an interaction type distribution result;
and performing stochastic gradient estimation on the interaction type distribution result to obtain the interaction type distribution result.
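The gradient estimation step above can be illustrated with a short sketch. The source does not name the estimator, so the Gumbel-softmax relaxation used here is an assumption, and the function name `gumbel_softmax`, the temperature value, and the three example logits are hypothetical. The sketch only shows how a relaxed sample of a categorical interaction type distribution might be drawn; in an automatic differentiation framework such a sample is differentiable with respect to the logits.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], temperature: float, rng: random.Random) -> List[float]:
    """Draw a relaxed one-hot sample from a categorical distribution (assumed estimator)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(z - peak) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
interaction_logits = [2.0, 0.5, -1.0]  # hypothetical scores for three interaction types
relaxed_sample = gumbel_softmax(interaction_logits, temperature=0.5, rng=rng)
```

A lower temperature pushes the sample closer to a one-hot vector, at the cost of higher gradient variance.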
In an optional implementation manner, before the acquiring pedestrian historical trajectory data, the method further includes:
obtaining training data, the training data comprising: pedestrian historical trajectory training data and pedestrian future trajectory training data;
inputting the pedestrian historical track training data and the pedestrian future track training data into the feature vector coding model to obtain a pedestrian historical track feature vector and a pedestrian future track feature vector;
inputting the pedestrian historical track feature vector into the social interaction classification model to obtain a prior interaction type distribution result;
inputting the pedestrian future trajectory feature vector into the social interaction classification model to obtain a posterior interaction type distribution result;
and inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model, and minimizing a latent variable loss function to obtain the trained latent variable estimation model.
In an optional implementation manner, the method further includes:
and according to the prior interaction type distribution result and the posterior interaction type distribution result, minimizing a social interaction classification loss function to obtain the trained social interaction classification model.
In an optional implementation manner, the inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model to minimize a latent variable loss function to obtain the trained latent variable estimation model includes:
inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model, and obtaining prior interaction type latent variable information and posterior interaction type latent variable information by using a reparameterization mode;
calculating a latent variable loss function according to the prior interaction type latent variable information and the posterior interaction type latent variable information;
and minimizing the latent variable loss function to obtain the trained latent variable estimation model.
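The passage above does not state the form of the latent variable loss function. As a hedged illustration only, the sketch below assumes a Kullback-Leibler divergence between a posterior and a prior interaction type distribution, which is a common choice when training conditional variational models. The function name `kl_divergence` and the example probabilities are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(posterior: Sequence[float], prior: Sequence[float], eps: float = 1e-12) -> float:
    """KL(posterior || prior) for two categorical distributions (assumed loss form)."""
    if len(posterior) != len(prior):
        raise ValueError("distributions must have the same number of interaction types")
    # Terms with zero posterior mass contribute nothing; eps guards against log(0).
    return sum(
        q * math.log((q + eps) / (p + eps))
        for q, p in zip(posterior, prior)
        if q > 0.0
    )


prior = [0.5, 0.3, 0.2]      # hypothetical distribution estimated from the historical trajectory
posterior = [0.7, 0.2, 0.1]  # hypothetical distribution estimated from the future trajectory
loss = kl_divergence(posterior, prior)
```

Minimizing such a term pulls the prior, which is all that is available at prediction time, toward the posterior, which is only available during training.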
In a second aspect, an embodiment of the present application provides a pedestrian trajectory prediction device, including:
the pedestrian historical track data acquisition module is used for acquiring pedestrian historical track data;
the pedestrian track characteristic vector acquisition module is used for inputting the pedestrian historical track data into a pre-trained characteristic vector coding model to obtain a pedestrian track characteristic vector;
the interaction type latent variable information acquisition module is used for inputting the pedestrian track characteristic vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information;
and the pedestrian track prediction module is used for inputting the interaction type latent variable information, the pedestrian track characteristic vector and the pedestrian historical track data into a pre-trained behavior prediction model to obtain a pedestrian track prediction probability distribution result.
In a third aspect, an embodiment of the present application provides a computer device including a processor and a memory;
the memory is used for storing programs;
the processor is configured to execute the pedestrian trajectory prediction method according to any one of the first aspect according to the program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions for executing the pedestrian trajectory prediction method according to any one of the first aspect.
Compared with the related art, the pedestrian trajectory prediction method provided by the first aspect of the embodiment of the application acquires pedestrian historical trajectory data, inputs the data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector, inputs the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information, and inputs the interaction type latent variable information and the pedestrian trajectory feature vector into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result. In this embodiment, the interaction type latent variable information is obtained from the pedestrian trajectory feature vector, so the type of social interaction among pedestrians is used in prediction. This avoids treating interactions among pedestrians as undifferentiated, as the related art does, and improves trajectory prediction accuracy. In addition, the multi-modal behavior prediction model yields a pedestrian trajectory prediction probability distribution result, that is, a probability distribution over a pedestrian's feasible trajectories rather than a single trajectory, so multiple plausible trajectories can be predicted for a pedestrian and the generality of trajectory prediction is improved.
It is to be understood that the advantageous effects of the second aspect to the fourth aspect compared to the related art are the same as the advantageous effects of the first aspect compared to the related art, and reference may be made to the related description of the first aspect, which is not repeated herein.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the related art are briefly introduced below. The drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an exemplary system architecture provided by one embodiment of the present application;
FIG. 2 is a flow chart of a method for predicting a pedestrian trajectory according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a structure of a feature vector coding model according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a behavior prediction model provided in an embodiment of the present application;
FIG. 5 is a flowchart of a method for predicting a pedestrian trajectory according to an embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating a training interaction type estimation model according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a signal flow in a process of training an interactive type estimation model according to an embodiment of the present application;
FIG. 8 is a graphical illustration of predicted results provided by one embodiment of the present application;
FIG. 9 is a further illustration of the predicted results provided by one embodiment of the present application;
FIG. 10 is a block diagram of a pedestrian trajectory prediction apparatus according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the embodiments of the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the embodiments of the present application with unnecessary detail.
It should be noted that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different from that in the flowcharts. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
It should also be appreciated that reference throughout the specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Predicting pedestrian trajectories is a critical problem with wide application in intelligent systems, from traffic systems to surveillance systems; autonomous driving, autonomous robots, and surveillance systems all depend on it. Predicting pedestrian movement is critical for autonomous vehicles to avoid collisions and reduce accidents. In urban areas, a monitoring system detects and tracks the activity of any suspicious pedestrian based on accurate trajectory predictions from video. However, a pedestrian's trajectory is a multi-modal process influenced both by physical interactions with the scene and by complex social interactions with other pedestrians or road users. Physical scene interaction is common in the open world: the physical scene includes various static structures, such as buildings, impassable roads, and trees, that may impede or affect pedestrian movement. Social interactions are more complex than physical scene interactions because of their dynamics, diversity, and privacy. For example, a typical individual interacts socially with others in a number of modes, including leader following, collision avoidance, group avoidance, and so forth.
To learn physical scene interaction, the related art extracts physical interaction through a "context-aware" deep learning method, for example a convolutional network, and indirectly solves the physical scene understanding task by generating annotations from scene images, in order to predict pedestrian trajectories. In these methods, however, pedestrians adjust their paths only according to adjacent interactions and the presence of other people in the scene is not considered; that is, the type of social interaction is not considered in the prediction, even though different social interaction modes, such as following or collision avoidance, produce different subsequent movement trends. In the social interaction modeling methods of the related art, interaction between pedestrians is represented by a social interaction strength, generally based on functions and rules drawn up in advance, which does not extend well to more complex interaction scenes. Alternatively, a data-driven social interaction model is adopted, which is mainly based on a constant symmetric function; for example, the degree of interaction between pedestrians is expressed by Euclidean distance, which cannot accurately express the type of interaction between pedestrians. In addition, most trajectories predicted in the related art are unimodal trajectories rather than probability distributions over trajectories.
Therefore, the embodiment of the application provides a pedestrian trajectory prediction method that acquires pedestrian historical trajectory data, inputs the data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector, inputs the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information, and inputs the interaction type latent variable information and the pedestrian trajectory feature vector into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result. In this embodiment, the interaction type latent variable information is obtained from the pedestrian trajectory feature vector, so the type of social interaction among pedestrians is used in prediction. This avoids treating interactions among pedestrians as undifferentiated, as the related art does, and improves trajectory prediction accuracy. In addition, the multi-modal behavior prediction model yields a pedestrian trajectory prediction probability distribution result, that is, a probability distribution over a pedestrian's feasible trajectories rather than a single trajectory, so multiple plausible trajectories can be predicted for a pedestrian and the generality of trajectory prediction is improved.
It is understood that the pedestrian trajectory prediction method provided in the embodiments of the present application may be implemented by various electronic devices with computing processing capability, for example, various user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), and the like, and may also be implemented by a server.
It should be noted that the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, and a big data and artificial intelligence platform, which is not limited herein.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, taking an example that the pedestrian trajectory prediction method provided in the embodiments of the present application is applied to a server, an application scenario to which the pedestrian trajectory prediction method provided in the embodiments of the present application is applied is introduced below.
Referring to FIG. 1, FIG. 1 is a schematic view of an application scenario of a pedestrian trajectory prediction method provided in an embodiment of the present application.
As shown in FIG. 1, system architecture 100 may include a database 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between database 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and so forth.
In an embodiment of the invention, the server 103 obtains pedestrian historical trajectory data from the database 101, inputs the pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector, inputs the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information, and inputs the interaction type latent variable information and the pedestrian trajectory feature vector into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result, thereby improving the trajectory prediction accuracy and universality.
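The data flow described above can be sketched as a composition of three models. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `predict_trajectories` and the three stub functions are hypothetical names, and the stubs stand in for the pre-trained models with trivial arithmetic. The behavior predictor also receives the historical data, as in the first aspect of the summary.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
History = List[List[Point]]  # one list of (x, y) positions per pedestrian


def predict_trajectories(
    history: History,
    encoder: Callable,
    interaction_estimator: Callable,
    behavior_predictor: Callable,
):
    """Chain the three pre-trained models in the order the text describes."""
    features = encoder(history)               # pedestrian trajectory feature vector
    latent = interaction_estimator(features)  # interaction type latent variable information
    return behavior_predictor(latent, features, history)


# Hypothetical stand-ins for the pre-trained models.
def stub_encoder(history: History) -> List[List[float]]:
    # Use each pedestrian's last displacement as a toy "feature vector".
    return [[t[-1][0] - t[-2][0], t[-1][1] - t[-2][1]] for t in history]


def stub_estimator(features: List[List[float]]) -> List[List[float]]:
    # Toy latent variable: a uniform distribution over two interaction types.
    return [[0.5, 0.5] for _ in features]


def stub_predictor(latent, features, history: History) -> List[Point]:
    # Toy prediction: extrapolate one step at constant velocity.
    return [(t[-1][0] + f[0], t[-1][1] + f[1]) for t, f in zip(history, features)]


history = [[(0.0, 0.0), (1.0, 0.0)], [(2.0, 2.0), (2.0, 1.0)]]
prediction = predict_trajectories(history, stub_encoder, stub_estimator, stub_predictor)
```

In the embodiment the final model outputs a probability distribution over trajectories; the stub returns a single point per pedestrian only to keep the example short.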
It should be noted that the pedestrian trajectory prediction method provided by the embodiment of the present invention is generally executed by the server 103, and accordingly, the pedestrian trajectory prediction device is generally disposed in the server 103. However, in other embodiments of the present invention, the terminal device may also have a similar function as the server, so as to execute the pedestrian trajectory prediction scheme provided by the embodiment of the present invention.
The system architecture and application scenario described in the embodiment of the present application are intended to illustrate the technical solution more clearly and do not limit it. Those skilled in the art will know that, as system architectures evolve and new application scenarios appear, the technical solution provided in the embodiment of the present application is equally applicable to similar technical problems. Those skilled in the art will also appreciate that the system architecture shown in FIG. 1 does not limit the embodiments of the present application, which may include more or fewer components than shown, combine some components, or arrange the components differently.
Based on the system architecture, various embodiments of the pedestrian trajectory prediction method in the embodiments of the present application are provided.
As shown in FIG. 2, which is a flowchart of a pedestrian trajectory prediction method according to an embodiment of the present application, the method includes but is not limited to steps S110 to S140.
Step S110: acquire pedestrian historical trajectory data.
In one embodiment, pedestrian trajectory prediction predicts a pedestrian's motion information at $T_{pre}$ future moments based on the motion information observed for that pedestrian at the past $T_{obs}$ moments.
In this embodiment, the pedestrian historical trajectory data (which is prior data) of N pedestrians is expressed as $X = \{x_1, \dots, x_N\}$. The trajectory of each pedestrian is a sequence of joint states of position coordinates on the x-axis and the y-axis, and the pedestrian historical trajectory data of the i-th pedestrian is expressed as:
$$x_i = \{(x_i^t, y_i^t) \in \mathbb{R}^2 \mid t = 1, \dots, T_{obs}\}$$
where $x_i^t$ is the x-axis coordinate of the i-th pedestrian at time t, $y_i^t$ is the y-axis coordinate of the i-th pedestrian at time t, $\mathbb{R}^2$ denotes the real two-dimensional space, and $T_{obs}$ is the total number of moments for which historical trajectory data is acquired.
In this embodiment, the future trajectory data of the N pedestrians is expressed as $Y = \{y_1, \dots, y_N\}$. The trajectory of each pedestrian is a sequence of joint states of position coordinates on the x-axis and the y-axis, and the future trajectory data of the i-th pedestrian is expressed as:
$$y_i = \{(x_i^t, y_i^t) \in \mathbb{R}^2 \mid t = T_{obs}+1, \dots, T_{obs}+T_{pre}\}$$
where $x_i^t$ is the x-axis coordinate of the i-th pedestrian at time t, $y_i^t$ is the y-axis coordinate of the i-th pedestrian at time t, $\mathbb{R}^2$ denotes the real two-dimensional space, and $T_{pre}$ is the total number of moments for which future trajectory data is predicted.
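The notation above can be made concrete with a small sketch that splits one pedestrian's recorded track into a history part and a future part. It assumes the future moments immediately follow the observed ones; the function name `split_trajectory` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in the real two-dimensional space


def split_trajectory(track: List[Point], t_obs: int, t_pre: int) -> Tuple[List[Point], List[Point]]:
    """Split a full track into observed history and future ground truth."""
    if t_obs + t_pre > len(track):
        raise ValueError("track is shorter than t_obs + t_pre")
    history = track[:t_obs]               # the first T_obs observed moments
    future = track[t_obs:t_obs + t_pre]   # the next T_pre moments (assumed contiguous)
    return history, future


# Hypothetical track of one pedestrian walking along the x-axis.
track = [(0.5 * t, 1.0) for t in range(5)]
x_i, y_i = split_trajectory(track, t_obs=3, t_pre=2)
```

During training both parts are available; at prediction time only the history part is known.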
Step S120: input the pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector.
In one embodiment, the feature vector encoding model includes a self-encoding module, a social encoding module, and a feature vector time module.
The pedestrian historical trajectory data includes pedestrian historical speed data v, pedestrian historical relative distance data d, and pedestrian historical relative direction data r; that is, a pedestrian's historical trajectory can be characterized by speed, distance, and direction. For example, these three kinds of data can be calculated for N pedestrians from GPS information and are expressed as:
$$v = \{v_i \in \mathbb{R}^2 \mid i = 1, 2, \dots, N\}$$
$$d = \{d_{i,j} \in \mathbb{R}^2 \mid i, j \in \{1, 2, \dots, N\}\}$$
$$r = \{r_{i,j} \in \mathbb{R}^2 \mid i, j \in \{1, 2, \dots, N\}\}$$
$$d_{i,j} = x_j - x_i$$
$$r_{i,j} = v_j - v_i$$
where $v_i$ is the pedestrian historical speed data of the i-th pedestrian, $d_{i,j}$ is the pedestrian historical relative distance data of the i-th pedestrian and the j-th pedestrian, and $r_{i,j}$ is the pedestrian historical relative direction data of the i-th pedestrian and the j-th pedestrian.
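The three quantities defined above can be computed as follows for a single time step. This is a sketch under one assumption not stated in the source: velocities are approximated by finite differences of consecutive positions. The function names and the two-pedestrian example are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def velocities(prev_pos: List[Vec], curr_pos: List[Vec], dt: float = 1.0) -> List[Vec]:
    """Approximate v_i by a finite difference of consecutive positions (assumption)."""
    return [
        ((cx - px) / dt, (cy - py) / dt)
        for (px, py), (cx, cy) in zip(prev_pos, curr_pos)
    ]


def pairwise_diff(values: List[Vec]) -> List[List[Vec]]:
    """Return a table with out[i][j] = values[j] - values[i]."""
    return [
        [(vj[0] - vi[0], vj[1] - vi[1]) for vj in values]
        for vi in values
    ]


prev_pos = [(0.0, 0.0), (2.0, 0.0)]  # positions of two pedestrians at time t-1
curr_pos = [(1.0, 0.0), (2.0, 1.0)]  # positions of the same pedestrians at time t

v = velocities(prev_pos, curr_pos)   # v_i
d = pairwise_diff(curr_pos)          # d[i][j] = x_j - x_i
r = pairwise_diff(v)                 # r[i][j] = v_j - v_i
```

Both tables are antisymmetric, so `d[i][j]` is the negation of `d[j][i]`, and the diagonal entries are zero.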
In an embodiment, the feature vector coding model is pre-trained before pedestrian trajectory prediction is performed; that is, the self-coding module, the social coding module, and the feature vector time module are pre-trained. Step S120 includes the following steps:
Step S121: input the pedestrian historical trajectory data into the pre-trained self-coding module to obtain a self-coding feature vector.
Step S122: input the pedestrian historical trajectory data into the pre-trained social coding module and the feature vector time module to obtain a social coding feature vector.
Step S123: input the self-coding feature vector and the social coding feature vector into the feature vector time module to obtain the pedestrian trajectory feature vector.
In an embodiment, FIG. 3 shows the structure of the feature vector coding model in an embodiment of the present application, where (1) denotes the self-coding module, (2) denotes the social coding module, and (3) denotes the feature vector time module. Because the feature vector coding model is autoregressive, this division into modules is only an example; any division that implements the function of the feature vector coding model falls within the scope of this embodiment and is not specifically limited herein.
Referring to fig. 3, the self-coding module includes a fully connected layer (FC), through which the pedestrian historical speed data in the input pedestrian historical track data is converted into a high-dimensional self-coding feature vector, expressed as:
$$h_{self} = \mathrm{FC}(v)$$
where $v$ represents the pedestrian historical speed data, $\mathrm{FC}(\cdot)$ represents a fully connected layer, $h_{self}$ represents the self-coding feature vector, and $h$ denotes a hidden state.
The social coding module comprises at least one fully connected layer and a multi-layer attention network. The multi-layer attention network is composed of m self-attention networks, and its input comprises a query vector (Query, denoted Q), a key vector (Key, denoted K), and a value vector (Value, denoted V).
The feature vector time module can adopt a Long Short-Term Memory network (LSTM). The LSTM is a special Recurrent Neural Network (RNN) that mitigates the vanishing-gradient and exploding-gradient problems in long-sequence training; that is, compared with an ordinary RNN, the LSTM performs better on longer sequences. It controls the transmitted state through gating, retaining information that needs to be remembered for a long time and forgetting unimportant information.
Firstly, the pedestrian historical relative distance data d and the pedestrian historical relative direction data r in the pedestrian historical track data are respectively input into corresponding fully connected layers to obtain the corresponding outputs, expressed as:
$$h_{dist} = \mathrm{FC}(d)$$
$$h_{dir} = \mathrm{FC}(r)$$
where $h_{dist}$ represents the output for the pedestrian historical relative distance data and $h_{dir}$ represents the output for the pedestrian historical relative direction data. Secondly, the query vector Q, the key vector K, and the value vector V of the multi-layer attention network are calculated, expressed as:
$$Q = \mathrm{FC}(\mathrm{concat}(h_{dist}, h_{dir}, \mathrm{FC}(h_{lstm})))$$
$$K = \mathrm{FC}(\mathrm{concat}(h_{dist}, h_{dir}, \mathrm{FC}(h_{lstm})))$$
$$V = \mathrm{FC}(\mathrm{concat}(h_{dist}, h_{dir}, \mathrm{FC}(h_{lstm})))$$
where $\mathrm{concat}(\cdot)$ represents the concatenation of different inputs, $h_{lstm}$ represents the output of the feature vector time module which, as shown in fig. 3, is fed into the multi-layer attention network through an autoregressive mechanism, and $\mathrm{FC}(h_{lstm})$ represents the output obtained through the corresponding fully connected layer.
The social coding feature vector output by the social coding module is expressed as:
$$h_{social} = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_m)\, W$$
$$\mathrm{head}_n = \mathrm{softmax}\left( \frac{(Q W_n^{Q})(K W_n^{K})^{\top}}{\sqrt{d_k}} \right) V W_n^{V}$$
where $h_{social}$ represents the social coding feature vector, $W_n^{Q}$, $W_n^{K}$, and $W_n^{V}$ respectively represent the weighting coefficients of the n-th self-attention network corresponding to the query vector Q, the key vector K, and the value vector V, $W$ represents the output weighting coefficient matrix, and $d_k$ represents a scaling factor.
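The attention computation described above can be sketched as follows. This is a minimal NumPy illustration of standard scaled dot-product attention with m heads, assuming that each head applies its own projection matrices to Q, K, and V and that the head outputs are concatenated and multiplied by an output matrix W. The function names, the weight shapes, and the use of plain arrays instead of trained layers are assumptions made for illustration.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, Wq, Wk, Wv, W):
    """Q: (Lq, D); K, V: (Lk, D); Wq, Wk, Wv: (m, D, d_k); W: (m * d_k, D_out)."""
    d_k = Wq.shape[-1]                       # scaling factor d_k
    heads = []
    for n in range(Wq.shape[0]):             # one pass per self-attention network
        q, k, v = Q @ Wq[n], K @ Wk[n], V @ Wv[n]
        weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)
        heads.append(weights @ v)            # (Lq, d_k)
    # Concatenate the m heads and apply the output weighting matrix W.
    return np.concatenate(heads, axis=-1) @ W
```

In the embodiment Q, K, and V are all produced from the same concatenated input, so a call would typically pass the same array three times.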
Finally, the self-coding feature vector $h_{self}$ and the social coding feature vector $h_{social}$ obtained above are input into the feature vector time module (such as an LSTM network), which models the trajectory features at different times to obtain the pedestrian track feature vector $h_{enc}$, expressed as:
$$h_{enc} = \mathrm{LSTM}(\mathrm{concat}(h_{self}, h_{social}))$$
where $h_{enc}$ represents the pedestrian track feature vector.
And step S130, inputting the pedestrian track characteristic vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information.
It will be appreciated that the different fully connected layers described above are all denoted by FC, but each fully connected layer has its own weight parameters.
In one embodiment, the interaction type estimation model includes: a social interaction classification model and a latent variable estimation model, in this embodiment, step S130 includes, but is not limited to, step S131 to step S132:
and S131, inputting the pedestrian track characteristic vector into a social interaction classification model to obtain an interaction type distribution result.
In one embodiment, the social interaction classification model may be a perceptron classification model, such as a multi-layer perceptron (MLP). An MLP is a feed-forward artificial neural network that maps a set of input vectors to a set of output vectors; it can be viewed as a directed graph consisting of several layers of nodes, each layer fully connected to the next. Except for the input nodes, each node is a neuron with a nonlinear activation function. An MLP can be trained by supervised learning with the back-propagation (BP) algorithm, and it overcomes the inability of the single-layer perceptron to classify linearly inseparable data. Compared with a single-layer perceptron, an MLP can have multiple outputs instead of one, and between the input and the output it has at least one hidden layer in addition to the output layer. It is a typical feed-forward network trained by back-propagation: adjacent layers are fully connected, information is processed layer by layer from the input layer through each hidden layer to the output layer, the hidden layers realize a nonlinear mapping of the input space, and the output layer realizes linear classification of the result.
In an embodiment, the social interaction classification model is a three-layer perceptron classification model TLP (Three-layer perceptron), that is, an MLP with only one hidden layer. The parameters of the TLP model are denoted by θ. The social interaction classification model is used to model the distribution of interaction types and performs multi-class classification in an unsupervised manner.
In one embodiment, step S131 includes, but is not limited to, step S1311 to step S1312:
Step S1311, inputting the pedestrian trajectory feature vectors into the perceptron classification model to obtain a prior interaction type distribution result.
In one embodiment, since the social interaction pattern between pedestrians is asymmetric, the pedestrian trajectory feature vector $h_{enc}^{i}$ of the i-th pedestrian and the pedestrian trajectory feature vector $h_{enc}^{j}$ of the j-th pedestrian need to be concatenated in order, expressed as:
$$p = \{ \mathrm{concat}(h_{enc}^{i}, h_{enc}^{j}) \mid i, j \in \{1, 2, \ldots, N\},\ i \neq j \}$$
where $p$ represents the result of concatenating the pedestrian trajectory feature vectors and satisfies $p \in \mathbb{R}^{N \times (N-1) \times 2E}$, $N$ denotes the number of pedestrians, and $E$ denotes the dimension of a pedestrian trajectory feature vector.
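The ordered concatenation can be sketched in NumPy as below. The sketch assumes the per-pedestrian feature vectors are stacked in an array of shape (N, E) and that, for each pedestrian i, the partners j ≠ i are listed in ascending index order; the function name `ordered_pairs` and this ordering are illustrative assumptions.

```python
import numpy as np

def ordered_pairs(h_enc):
    """h_enc: (N, E) feature vectors -> p: (N, N - 1, 2E).

    p[i, k] = concat(h_enc[i], h_enc[j]) where j runs over all indices != i.
    """
    N, _ = h_enc.shape
    rows = []
    for i in range(N):
        others = np.delete(h_enc, i, axis=0)               # (N-1, E), all j != i
        own = np.repeat(h_enc[i][None, :], N - 1, axis=0)  # (N-1, E), copies of h_i
        rows.append(np.concatenate([own, others], axis=-1))
    return np.stack(rows)
```

Because the vector of pedestrian i always comes first, the entry for the pair (i, j) differs from the entry for (j, i), which preserves the asymmetry of the interaction.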
The concatenated pedestrian trajectory feature vectors are input into the perceptron classification model to obtain an interaction type distribution result, expressed as:
$$h_{pair} = \mathrm{TLP}(p)$$
where $h_{pair}$ represents the prior interaction type. The output layer dimension of the perceptron classification model is set to the number of possible interaction types H, which may be set according to actual needs. For example, the social interaction types may be divided into four main types (static, linear, interactive, and non-interactive), or interactions may be further divided into four subcategories (leader-follower, collision avoidance, group, and other interactions); this is not specifically limited herein.
Step S1312, performing stochastic gradient estimation on the prior interaction type to obtain the interaction type distribution result.
In one embodiment, since the output of the perceptron classification model is expressed as a probability distribution, the social interaction type is denoted $C = \{c_{ij}\}$, where $c_{ij}$ represents the type of social interaction between pedestrian i and pedestrian j, and a conditional prior network $p_\theta(c \mid x)$ represents the social interaction type corresponding to the input x. Because drawing a discrete sample from the prior interaction type $h_{pair}$ is not differentiable, this embodiment samples based on Gumbel-softmax to realize stochastic gradient estimation: the gradient can be calculated by reparameterization, and the resulting sample approximates a sample from the social interaction type distribution, expressed as:
$$p_\theta(c \mid x) = \mathrm{softmax}((h_{pair} + g)/\tau)$$
where $p_\theta(c \mid x)$ represents the interaction type distribution result, i.e. the probability distribution of the pedestrian social interaction type c calculated from the pedestrian historical track data x, $g$ represents a value sampled from Gumbel(0, 1), and $\tau$ represents a hyper-parameter.
In an embodiment, when the hyper-parameter τ is smaller, the samples approach one-hot vectors and the variance of the estimated stochastic gradient is larger; when τ is larger, the samples are more uniform and the variance is smaller. The hyper-parameter can therefore be selected according to actual requirements.
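A minimal NumPy sketch of Gumbel-softmax sampling follows. It assumes the perceptron output is given as an array of unnormalized scores whose last axis has length H; the function name `gumbel_softmax` and the explicit random generator argument are illustrative assumptions. Gumbel(0, 1) noise is drawn with the usual inverse-transform formula g = -log(-log(u)) for uniform u.

```python
import numpy as np

def gumbel_softmax(h_pair, tau, rng):
    """Relaxed sample softmax((h_pair + g) / tau) with g ~ Gumbel(0, 1)."""
    u = rng.uniform(low=1e-10, high=1.0, size=h_pair.shape)
    g = -np.log(-np.log(u))                    # inverse-transform Gumbel noise
    y = (h_pair + g) / tau
    y = y - y.max(axis=-1, keepdims=True)      # stabilize the exponentials
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = np.array([1.0, 2.0, 3.0, 0.5])        # H = 4 interaction types
sample = gumbel_softmax(scores, tau=0.1, rng=rng)
```

With the same noise, a smaller τ yields a sample closer to one-hot and a larger τ yields a flatter sample, which matches the trade-off described above.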
And step S132, inputting the interaction type distribution result into the latent variable estimation model to obtain interaction type latent variable information.
In one embodiment, to realize multi-modal behavior prediction, a pedestrian trajectory prediction probability distribution result is obtained; that is, a probability distribution over the feasible trajectories of a pedestrian is generated instead of a single trajectory, so that multiple reasonable pedestrian trajectories can be predicted and the generality of trajectory prediction is improved. For this purpose a latent variable z, namely the interaction type latent variable information, is introduced; it can serve as the noise associated with a particular predicted trajectory.
In one embodiment, the latent variable estimation model may be a CVAE (Conditional Variational Autoencoder) generation model. A CVAE mainly comprises two parts, an encoder and a decoder. The encoder takes real data as input, computes the mean and variance of the input data, and represents them as an explicit Gaussian distribution; a random vector, i.e. the latent variable z, is then sampled from this Gaussian distribution. The decoder takes the random vector output by the encoder as input and reconstructs the real data.
In one embodiment, the samples of the latent variable z follow an isotropic Gaussian distribution. After the latent variable z is introduced, the interaction type distribution result $p_\theta(c \mid x)$ is expressed as $p_\theta(z \mid x, c)$, whose distribution satisfies:
$$p_\theta(z \mid x, c) \sim \mathcal{N}(\mu, \sigma^2 I)$$
where $\mathcal{N}$ represents a Gaussian distribution, $\mu$ represents the mean, $\sigma^2$ represents the variance, and $I$ represents the identity matrix.
In one embodiment, the mean and variance are calculated as:
$$\mu = \mathrm{FC}(h_{enc})$$
$$\log \sigma^2 = \mathrm{FC}(h_{enc})$$
where $\mathrm{FC}(\cdot)$ denotes a fully connected layer, and the two layers have different network parameters.
In one embodiment, a sample of the latent variable z is obtained from $p_\theta(z \mid x, c)$ by reparameterization, expressed as:
$$\varepsilon \sim \mathcal{N}(0, I)$$
$$z = \mu + \varepsilon \sigma$$
where $z$ represents the interaction type latent variable information.
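The reparameterized sampling step can be sketched as follows, assuming the two fully connected layers have already produced a mean vector and a log-variance vector; the function name `reparameterize` and the use of a NumPy random generator are illustrative assumptions. The standard deviation is recovered as σ = exp(0.5 · log σ²).

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + eps * sigma with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)   # noise independent of the parameters
    sigma = np.exp(0.5 * log_var)         # log-variance -> standard deviation
    return mu + eps * sigma

rng = np.random.default_rng(0)
z = reparameterize(np.zeros(16), np.zeros(16), rng)   # 16-dim standard normal sample
```

Since the randomness enters only through ε, the sample z is a deterministic and differentiable function of μ and log σ², which is what allows the two layers to be trained by back-propagation.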
And step S140, inputting the interaction type latent variable information, the pedestrian track characteristic vector and the pedestrian historical track data into a pre-trained behavior prediction model to obtain a pedestrian track prediction probability distribution result.
In one embodiment, after the interaction type latent variable information $z$ and the pedestrian trajectory feature vector $h_{enc}$ are obtained through the above steps, both are input into the pre-trained behavior prediction model to obtain the pedestrian trajectory prediction probability distribution result.
Referring to fig. 4, which is a schematic diagram of the behavior prediction model structure in an embodiment of the present application, the behavior prediction model is structurally similar to the feature vector coding model and is also a generative network. First, the interaction type latent variable information $z$ and the pedestrian trajectory feature vector $h_{enc}$ obtained above are concatenated as the initial hidden state of the LSTM network, expressed as $h_{lstm} = h_{enc} \oplus z$. The behavior prediction model outputs the trajectory one step at a time, and the currently generated output is then fed back as input in an autoregressive manner to generate the trajectory information of the next time step, which can reduce compounding errors in the prediction.
In an embodiment, before acquiring the pedestrian historical trajectory data, the method further includes training an interaction type estimation model in advance, including training a social interaction classification model and a latent variable estimation model, and referring to fig. 5, the training of the latent variable estimation model includes, but is not limited to, steps S510 to S550:
step S510, acquiring training data, where the training data includes: pedestrian historical trajectory training data and pedestrian future trajectory training data.
In one embodiment, since the social interaction type of a pedestrian is determined not only by the historical track but also by the future motion track, the pedestrian historical track training data $X_{train} = \{x_1, \ldots, x_N\}$ and the pedestrian future track training data $Y_{train} = \{y_1, \ldots, y_N\}$ can be obtained simultaneously during training.
Step S520, inputting the pedestrian historical track training data and the pedestrian future track training data into the feature vector coding model to obtain the pedestrian historical track feature vector and the pedestrian future track feature vector.
In one embodiment, referring to fig. 6, which is a schematic flow chart of the method for training the interaction type estimation model, the pedestrian historical track training data $X_{train} = \{x_1, \ldots, x_N\}$ and the pedestrian future track training data $Y_{train} = \{y_1, \ldots, y_N\}$ are respectively input into the feature vector coding model to obtain a pedestrian historical track feature vector and a pedestrian future track feature vector. Referring to fig. 4, $y_t$ is used during training and $x_t$ is used during prediction.
And step S530, inputting the pedestrian historical track feature vector into a social interaction classification model to obtain a prior interaction type distribution result.
And S540, inputting the pedestrian future trajectory feature vector into a social interaction classification model to obtain a posterior interaction type distribution result.
In an embodiment, referring to fig. 6, after the obtained pedestrian historical track feature vector and pedestrian future track feature vector are respectively input into the social interaction classification model (i.e. the three-layer perceptron classification model TLP in the figure), Gumbel-softmax sampling is performed on each output to obtain the prior interaction type distribution result $p_\theta(c \mid x)$, whose parameters are denoted by θ, and the posterior interaction type distribution result $q_\phi(c' \mid x, y)$, whose parameters are denoted by φ. The posterior interaction type distribution result is conditioned on both the pedestrian historical track and the pedestrian future track and outputs the corresponding social interaction type c'.
And step S550, inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model, and minimizing the latent variable loss function to obtain the trained latent variable estimation model.
In one embodiment, because the samples of the latent variable z follow an isotropic Gaussian distribution, the prior interaction type distribution result $p_\theta(c \mid x)$ with the latent variable z introduced is expressed as $p_\theta(z \mid x, c)$, whose distribution satisfies:
$$p_\theta(z \mid x, c) \sim \mathcal{N}(\mu_1, \sigma_1^2 I)$$
where $\mathcal{N}$ represents a Gaussian distribution, $\mu_1$ represents the mean, $\sigma_1^2$ represents the variance, and $I$ represents the identity matrix.
The posterior interaction type distribution result $q_\phi(c' \mid x, y)$ with the latent variable z introduced is expressed as $q_\phi(z' \mid x, y, c')$, whose distribution satisfies:
$$q_\phi(z' \mid x, y, c') \sim \mathcal{N}(\mu_2, \sigma_2^2 I)$$
where $\mathcal{N}$ represents a Gaussian distribution, $\mu_2$ represents the mean, $\sigma_2^2$ represents the variance, and $I$ represents the identity matrix.
In one embodiment, referring to fig. 6, reparameterization is used in each case to obtain samples of the latent variable z (the prior interaction type latent variable information and the posterior interaction type latent variable information). For example, using the fully connected layers $\mathrm{FC}(\cdot)$ and the ReLU activation function in fig. 6, the parameters $(\mu_\theta, \log \sigma_\theta^2)$ and $(\mu_\phi, \log \sigma_\phi^2)$ corresponding to the prior interaction type distribution result and the posterior interaction type distribution result are obtained, and from them the corresponding prior interaction type latent variable information and posterior interaction type latent variable information, expressed as:
$$\varepsilon \sim \mathcal{N}(0, I)$$
$$z_x = \mu_\theta + \varepsilon \sigma_\theta$$
$$z_y = \mu_\phi + \varepsilon \sigma_\phi$$
where $z_x$ represents the prior interaction type latent variable information and $z_y$ represents the posterior interaction type latent variable information.
In one embodiment, during training, the social interaction classification loss function is first minimized according to the prior interaction type distribution result and the posterior interaction type distribution result to obtain the trained social interaction classification model. The latent variable loss function is then calculated from the prior interaction type latent variable information and the posterior interaction type latent variable information, and minimizing it as the objective yields the trained latent variable estimation model. Together these give the trained interaction type estimation model.
In an embodiment, the social interaction classification loss function is the relative entropy between the prior interaction type distribution result and the posterior interaction type distribution result, i.e. the Kullback-Leibler distance, which measures the distance between two probability distributions P and Q in the same probability space (here, the prior interaction type distribution result and the posterior interaction type distribution result). The objective is to minimize this Kullback-Leibler distance, expressed as:
$$D[c' \,\|\, c] = D[q_\phi(c' \mid x, y) \,\|\, p_\theta(c \mid x)]$$
In one embodiment, the latent variable loss function is the relative entropy between the prior interaction type latent variable information and the posterior interaction type latent variable information, i.e. the Kullback-Leibler distance, and the objective is to minimize this Kullback-Leibler distance, expressed as:
$$D[q_\phi(z' \mid x, y, c') \,\|\, p_\theta(z \mid x, c)]$$
In this embodiment, the Kullback-Leibler distance is minimized by variational inference, which requires an Evidence Lower Bound (ELBO), where the evidence refers to a probability density. During training, when the social interaction type c' is sampled from the posterior interaction type distribution result $q_\phi(c' \mid x, y)$, variational inference minimizes the Kullback-Leibler distance between the estimated $q_\phi(z' \mid x, y, c')$ and $p_\theta(z \mid x, c)$, which is expressed as:
$$D[q_\phi(z' \mid x, y, c') \,\|\, p_\theta(z \mid x, c)] = \mathbb{E}[\log q_\phi(z' \mid x, y, c') - \log p_\theta(z \mid x, c)]$$
The evidence lower bound is obtained by expanding with Bayes' rule. When $c' \sim q_\phi(c' \mid x, y)$ and $c \sim p_\theta(c \mid x)$, the evidence lower bound is derived as:
$$\log p_\theta(y \mid x, c') \ge \mathbb{E}[\log p_\theta(y \mid x, z, c')] - D[q_\phi(z \mid x, y, c') \,\|\, p_\theta(z \mid x, c)] - D[p_\theta(z \mid x, c) \,\|\, p_\theta(z \mid x, c')]$$
where $\mathbb{E}[\log p_\theta(y \mid x, z, c')]$ represents the mathematical expectation of the log-likelihood $\log p_\theta(y \mid x, z, c')$.
According to the formula, the social interaction classification loss function can be minimized to obtain a trained social interaction classification model, then the latent variable loss function is minimized to obtain a trained latent variable estimation model, and therefore the trained interaction type estimation model is obtained.
In an embodiment, the training process further comprises: inputting the interaction type latent variable information, the pedestrian trajectory feature vector, and the pedestrian historical track data into the behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result (denoted $\hat{Y}$); comparing the predicted result with the input label value (the actual reference pedestrian future track data, denoted $Y$); and training the behavior prediction model with the minimization of a loss function L between the two as the training target, adjusting the corresponding model weights so as to improve prediction accuracy.
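In one embodiment, the loss function L is the negative evidence lower bound, expressed as:
$$L = -\mathbb{E}[\log p_\theta(y \mid x, z, c')] + \lambda_z D[q_\phi(z' \mid x, y, c') \,\|\, p_\theta(z \mid x, c)] + \lambda_c D[q_\phi(c' \mid x, y) \,\|\, p_\theta(c \mid x)]$$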
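where $\lambda_z$ and $\lambda_c$ represent the hyper-parameters that weight the terms of the loss function. Since $p_\theta(z \mid x, c)$ and $q_\phi(z' \mid x, y, c')$ both follow Gaussian distributions, $p_\theta(z \mid x, c) \sim \mathcal{N}(\mu_1, \sigma_1^2 I)$ and $q_\phi(z' \mid x, y, c') \sim \mathcal{N}(\mu_2, \sigma_2^2 I)$, we have, summing over the dimensions k of the latent variable:
$$D[q_\phi(z' \mid x, y, c') \,\|\, p_\theta(z \mid x, c)] = \frac{1}{2} \sum_{k} \left( \log \frac{\sigma_{1,k}^2}{\sigma_{2,k}^2} + \frac{\sigma_{2,k}^2 + (\mu_{2,k} - \mu_{1,k})^2}{\sigma_{1,k}^2} - 1 \right)$$
$$D[q_\phi(c' \mid x, y) \,\|\, p_\theta(c \mid x)] = \mathrm{En}(q_\phi(c' \mid x, y), p_\theta(c \mid x)) - \mathrm{En}(q_\phi(c' \mid x, y), q_\phi(c' \mid x, y))$$
where $\mathrm{En}(q, p)$ represents the cross entropy of p and q, with $\mathrm{En}(q, p) = -\sum q \log p$.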
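The two Kullback-Leibler terms can be evaluated numerically as in the sketch below. It assumes the interaction type distributions are given as probability vectors and the Gaussian distributions are given by mean and log-variance vectors with diagonal covariance; the function names, the small constant `eps` that guards the logarithm, and the standard closed-form Gaussian expression are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    """En(q, p) = -sum(q * log p); eps guards against log(0)."""
    return -np.sum(q * np.log(p + eps), axis=-1)

def kl_categorical(q, p):
    """D[q || p] written as En(q, p) - En(q, q)."""
    return cross_entropy(q, p) - cross_entropy(q, q)

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """Closed-form D[N(mu_q, var_q) || N(mu_p, var_p)] for diagonal covariances."""
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    terms = log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    return 0.5 * np.sum(terms, axis=-1)
```

Both functions return zero when the two distributions coincide and a positive value otherwise, so weighting them with λ_c and λ_z penalizes disagreement between the posterior and the prior.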
In one embodiment, referring to fig. 7, a signal flow diagram for training the interaction type estimation model is shown. Each node in the graph represents an instance of a mathematical operation or a network function module, and each edge represents the multidimensional data on which the operation is performed.
As shown in fig. 7, during training, the pedestrian historical track training data $X_{train}$ and the pedestrian future track training data $Y_{train}$ are first respectively input into the social interaction classification model to obtain the prior interaction type distribution result $p_\theta(c \mid x)$ and the posterior interaction type distribution result $q_\phi(c' \mid x, y)$. The social interaction classification loss function is then calculated after Gumbel-softmax sampling, and the social interaction classification model is trained. The social interaction classification loss function in this embodiment is expressed as $D[c' \,\|\, c]$.
Then, the latent variable is introduced through the latent variable estimation model to obtain $p_\theta(z \mid x, c)$ and $q_\phi(z' \mid x, y, c')$, which are reparameterized to obtain the prior interaction type latent variable information $z_x$ and the posterior interaction type latent variable information $z_y$. The latent variable loss function is calculated from $z_x$ and $z_y$, and the latent variable estimation model is trained. The latent variable loss function in this embodiment is expressed as:
$$D[q_\phi(z' \mid x, y, c') \,\|\, p_\theta(z \mid x, c)]$$
The embodiment of the application provides a pedestrian trajectory prediction method, which comprises: obtaining pedestrian historical track data; inputting the pedestrian historical track data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector; inputting the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information; and inputting the interaction type latent variable information and the pedestrian trajectory feature vector into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result. In this embodiment, the interaction type latent variable information is obtained from the pedestrian trajectory feature vector, so the social interaction type among pedestrians is exploited; this avoids the problem in the related art of treating all interactions among pedestrians as undifferentiated and improves trajectory prediction accuracy. In addition, a pedestrian trajectory prediction probability distribution result is obtained with a multi-modal behavior prediction model, that is, a probability distribution over the feasible trajectories of a pedestrian is generated instead of a single trajectory, so that multiple reasonable pedestrian trajectories can be predicted and the generality of trajectory prediction is improved.
In one embodiment, to demonstrate the prediction results of the pedestrian trajectory prediction method of the present application, analysis was performed on two established data sets, ETH and UCY. The ETH data set originates from ETH Zurich (the Swiss Federal Institute of Technology in Zurich) and records 12298 pedestrian samples; the UCY data set is a standard test data set. Both contain real-world trajectories of multiple pedestrians, including a large number of social interactions. The ETH data set contains two scenes, ETH and HOTEL, and the UCY data set contains three scenes, ZARA1, ZARA2, and UNIV. These five scenes are the main benchmarks for pedestrian trajectory prediction. Following the data set split used in previous work, the model is trained on a portion of a particular data set and tested on the remaining portion, and the data sets of the other four scenes are used for validation. All trajectories in the data sets are sampled every 0.4 seconds. Each evaluation window spans 8 seconds (20 time steps): the first 3.2 seconds (8 time steps) are used as the pedestrian historical trajectory, and the model is trained to predict the next 4.8 seconds (12 time steps).
Next, on these data sets, the pedestrian trajectory prediction method provided in the embodiment of the present application (the corresponding model is denoted Social-DualCVAE) is compared with existing models in the related art, which include:
1) A non-probabilistic method, Linear, which is a linear regressor that outputs a predicted trajectory by minimizing the least-squares error;
2) Four prediction models, respectively:
Social-LSTM: an LSTM-based network that shares hidden states through social pooling.
Social-STGCNN: models the interactions as a graph and predicts the sequence with a TCN.
PIF: utilizes visual information of human behavior sequences.
GAT: a graph-based network that employs a graph attention network to capture social interactions.
3) Four generative models, respectively:
Social-GAN: applies a GAN to Social-LSTM.
SoPhie: introduces an attention network into Social-GAN.
CGNS: applies variational divergence minimization to latent space learning.
Social-BiGAT: attaches a latent scene encoder to GAT.
This embodiment uses the Average Displacement Error (ADE) and the Final Displacement Error (FDE) to evaluate the models of the different prediction methods. ADE is defined as the average displacement error along the predicted future trajectory, and FDE is defined as the displacement error at the final endpoint. A leave-one-out cross-evaluation strategy is adopted, K = 20 samples are generated by random sampling from $\mathcal{N}(0, I)$, and the minimum ADE and FDE over the samples, denoted ADE20 and FDE20, are retained as the quantitative results.
$$\mathrm{ADE} = \frac{1}{N \, T_{pre}} \sum_{n=1}^{N} \sum_{t=1}^{T_{pre}} \left\lVert \hat{y}_t^{n} - y_t^{n} \right\rVert_2$$
$$\mathrm{FDE} = \frac{1}{N} \sum_{n=1}^{N} \left\lVert \hat{y}_{T_{pre}}^{n} - y_{T_{pre}}^{n} \right\rVert_2$$
where $N$ represents the number of pedestrians, $\hat{y}_t^{n}$ represents the predicted trajectory result of the n-th pedestrian at time t, $y_t^{n}$ represents the actual reference pedestrian future trajectory data of the n-th pedestrian at time t, and $T_{pre}$ represents the predicted duration.
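The two metrics and the best-of-K selection can be sketched as follows. The sketch assumes predictions and ground truth are arrays of shape (N, T_pre, 2) and that the minimum is taken over K complete sample sets; the function names and this way of taking the minimum are illustrative assumptions, since the embodiment does not state whether the minimum is taken per scene or per pedestrian.

```python
import numpy as np

def ade_fde(pred, truth):
    """pred, truth: (N, T_pre, 2). Returns (ADE, FDE) using Euclidean distance."""
    dist = np.linalg.norm(pred - truth, axis=-1)   # (N, T_pre) per-step errors
    return dist.mean(), dist[:, -1].mean()         # average over all steps; last step only

def best_of_k(samples, truth):
    """samples: (K, N, T_pre, 2). Returns the minimum ADE and minimum FDE over K samples."""
    scores = [ade_fde(s, truth) for s in samples]
    return min(a for a, _ in scores), min(f for _, f in scores)
```

With K = 20 sampled trajectory sets, `best_of_k` would return the values reported as ADE20 and FDE20 under the stated assumption.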
Referring to table 1 below, the Average Displacement Error (ADE) and Final Displacement Error (FDE) values in the different prediction models described above are shown.
[Table 1: ADE and FDE values of the compared prediction models; provided as an image in the original publication]
In table 1 above, the Social-DualCVAE of the present example is compared with other prediction models in the related art, and the ADE and FDE of pedestrian movement are reported for 12 time steps. It can be seen that both the predictive and the generative models perform better than the simple linear model. In particular, Social-BiGAT improves on the performance of GAT by introducing latent variables, as evaluated from 20 samples, while Social-STGCNN further exceeds Social-BiGAT by modeling the scene with a spatio-temporal graph. As for the model proposed in the present example, the CVAE model alone already improves performance compared with the baseline generative models, because the VAE structure has a good probabilistic formulation that results from maximizing a lower bound of the log-likelihood. The Social-DualCVAE model in the present example additionally introduces an unsupervised classification network on top of the CVAE model and achieves the best performance among all predictive and generative models shown in table 1.
In addition, regarding the hyper-parameter H, the number of social interaction types in the Social-DualCVAE model, it can be seen that Social-DualCVAE with H = 4 achieves the best performance of all compared models, with a 14% reduction in ADE and an 8% reduction in FDE relative to the most advanced model, Social-STGCNN. With H = 2 and H = 6, however, the Social-DualCVAE architecture does not contribute significantly to performance. This performance degradation indicates that an inappropriate choice of the number of social interaction categories H leads to inappropriate classification results and misleads the prediction.
In this embodiment, during training, the batch size is set to 128, and the model corresponding to the pedestrian trajectory prediction method is trained for 60 epochs with the Adam optimizer; the initial learning rate is 0.001 and is changed to 0.0001 after 30 epochs. To avoid overfitting, L2 regularization is employed, with the regularization parameter set to 0.1. Furthermore, the dropout probability of all dropout layers is set to 0.2, and the number of interaction types H is set to 2, 4, and 6. The hyper-parameter τ is set to 0.1. The weight hyper-parameters $\lambda_z$ and $\lambda_c$ in the loss function are both set to 0.005.
Fig. 8 is a schematic diagram of the prediction result in this embodiment. The first five panels in fig. 8 are prediction results of Social-STGCNN in the related art, and the last five panels in fig. 8 are prediction results of pedestrian trajectories according to the embodiment of the present application. The historical trajectories of the pedestrians are shown by dashed lines, the true reference predicted trajectories are shown by solid lines, and the resulting multi-modal samples are shown as contour plots. In five scene frames of ETH and UCY data sets, the pedestrian trajectory prediction method of the embodiment of the application is compared with the prediction result obtained by Social-STGCNN in the related art. The predicted trajectories for 5 scenes are shown in the figure: scene1 (scene 1), scene2 (scene 2), scene3 (scene 3), scene4 (scene 4), and scene5 (scene 5), which include social interactions that are common in the real world, such as walking in parallel or in groups (e.g., pedestrian 1, 2, 3 in scene 1), following or leading (e.g., pedestrian 1, 2 in scene 2), meeting or avoiding a collision (e.g., pedestrian 3, 4 in scene 2), and no interaction (e.g., pedestrian 1, 2 in scene 5).
The Social interaction is better captured by the Social-DualCVAE of the embodiments of the present application compared to Social-STGCNN. As shown in scenario 2, individual No. 3 in the Social-DualCVAE attempts to avoid a collision with individual No. 4. The Social-DualCVAE also produces a smaller variance of multi-modal future trajectories, with more focused prediction zones. Furthermore, in Social-DualCVAE, multi-modal predicted trajectories cover more target trajectories, representing better multi-modal prediction accuracy.
To better understand social behaviors through social interaction classification, this embodiment further visualizes typical social interactions of each class in the unsupervised classification result for the Zara1 scene. Referring to fig. 9, which is a further schematic diagram of the prediction results in this embodiment, the hyperparameter H takes the value 4, so there are 4 classes in total. Of the 38382 social interactions in the Zara1 scene, 25519 instances belong to class 1, which mainly includes parallel walking or parallel trends in the same direction. 5263 instances belong to class 2. The predicted regions of most trajectories in class 2 do not overlap, which means that there is little or no interaction between the pedestrians. 1854 instances belong to class 3, which mainly involves same-direction social interactions with a tendency to catch up or cross over. 5746 instances belong to class 4. Most trajectories in class 4 have opposite directions and overlap more, indicating some likelihood of meeting or collision. Fig. 9 shows significant characteristic differences between the social interaction classes, which supports the conclusion that the pedestrian trajectory prediction method of the embodiment of the application performs well across a wide range of scenes with various social interactions.
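The class counts reported above can be checked with a few lines of arithmetic. The following sketch only recomputes shares from the stated counts; the variable names are illustrative. It confirms that the four counts add up to the stated total of 38382 and shows that class 1 accounts for roughly two thirds of all instances.

```python
# Instance counts per interaction class in the Zara1 scene, as stated in the text.
class_counts = {1: 25519, 2: 5263, 3: 1854, 4: 5746}

total = sum(class_counts.values())

# Share of each class among all social interactions.
shares = {label: count / total for label, count in class_counts.items()}

for label, count in class_counts.items():
    print(f"class {label}: {count} instances, {shares[label]:.1%} of {total}")
```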
In addition, an embodiment of the present application further provides a pedestrian trajectory prediction apparatus, and referring to fig. 10, the apparatus includes:
a pedestrian historical track data acquisition module 1010, configured to acquire pedestrian historical track data;
a pedestrian trajectory feature vector obtaining module 1020, configured to input pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector;
an interaction type latent variable information acquisition module 1030, configured to input the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information;
and a pedestrian trajectory prediction module 1040, configured to input the interaction type latent variable information, the pedestrian trajectory feature vector and the pedestrian historical trajectory data into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result.
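The data flow between modules 1010 to 1040 can be sketched as a simple chain of three callables. This is a minimal illustration under stated assumptions, not the implementation of the apparatus: the class name, attribute names, and the placeholder lambdas standing in for the pre-trained models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PedestrianTrajectoryPredictor:
    """Chains the three pre-trained models named in the text (hypothetical wrapper)."""
    feature_encoder: Callable[[Any], Any]               # feature vector coding model (module 1020)
    interaction_estimator: Callable[[Any], Any]         # interaction type estimation model (module 1030)
    behavior_predictor: Callable[[Any, Any, Any], Any]  # behavior prediction model (module 1040)

    def predict(self, history: Any) -> Any:
        # Module 1020: historical trajectory data -> trajectory feature vector.
        features = self.feature_encoder(history)
        # Module 1030: feature vector -> interaction type latent variable information.
        latent = self.interaction_estimator(features)
        # Module 1040: latent information, feature vector and history -> predicted distribution.
        return self.behavior_predictor(latent, features, history)


# Placeholder stand-ins for the pre-trained models, used only to show the wiring.
predictor = PedestrianTrajectoryPredictor(
    feature_encoder=lambda h: ("features", len(h)),
    interaction_estimator=lambda f: ("latent", f),
    behavior_predictor=lambda z, f, h: {"latent": z, "features": f, "history": h},
)

history = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.2)]  # module 1010: acquired (x, y) positions
result = predictor.predict(history)
```

Keeping the three models as injected callables mirrors the statement that the modules may or may not be physically separate; each stage can be replaced without changing the others.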
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
It should be noted that the pedestrian trajectory prediction apparatus in the present embodiment may execute the pedestrian trajectory prediction method in the embodiment shown in fig. 2. That is, the pedestrian trajectory prediction apparatus in the present embodiment and the pedestrian trajectory prediction method in the embodiment shown in fig. 2 belong to the same inventive concept, so that these embodiments have the same implementation principle and technical effect, and are not described in detail here.
In addition, an embodiment of the present application further provides a computer device, where the computer device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor.
The processor and memory may be connected by a bus or other means.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the pedestrian trajectory prediction method of the above-described embodiment are stored in the memory, and when executed by the processor, perform the pedestrian trajectory prediction method of the above-described embodiment, e.g., perform the method steps S110 to S140 and the like in fig. 2 described above.
Furthermore, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When executed by a processor or a controller, for example by a processor in the above computer device embodiment, the instructions cause the processor to execute the pedestrian trajectory prediction method of the above embodiment, for example, method steps S110 to S140 in fig. 2 described above.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media, as is known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, it will be understood, however, that the present invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A pedestrian trajectory prediction method is characterized by comprising the following steps:
acquiring pedestrian historical trajectory data;
inputting the pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector;
inputting the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information;
and inputting the interaction type latent variable information, the pedestrian trajectory feature vector and the pedestrian historical trajectory data into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result.
2. The pedestrian trajectory prediction method according to claim 1, wherein the feature vector coding model includes a self-coding module, a social coding module and a feature vector time module, and the inputting the pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector includes:
inputting the pedestrian historical trajectory data into the pre-trained self-coding module to obtain a self-coding feature vector;
inputting the pedestrian historical trajectory data into the pre-trained social coding module and the feature vector time module to obtain a social coding feature vector;
and inputting the self-coding feature vector and the social coding feature vector into the feature vector time module to obtain the pedestrian trajectory feature vector.
3. The pedestrian trajectory prediction method of claim 1, wherein the interaction type estimation model comprises a social interaction classification model and a latent variable estimation model, and the inputting the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information comprises:
inputting the pedestrian trajectory feature vector into the social interaction classification model to obtain an interaction type distribution result;
and inputting the interaction type distribution result into the latent variable estimation model to obtain the interaction type latent variable information.
4. The method according to claim 3, wherein the social interaction classification model is a sensor classification model, and the inputting the pedestrian trajectory feature vector into the social interaction classification model to obtain an interaction type distribution result comprises:
inputting the pedestrian trajectory feature vector into the sensor classification model to obtain an interaction type distribution result;
and performing stochastic gradient estimation on the interaction type distribution result to obtain the interaction type distribution result.
5. The pedestrian trajectory prediction method according to any one of claims 3 to 4, further comprising, before the acquiring of the pedestrian historical trajectory data:
obtaining training data, the training data comprising: pedestrian historical trajectory training data and pedestrian future trajectory training data;
inputting the pedestrian historical trajectory training data and the pedestrian future trajectory training data into the feature vector coding model to obtain a pedestrian historical trajectory feature vector and a pedestrian future trajectory feature vector;
inputting the pedestrian historical trajectory feature vector into the social interaction classification model to obtain a prior interaction type distribution result;
inputting the pedestrian future trajectory feature vector into the social interaction classification model to obtain a posterior interaction type distribution result;
and inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model, and minimizing a latent variable loss function to obtain the trained latent variable estimation model.
6. The pedestrian trajectory prediction method according to claim 5, further comprising:
and according to the prior interaction type distribution result and the posterior interaction type distribution result, minimizing a social interaction classification loss function to obtain the trained social interaction classification model.
7. The method according to claim 6, wherein the step of inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model to minimize a latent variable loss function to obtain the trained latent variable estimation model comprises:
inputting the prior interaction type distribution result and the posterior interaction type distribution result into the latent variable estimation model, and obtaining prior interaction type latent variable information and posterior interaction type latent variable information by reparameterization;
calculating a latent variable loss function according to the prior interaction type latent variable information and the posterior interaction type latent variable information;
and minimizing the latent variable loss function to obtain the trained latent variable estimation model.
8. A pedestrian trajectory prediction device, characterized by comprising:
a pedestrian historical trajectory data acquisition module, configured to acquire pedestrian historical trajectory data;
a pedestrian trajectory feature vector acquisition module, configured to input the pedestrian historical trajectory data into a pre-trained feature vector coding model to obtain a pedestrian trajectory feature vector;
an interaction type latent variable information acquisition module, configured to input the pedestrian trajectory feature vector into a pre-trained interaction type estimation model to obtain interaction type latent variable information;
and a pedestrian trajectory prediction module, configured to input the interaction type latent variable information, the pedestrian trajectory feature vector and the pedestrian historical trajectory data into a pre-trained behavior prediction model to obtain a pedestrian trajectory prediction probability distribution result.
9. A computer device comprising a processor and a memory;
the memory is used for storing programs;
the processor is configured to execute the pedestrian trajectory prediction method according to any one of claims 1 to 7 in accordance with the program.
10. A computer-readable storage medium storing computer-executable instructions for performing the pedestrian trajectory prediction method of any one of claims 1 to 7.
CN202111324745.XA 2021-11-10 2021-11-10 Pedestrian trajectory prediction method, device, equipment and storage medium Pending CN114155270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111324745.XA CN114155270A (en) 2021-11-10 2021-11-10 Pedestrian trajectory prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111324745.XA CN114155270A (en) 2021-11-10 2021-11-10 Pedestrian trajectory prediction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114155270A true CN114155270A (en) 2022-03-08

Family

ID=80459642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111324745.XA Pending CN114155270A (en) 2021-11-10 2021-11-10 Pedestrian trajectory prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114155270A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418038A (en) * 2022-03-29 2022-04-29 北京道达天际科技有限公司 Space-based information classification method and device based on multi-mode fusion and electronic equipment
CN114792320A (en) * 2022-06-23 2022-07-26 中国科学院自动化研究所 Trajectory prediction method, trajectory prediction device and electronic equipment
CN115345390A (en) * 2022-10-19 2022-11-15 武汉大数据产业发展有限公司 Behavior trajectory prediction method and device, electronic equipment and storage medium
CN115690924A (en) * 2022-12-30 2023-02-03 北京理工大学深圳汽车研究院(电动车辆国家工程实验室深圳研究院) Potential user identification method and device for unmanned vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination