CN115082896A

CN115082896A - Pedestrian trajectory prediction method based on topological graph structure and depth self-attention network

Info

Publication number: CN115082896A
Application number: CN202210741506.2A
Authority: CN
Inventors: Kong Lingyue (孔令悦); Sun Changyin (孙长银); Wang Yuanda (王远大)
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-09-20

Abstract

A pedestrian trajectory prediction method based on a topological graph structure and a deep self-attention network first extracts the local and global spatial interaction features of pedestrian motion trajectories, using a graph attention network for the local features and a topological-graph-based deep self-attention network for the global features. It then extracts temporal features with a standard deep self-attention network. To model the inherent uncertainty and multi-modal characteristics of pedestrian motion trajectories, the invention introduces Gaussian noise into a fully-connected network decoder, which expands the exploration space of pedestrian trajectories. To further expand the trajectory exploration space and improve smoothness, the trajectory is then sent to a trajectory correction module for correction. Compared with other methods, the graph neural network and the graph-based deep self-attention network attend more closely to the various spatial interaction patterns in pedestrian trajectories, such as walking side by side and avoiding potential obstacles. Compared with other pedestrian trajectory prediction methods, the invention extracts social interaction features and explores multiple possible futures more effectively.

Description

Pedestrian trajectory prediction method based on topological graph structure and depth self-attention network

Technical Field

The invention belongs to the fields of deep learning and autonomous driving, and particularly relates to a pedestrian trajectory prediction method based on a topological graph structure and a deep self-attention network.

Background

In complex crowd scenes, the presence of pedestrians poses great challenges to dynamic obstacle-avoidance motion planning for mobile robots and unmanned vehicles. Predicting pedestrian motion trajectories helps improve the efficiency of obstacle-avoidance planning for unmanned vehicles and reduces the rate of safety accidents. Traditional pedestrian trajectory prediction methods mainly rely on statistical probability methods such as hidden Markov models and Bayesian methods, or on manually designed rules and functions. These methods are usually applied to sparse crowds whose motion states show little randomness, and they are difficult to transfer to complex nonlinear environments; once complex social interaction occurs among pedestrians, the validity of their predictions is hard to guarantee. With the development of neural networks, methods that convert trajectory prediction into a sequence generation task have been proposed (Alahi A, Goel K, Ramanathan V, et al.). Their core idea is to extract temporal information from pedestrian trajectories with a recurrent neural network such as a long short-term memory (LSTM) network, and to model the spatial interaction between trajectories by pooling the network's hidden states; however, their ability to extract spatial interaction features between pedestrians is limited. With the development of deep learning, in particular graph neural networks and self-attention mechanisms, it has become possible to model pedestrian interaction with graph neural networks and attention mechanisms.
A better-performing existing method uses a sparse graph convolution network (Shi L, Wang L, Long C, et al. SGCN: Sparse graph convolution network for pedestrian trajectory prediction [C]// CVPR, 2021.). Although this method can model interaction among pedestrians to a certain extent, its temporal feature extraction capability is insufficient. More importantly, it cannot effectively model the multi-modal characteristics of pedestrian motion trajectories, even though the inherent uncertainty of pedestrian motion is particularly important in trajectory prediction.

Disclosure of Invention

To overcome these shortcomings of the prior art, balance temporal feature extraction against spatial interaction feature extraction, and account for the inherent randomness of pedestrian trajectories, the invention provides a pedestrian trajectory prediction method based on a topological graph structure and a deep self-attention network. The method extracts temporal features from historical trajectories, explores interaction information and behavior patterns among pedestrians, reproduces the multi-modal characteristics of pedestrian motion, expands the exploration space of trajectory prediction, and predicts the future trajectory points of pedestrians.

To achieve the above purpose, the technical solution of the invention is as follows:

The pedestrian trajectory prediction method based on the topological graph structure and the deep self-attention network comprises the following steps:

Step one, preprocess the data to meet the input requirements of the neural network, and train and test the model parameters with leave-one-out cross-validation;

Step two, after the original pedestrian trajectory data from step one are obtained, embed the data into a high-dimensional space with a fully-connected network and construct a topological graph structure to meet the input requirements of the spatial interaction feature encoder; feed the high-dimensional data into the spatial interaction feature encoder to fully extract spatial interaction features; and concatenate the outputs of the encoder's two networks through a fully-connected network so that the output dimension matches that of the original high-dimensional data, yielding high-dimensional data with spatial interaction features;

Step three, concatenate the high-dimensional data with spatial interaction features obtained in step two with the original high-dimensional data, and feed the result into a temporal feature encoder to extract temporal features;

In the spatial interaction feature encoder, two self-attention mechanisms are adopted to extract global and local spatial interaction information between pedestrian trajectories, respectively, and to fully explore the interaction information and behavior patterns among pedestrians. The graph attention network adopts an attention mechanism whose correlation parameters are based on the relative distance between neighbors, where (x_i, y_i) is the two-dimensional spatial position of pedestrian i, the embedding function is a fully-connected network, and W is a network parameter matrix. This attention mainly emphasizes the influence of the relative distances between local neighbors on the motion trajectory, while the graph-based deep self-attention network uses a self-attention mechanism to emphasize the influence of global relations on the motion trajectory;

Step four, to model the inherent uncertainty and multi-modal characteristics of pedestrian motion trajectories, introduce Gaussian sampling noise into the high-dimensional data obtained in step three, which carry both spatial interaction features and temporal features, and then feed the data into a fully-connected neural network decoder to obtain the predicted pedestrian trajectory sequence;

Step five, send the result to a trajectory correction module to improve the smoothness and continuity of the path.

As a further improvement of the present invention, in step two, a spatial interaction feature encoder comprising a graph attention network and a topological-graph-based deep self-attention network is used to extract spatial interaction features from the pedestrian trajectory data.

As a further improvement of the invention, in step three, a temporal feature encoder (comprising a standard deep self-attention network) is used to extract the temporal features from the pedestrian trajectory data.

As a further improvement of the invention, in step four, Gaussian sampling noise is introduced into the data to model the inherent uncertainty and multi-modal characteristics of pedestrian motion trajectories and to expand the exploration space of pedestrian trajectories.

As a further improvement of the present invention, in step five, curve fitting and a binary classification network are used to further expand the network exploration space and enhance curve continuity. The curvature S of each candidate curve is calculated from the known final point, the candidate point and the number n of neighbors in the topological graph.

Compared with the prior art, the invention has the following beneficial effects: (1) two self-attention mechanisms, a graph attention network and a graph-based deep self-attention network, are used to extract spatial interaction features. They capture, respectively, the influence of the relative distances of local neighboring pedestrians and the influence of global relations on each pedestrian's trajectory. Social interaction information and the various interaction patterns in pedestrian trajectories are thus explored and identified more fully, including complex situations such as pedestrians walking side by side, pedestrians detouring early to avoid potential collisions, and a lone pedestrian giving way to a group walking toward it; (2) a deep self-attention network extracts temporal features from the pedestrian's historical trajectory, and its mask mechanism (Masked) strengthens the model's temporal feature extraction; (3) Gaussian noise is introduced at the input of the multi-modal decoder to model the multi-modal characteristics and inherent uncertainty of pedestrian motion, expanding the exploration space of the predicted trajectory so that it is closer to real situations; (4) the final position point is sent to a correction module, further expanding the trajectory exploration space and improving continuity.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 illustrates a trajectory prediction network framework according to the present invention;

FIG. 3 illustrates a pedestrian topology of the present invention;

FIG. 4 illustrates a pedestrian candidate trajectory generation module of the present invention;

FIG. 5 illustrates a binary classification network of the present invention.

Detailed Description

The invention is described in further detail below with reference to specific embodiments and the accompanying drawings:

The invention provides a pedestrian trajectory prediction method based on a topological graph structure and a deep self-attention neural network. Its process comprises data preprocessing, deep neural network model construction, model parameter training and pedestrian trajectory prediction, as shown in FIG. 1.

In the data preprocessing stage, the pedestrian data are preprocessed as follows. Pedestrian trajectories are converted into points in a two-dimensional spatial coordinate system, and the temporal information is encoded. The two-dimensional spatial data are expanded to a high-dimensional space to meet the input requirements of the deep self-attention network. A topological graph structure is constructed from the pedestrian trajectory data to meet the input requirements of the graph neural network.

In the deep neural network construction stage, the corresponding deep neural network framework is built with a machine learning library such as PyTorch. The framework contains two encoders and one decoder: a spatial feature extraction encoder (comprising a graph attention network and a graph-based deep self-attention network), a temporal feature extraction encoder (comprising a deep self-attention network) and a multi-modal decoder (comprising a fully-connected network).

In the model parameter training stage, the network hyper-parameters and loss function are set, the network model parameters are trained with leave-one-out cross-validation, and the model is evaluated with two metrics, the average position deviation and the final position deviation.

In the trajectory correction stage, multiple candidate trajectories are generated from the final position points and sent into a binary classification network, which is trained to evaluate them.

In the pedestrian trajectory prediction stage, the pedestrian motion trajectory sequence to be predicted is input, and the trained trajectory prediction network framework generates the pedestrian's future motion trajectory sequence.

The specific contents of each stage are described in detail as follows:

(1) In the data preprocessing stage, the data are preprocessed to meet the training input requirements of the network model. The raw data generally include the time t, the pedestrian ID i and the pedestrian's spatial position (x_i, y_i). To meet the input and training requirements of the deep self-attention network, the two-dimensional spatial coordinates are mapped to high-dimensional data (32-dimensional in the invention) with a fully-connected network. Meanwhile, to meet the input requirements of the graph attention network and the topological-graph-based self-attention network, the pedestrians are organized into a topological graph structure according to their spatial positions, yielding the adjacency matrix N(i).
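The preprocessing step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the inventors' code: the fully-connected embedding is randomly initialised and untrained. The edge rule for the topological graph is also an assumption, since the text does not state how edges are chosen. Here two pedestrians are linked when they are closer than a hypothetical `radius`, in the same units as the coordinates.

```python
import math
import random

EMBED_DIM = 32  # embedding size stated in the description


def make_linear(in_dim, out_dim, seed=0):
    """Randomly initialised (untrained) weights and zero bias for one fully-connected layer."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def linear(x, params):
    """y = W x + b for plain Python lists."""
    weight, bias = params
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def embed_positions(positions, params):
    """Map each 2-D point (x_i, y_i) to an EMBED_DIM-dimensional vector."""
    return [linear([x, y], params) for x, y in positions]


def build_adjacency(positions, radius=2.0):
    """Hypothetical edge rule: pedestrians i and j are linked if closer than `radius`."""
    n = len(positions)
    return [[1 if i != j and math.dist(positions[i], positions[j]) < radius else 0
             for j in range(n)] for i in range(n)]


frame = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0)]  # three pedestrians at one time step
features = embed_positions(frame, make_linear(2, EMBED_DIM))
adjacency = build_adjacency(frame)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

In a full model the embedding would be a trainable layer (for example in PyTorch) and the graph would be rebuilt at every frame.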

(2) In the model construction stage, the overall trajectory prediction framework is built with a deep learning library. It comprises a spatial interaction feature extraction encoder, a temporal feature extraction encoder and a multi-modal decoder, as shown in FIG. 2. A correction module is also constructed, as shown in FIG. 4 and FIG. 5; the overall flow is shown in FIG. 1.

The spatial interaction feature extraction encoder consists of a graph attention network and a graph-based deep self-attention network; the pedestrian topological graph structure is shown in FIG. 3. Pedestrian social interaction is complex, and a single network can hardly extract pedestrians' spatial features sufficiently in situations such as friends walking side by side or pedestrians avoiding potential collisions with strangers. Therefore, the invention combines two network structures, a graph attention network and a graph-based deep self-attention network, to strengthen the extraction of global and local spatial interaction features from pedestrian trajectories. The graph attention network uses correlation coefficients based on the relative distance between neighbors to capture the influence of local relations on the pedestrian trajectory. In the calculation of the correlation coefficient, l is the layer index of the graph attention network, (x_i^t, y_i^t) is the spatial coordinate of pedestrian i at time t, and W^r is the parameter matrix of the embedding function.

The attention coefficient is then obtained from the correlation coefficient. Its calculation uses the state value of pedestrian i at time t, whose initial value is the embedded input data. In it, N(i) is the adjacency matrix and, likewise, W^α is the parameter matrix of the corresponding embedding function.

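The graph attention network then updates each pedestrian's state through a message passing mechanism, aggregating the states of its neighbors weighted by the attention coefficients.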
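The exact formulas for the correlation and attention coefficients are not reproduced in this text, so the sketch below only illustrates the general idea under stated assumptions. A score is computed from the relative displacement between neighbours through a hypothetical embedding `w_r` and scoring vector `a`, and normalised with a softmax over the neighbour set N(i). The normalised weights are then used to aggregate neighbour states transformed by a hypothetical matrix `w_alpha` (message passing). It is an illustration of a distance-aware graph attention layer, not the patented formula.

```python
import math


def matvec(m, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distance_graph_attention(positions, states, adj, w_r, a, w_alpha):
    """One illustrative graph-attention layer driven by relative positions.

    positions: (x, y) of each pedestrian; states: state vector of each pedestrian;
    adj: 0/1 adjacency matrix; w_r: d_r x 2 embedding of the relative displacement
    (hypothetical); a: length-d_r scoring vector (hypothetical); w_alpha: transform
    applied to neighbour states before aggregation (hypothetical).
    Returns (new_states, weights) where weights[i] maps neighbour j -> coefficient.
    """
    out_dim = len(w_alpha)
    new_states, weights = [], []
    for i, (xi, yi) in enumerate(positions):
        nbrs = [j for j, linked in enumerate(adj[i]) if linked]
        if not nbrs:  # an isolated pedestrian receives no message
            new_states.append([0.0] * out_dim)
            weights.append({})
            continue
        scores = []
        for j in nbrs:
            rel = [positions[j][0] - xi, positions[j][1] - yi]  # relative displacement
            emb = [math.tanh(val) for val in matvec(w_r, rel)]
            scores.append(sum(ak * ek for ak, ek in zip(a, emb)))
        alpha = softmax(scores)  # normalised over the neighbour set N(i)
        msg = [0.0] * out_dim
        for coef, j in zip(alpha, nbrs):  # message passing: weighted sum of neighbours
            for k, val in enumerate(matvec(w_alpha, states[j])):
                msg[k] += coef * val
        new_states.append(msg)
        weights.append(dict(zip(nbrs, alpha)))
    return new_states, weights
```

Stacking several such layers corresponds to the layer index l used in the correlation coefficient.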

The graph-based deep self-attention network also uses the topological graph structure and takes the same input as the graph attention network. However, it applies a self-attention mechanism to emphasize the influence of global interaction information on each pedestrian's own trajectory. First, a query matrix q^i, a key matrix k^i and a value matrix v^i are extracted from the high-dimensional spatial input data.

The graph-based deep self-attention network also uses a message passing mechanism, and its attention Att(i) and output h'_{S,i} are computed as follows:

$$h'_{S,i} = f_{out}(\mathrm{Att}(i)) + \mathrm{Att}(i)$$

where d_k is the dimension of the key matrix and f_out is the output function, here a fully-connected network; as the formula shows, the output uses a skip (residual) connection.
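A minimal sketch of the graph-based self-attention layer with the residual output h'_{S,i} = f_out(Att(i)) + Att(i) follows. It assumes that Att(i) is scaled dot-product attention restricted to pedestrian i and its neighbours in the adjacency matrix; passing an all-ones matrix makes the attention global over the scene. It also assumes that f_out is a single square linear layer `w_out`. The weight names `w_q`, `w_k`, `w_v` and `w_out` are hypothetical.

```python
import math


def matvec(m, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_self_attention(states, adj, w_q, w_k, w_v, w_out):
    """Graph-masked scaled dot-product self-attention with a residual output.

    Assumed form: Att(i) = sum_j softmax_j(q_i . k_j / sqrt(d_k)) v_j, where j ranges
    over i itself and its neighbours in `adj`; output h'_i = f_out(Att(i)) + Att(i)
    with f_out a single linear layer `w_out` (square, so the residual sum is defined).
    """
    q = [matvec(w_q, h) for h in states]
    k = [matvec(w_k, h) for h in states]
    v = [matvec(w_v, h) for h in states]
    d_k = len(k[0])
    outputs = []
    for i in range(len(states)):
        idx = [i] + [j for j, linked in enumerate(adj[i]) if linked and j != i]
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d_k) for j in idx]
        weights = softmax(scores)
        att = [sum(w * v[j][c] for w, j in zip(weights, idx)) for c in range(len(v[i]))]
        outputs.append([f + a for f, a in zip(matvec(w_out, att), att)])  # skip connection
    return outputs
```

Including pedestrian i in its own attention set is a design choice of this sketch. It keeps the softmax defined for pedestrians with no neighbours.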

After the outputs of the graph attention network and the graph-based deep self-attention network are obtained, the two outputs are concatenated and passed through a fully-connected network so that the result has the same dimension as the original high-dimensional data. The resulting high-dimensional data with spatial interaction features are then concatenated with the original high-dimensional data to form the input of the temporal feature extraction encoder.
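The dimension bookkeeping of this fusion step can be illustrated with a short sketch. `w_fuse` and `b_fuse` are hypothetical parameters of the fusing fully-connected layer, and D is the embedding size (32 in the description).

```python
def linear(x, weight, bias):
    """y = W x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_spatial(gat_out, gsa_out, original, w_fuse, b_fuse):
    """Fuse the two spatial outputs and prepare the temporal-encoder input.

    gat_out, gsa_out, original: D-dimensional vectors for one pedestrian at one time step.
    w_fuse: D x 2D matrix, b_fuse: length-D bias (hypothetical fully-connected layer).
    Returns a 2D-dimensional vector: [fused spatial features, original embedding].
    """
    spatial = linear(gat_out + gsa_out, w_fuse, b_fuse)  # 2D -> D, same size as the embedding
    return spatial + original  # list concatenation -> 2D values for the temporal encoder
```

The temporal encoder therefore sees 2D values per time step (64 when D = 32).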

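The temporal feature extraction encoder uses a standard deep self-attention (Transformer) network. Similarly, a query matrix Q^i, a key matrix K^i and a value matrix V^i are first extracted from the input. The self-attention formula is:

$$\mathrm{Attention}(Q^i, K^i, V^i) = \mathrm{softmax}\left(\frac{Q^i (K^i)^{\top}}{\sqrt{d_k}}\right) V^i$$

To extract feature information from different aspects and enhance the feature extraction capability of the network, the invention adopts a multi-head attention mechanism:

$$\mathrm{MultiHead}(Q^i, K^i, V^i) = f_o\left(\left[\mathrm{head}_1, \ldots, \mathrm{head}_H\right]\right)$$

where head_j = Attention_j(Q^i, K^i, V^i) is the j-th head, H is the number of heads, [·] denotes concatenation, and f_o is an output function implemented by a fully-connected network that performs a weighted fusion of the multi-head features.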
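The description mentions a mask mechanism in the temporal encoder (beneficial effect (2)). The sketch below assumes a causal mask, so time step t attends only to steps up to t. Each head has its own hypothetical projection matrices, and `w_o` stands in for the fully-connected fusion function f_o applied to the concatenated heads. This is a plain-Python illustration of standard multi-head scaled dot-product attention, not the inventors' implementation.

```python
import math


def matvec(m, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=True):
    """softmax(Q K^T / sqrt(d_k)) V over one pedestrian's time sequence.

    With causal=True (assumed mask), step t only attends to steps 0..t.
    """
    d_k = len(k[0])
    out = []
    for t, q_t in enumerate(q):
        allowed = range(t + 1) if causal else range(len(k))
        weights = softmax([sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d_k) for s in allowed])
        out.append([sum(w * v[s][c] for w, s in zip(weights, allowed)) for c in range(len(v[0]))])
    return out


def multi_head(seq, heads, w_o, causal=True):
    """Multi-head self-attention: one (w_q, w_k, w_v) triple per head; w_o plays f_o."""
    per_head = []
    for w_q, w_k, w_v in heads:
        q = [matvec(w_q, x) for x in seq]
        k = [matvec(w_k, x) for x in seq]
        v = [matvec(w_v, x) for x in seq]
        per_head.append(attention(q, k, v, causal))
    # concatenate head outputs at each time step, then fuse with f_o
    return [matvec(w_o, sum((h[t] for h in per_head), [])) for t in range(len(seq))]
```

With the causal mask, changing a later observation leaves the outputs for earlier time steps unchanged.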

After data that fuse the spatial interaction features and the temporal features are obtained, Gaussian sampling noise is added to the data, which are then fed into the multi-modal decoder. The decoder uses a fully-connected network to map the high-dimensional data to points in the two-dimensional spatial coordinate system, outputting the trajectory sequence.
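The noise-injection step could look like the sketch below. The text does not specify whether the noise is added to or concatenated with the features. This sketch assumes concatenation of a standard-normal vector z and a single hypothetical fully-connected layer (`weight`, `bias`) mapping to 12 (x, y) points. The sample count of 20 is also hypothetical. Drawing several noise vectors yields several plausible futures, which is how the noise expands the exploration space.

```python
import random

PRED_LEN = 12  # future frames predicted in the description


def linear(x, weight, bias):
    """y = W x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def decode_with_noise(feature, weight, bias, noise_dim, num_samples=20, seed=0):
    """Return num_samples trajectories, each a list of PRED_LEN (x, y) points.

    weight: (2 * PRED_LEN) x (len(feature) + noise_dim) matrix (hypothetical decoder layer).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]  # z ~ N(0, I)
        flat = linear(feature + z, weight, bias)  # concatenate noise, then decode
        samples.append([(flat[2 * t], flat[2 * t + 1]) for t in range(PRED_LEN)])
    return samples
```

With `noise_dim=0` every sample is identical, which shows that the diversity comes entirely from the injected noise.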

Then, the known trajectory points and the predicted final position are sent to the correction module, as shown in FIG. 4. The correction module first collects eight candidate points around the predicted final position and then generates candidate trajectories from these candidate points by cubic curve fitting. The curvature s is calculated from the known final point (x_1, y_1), the candidate point (x_2, y_2) and n, the number of neighbors in the topological graph. The curvature point is the point on the perpendicular bisector of the segment joining the two points whose distance from the segment's midpoint is s.
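The geometric part of the correction module can be sketched as follows. The curvature s itself comes from the patent's formula, which is not reproduced here, so it is passed in as a parameter. Several details are assumptions:

- the spacing `step` of the 3x3 grid used to place the eight candidate points;
- the use of the second-to-last observed point as a fourth interpolation point;
- the parameter values t = -1, 0, 0.5, 1 assigned to the four points.

The cubic is obtained by Lagrange interpolation through those four points, which is one simple way to fit a cubic curve.

```python
import math


def candidate_points(final_pred, step=0.5):
    """Eight candidates on a 3x3 grid around the predicted final point (`step` is assumed)."""
    fx, fy = final_pred
    return [(fx + dx * step, fy + dy * step)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def curvature_point(p1, p2, s):
    """Point on the perpendicular bisector of p1-p2 at signed distance s from the midpoint.

    Positive s lies to the left of the direction p1 -> p2.
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("p1 and p2 coincide; the bisector is undefined")
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    return (mx - s * dy / length, my + s * dx / length)


def cubic_through(points, ts, t):
    """Evaluate the unique cubic (Lagrange form) through four points at parameter t."""
    x = y = 0.0
    for i, (px, py) in enumerate(points):
        basis = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                basis *= (t - tj) / (ts[i] - tj)
        x += basis * px
        y += basis * py
    return (x, y)


def candidate_trajectory(prev_point, p1, p2, s, pred_len=12):
    """Hypothetical candidate: cubic through prev_point, p1, the curvature point and p2."""
    pts = [prev_point, p1, curvature_point(p1, p2, s), p2]
    ts = [-1.0, 0.0, 0.5, 1.0]
    return [cubic_through(pts, ts, k / pred_len) for k in range(1, pred_len + 1)]
```

Each candidate trajectory starts after the known final point and ends exactly at its candidate point.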

All candidate trajectories are labeled as positive or negative samples according to their average position deviation, at a positive-to-negative ratio of 1:3, and fed into a binary classification network for training. The binary classification network consists of two fully-connected networks.
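A sketch of how candidates might be labeled and scored follows. The text gives the 1:3 ratio and the use of average position deviation. The following are assumptions:

- candidates are ranked by ADE against the ground-truth future, and the best quarter is marked positive;
- the classifier uses a ReLU after its first layer and a sigmoid after its second;
- the parameter names `w1`, `b1`, `w2` and `b2` are hypothetical.

```python
import math


def ade(pred, truth):
    """Mean Euclidean distance between corresponding points of two trajectories."""
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(truth)


def label_candidates(candidates, truth, pos_ratio=0.25):
    """Label the lowest-ADE quarter of candidates 1 (positive) and the rest 0 (negative).

    With pos_ratio = 0.25 this gives the 1:3 positive-to-negative ratio.
    """
    order = sorted(range(len(candidates)), key=lambda i: ade(candidates[i], truth))
    num_pos = max(1, round(len(candidates) * pos_ratio))
    labels = [0] * len(candidates)
    for i in order[:num_pos]:
        labels[i] = 1
    return labels


def classifier_score(features, w1, b1, w2, b2):
    """Two fully-connected layers (ReLU, then sigmoid) giving P(candidate is good)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b) for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))
```

With eight candidates per final point, this labeling yields two positives and six negatives.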

(3) In the model parameter training stage, after the hyper-parameters and loss function are set, the model is trained on two public pedestrian trajectory datasets, ETH and UCY. ETH consists of two sub-datasets, ETH and Hotel, and UCY consists of ZARA1, ZARA2 and UNIV. Cross-validation is performed with the leave-one-out method: one dataset is used as the test set and the other four as the training set.
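The leave-one-out protocol over the five scenes amounts to the following loop. The dataset names come from the text, while the function name `leave_one_out` is illustrative.

```python
DATASETS = ["ETH", "Hotel", "ZARA1", "ZARA2", "UNIV"]


def leave_one_out(names):
    """Yield (training sets, test set) pairs: each dataset is held out exactly once."""
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


for train, test in leave_one_out(DATASETS):
    print(f"train on {train}, test on {test}")
```

Reported results are then usually given per held-out scene, together with their average.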

Alternatively, a self-collected dataset can be used. Pedestrians in the scene are sampled every 0.4 s to form one frame of data. During model training, 8 frames (3.2 s) serve as the historical pedestrian trajectory, and the pedestrian trajectory sequence of the following 12 frames (4.8 s) is predicted.
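The frame arithmetic and the split of one pedestrian's track into observation and prediction windows can be sketched as follows. The sliding-window `stride` of one frame and the function name `split_windows` are assumptions for illustration.

```python
FRAME_INTERVAL = 0.4  # seconds between frames
OBS_LEN, PRED_LEN = 8, 12  # 8 observed frames (3.2 s), 12 predicted frames (4.8 s)


def split_windows(track, obs_len=OBS_LEN, pred_len=PRED_LEN, stride=1):
    """Cut one pedestrian's frame sequence into (history, future) pairs of 8 + 12 frames."""
    total = obs_len + pred_len
    return [(track[s:s + obs_len], track[s + obs_len:s + total])
            for s in range(0, len(track) - total + 1, stride)]
```

A pedestrian tracked for 25 frames (10 s), for example, yields 6 training windows with a stride of one frame.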

The evaluation metrics are the average position deviation (ADE) and the final position deviation (FDE). ADE is the mean Euclidean distance between the predicted and actual trajectory points over the 12 future frames:

$$\mathrm{ADE} = \frac{1}{N T_p} \sum_{i=1}^{N} \sum_{t=1}^{T_p} \left\| (\hat{x}_i^t, \hat{y}_i^t) - (x_i^t, y_i^t) \right\|$$

where N is the number of predicted pedestrians, T_p = 12 is the maximum number of predicted frames, (x_i^t, y_i^t) is the actual position sequence, (\hat{x}_i^t, \hat{y}_i^t) is the predicted trajectory sequence, and ‖·‖ is the Euclidean distance between two points. FDE is the Euclidean distance between the predicted and actual positions at the last frame:

$$\mathrm{FDE} = \frac{1}{N} \sum_{i=1}^{N} \left\| (\hat{x}_i^{T_f}, \hat{y}_i^{T_f}) - (x_i^{T_f}, y_i^{T_f}) \right\|$$

where T_f is the final time point.
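The two metrics follow directly from the formulas above. In the minimal sketch below, `pred` and `truth` are assumed to be nested lists indexed by pedestrian and then by frame.

```python
import math


def ade_fde(pred, truth):
    """pred, truth: [N pedestrians][T_p frames] of (x, y). Returns (ADE, FDE)."""
    n = len(truth)
    t_p = len(truth[0])
    ade = sum(math.dist(p, g) for pp, gg in zip(pred, truth) for p, g in zip(pp, gg)) / (n * t_p)
    fde = sum(math.dist(pp[-1], gg[-1]) for pp, gg in zip(pred, truth)) / n
    return ade, fde


truth = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]]
pred = [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], truth[1]]
print(ade_fde(pred, truth))  # (0.5, 1.0)
```

Here the first pedestrian's errors are 0, 1 and 2 and the second's are all 0. ADE is therefore 3/6 = 0.5, and FDE is (2 + 0)/2 = 1.0.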

(4) In the pedestrian trajectory prediction stage, the pedestrian motion trajectory to be predicted is input into the trained deep network framework, which outputs the corresponding pedestrian motion trajectory for the next 12 frames. Dense scenes are complex, with frequent social interaction and many curved trajectories, which places high demands on the model's ability to extract spatial interaction features among pedestrians. The method can effectively distinguish complex global and local interactions in dense crowd scenes, including groups walking in parallel, avoiding potential collisions, walking alone and keeping a reasonable social distance from the crowd. It also approximates the inherent uncertainty and multi-modal characteristics of pedestrian motion trajectories as closely as possible.

The above description is only a preferred embodiment of the present invention and does not limit the invention in any form; any modification or equivalent variation made according to the technical essence of the present invention falls within the claimed scope of the invention.

Claims

1. A pedestrian trajectory prediction method based on a topological graph structure and a deep self-attention network, characterized by comprising the following steps:

concatenating the high-dimensional data with spatial interaction features obtained in step two with the original high-dimensional data, and sending the concatenated data into a temporal feature encoder to extract temporal features;

in the spatial interaction feature encoder, two self-attention mechanisms are adopted to extract global and local spatial interaction information between pedestrian trajectories, respectively, and to fully explore the interaction information and behavior patterns among pedestrians. The graph attention network adopts an attention mechanism whose correlation parameters are based on the relative distance between neighbors, wherein the embedding function is a fully-connected network, W is a network parameter matrix, and the attended quantity is the state value of pedestrian i at time point t. This attention mainly emphasizes the influence of the relative distances between local neighbors on the motion trajectory, while the graph-based deep self-attention network uses a self-attention mechanism to emphasize the influence of global relations on the motion trajectory;

2. The pedestrian trajectory prediction method based on the topological graph structure and the deep self-attention network according to claim 1, characterized in that: in step two, the spatial interaction feature encoder that encodes the high-dimensional data comprises a graph attention network and a graph-based deep self-attention network.

3. The pedestrian trajectory prediction method based on the topological graph structure and the deep self-attention network according to claim 1, characterized in that: the temporal feature encoder comprises a deep self-attention network.

4. The pedestrian trajectory prediction method based on the topological graph structure and the deep self-attention network according to claim 1, characterized in that: in the fourth step, inherent uncertainty and multi-modal characteristics in the pedestrian motion trajectory are simulated by introducing Gaussian sampling noise into data, and the exploration space of the pedestrian trajectory is expanded.

5. The pedestrian trajectory prediction method based on the topological graph structure and the deep self-attention network according to claim 1, characterized in that: in step five, curve fitting and a binary classification network are used to further expand the network exploration space and enhance curve continuity, wherein the curvature S is calculated from the final point coordinate (x_1, y_1), the candidate point coordinate (x_2, y_2) and n, the number of neighbors in the topological graph.