CN113888638A

CN113888638A - Pedestrian trajectory prediction method using an attention-based graph neural network

Info

Publication number: CN113888638A
Application number: CN202111171633.5A
Authority: CN
Inventors: 曹云依; 杨欣; 陈思哲; 朱义天; 李恒瑞; 周大可
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2021-10-08
Filing date: 2021-10-08
Publication date: 2022-01-04

Abstract

The invention discloses a pedestrian trajectory prediction method based on an attention-mechanism graph neural network, which mainly comprises the following steps: acquiring pedestrian trajectory information, extracting trajectory motion features, and constructing a pedestrian trajectory original node graph; fusing, discarding, and amplifying information in the original node graph to generate a pedestrian trajectory final node graph; extracting the spatio-temporal features of the final node graph with a spatio-temporal graph convolutional neural network, constructing an original spatio-temporal feature map from those features, and screening the important spatio-temporal features with a graph channel attention mechanism to form a new spatio-temporal feature map; inputting the new spatio-temporal feature map into a predictor, which outputs predicted pedestrian trajectories within a preset time; and assigning a weight to each predicted trajectory and taking the trajectory with the maximum weight as the final prediction result. By paying more attention to the features that influence the final result, the method maximizes the effective information and improves pedestrian trajectory prediction accuracy.

Description

Pedestrian trajectory prediction method using an attention-based graph neural network

Technical Field

The invention relates to the technical field of trajectory planning, and in particular to a pedestrian trajectory prediction method based on an attention-mechanism graph neural network.

Background

Incorrect driving behavior not only causes economic losses but also seriously endangers the lives of the driver and others. Autonomous driving technology therefore holds considerable promise because of its safety and intelligence. Pedestrian trajectory prediction is a key problem in autonomous driving: accurate prediction of pedestrian trajectories lets the decision-making mechanism plan paths in advance and reduces the likelihood of traffic congestion and accidents. However, sufficient accuracy is difficult to achieve because of the uncertainty of pedestrians' own behavior and their complex interaction with the environment.

In recent years, many methods have been proposed for trajectory prediction. Early work usually used a constant-velocity linear predictor, which can only handle scenes of uniform straight-line motion and cannot be applied when the pedestrian trajectory to be predicted is complex. Later conventional methods often used social force models, which represent the attractive and repulsive forces acting on a pedestrian, to predict the interaction between pedestrians and the environment. Such models rely on manually designed features, however, and in complex environments they struggle to express implicit interactive behavior. Other prediction methods, such as Markov decision processes and Gaussian mixture models, are cumbersome and offer dated performance.

With the development of neural networks in recent years, trajectory prediction is now often performed by training a neural network. The RNN and its variants, LSTM and GRU, have gradually been applied to trajectory prediction because of their ability to extract temporal features, and they achieve better results. Although recurrent neural networks perform very well at extracting features from Euclidean data, most of the data in trajectory prediction tasks is non-Euclidean, and recurrent neural networks handle such data poorly because they lack a high-level spatio-temporal view. A graph structure can express the interaction between pedestrians more intuitively and effectively. A graph is an irregular structure that contains a variable number of nodes, and interaction between any two nodes may or may not occur. Compared with a conventional neural network, a graph neural network can therefore capture the interrelationships between instances. Existing interaction techniques usually focus on spatial interaction and ignore temporal correlation; a spatio-temporal graph neural network can attend to both and complete the trajectory prediction task more efficiently.

Gupta et al. proposed the Social GAN model, which defines a spatial pooling mechanism for motion prediction. With this novel pooling mechanism, the network can generate socially acceptable trajectories by learning encodings of past trajectories, but neural networks based on the GAN model are prone to converging too slowly.

In addition, the Social-BiGAT model introduces a graph attention network to improve the modeling of social interaction between pedestrians in a scene and allows all pedestrians in the scene to interact. It encourages generalization to multimodal distributions by constructing a reversible mapping between the output trajectories and representations of pedestrian behavior in the scene, so that a broader multimodal trajectory distribution can be learned. However, the model's feature extraction is insufficient, so its pedestrian trajectory prediction falls short of expectations.

Disclosure of Invention

To solve the problem of low pedestrian trajectory prediction accuracy in autonomous driving, the invention provides a pedestrian trajectory prediction method based on an attention-mechanism graph neural network.

The technical scheme provided by the invention is as follows:

S1: collecting pedestrian trajectory information, extracting trajectory motion features, and constructing a pedestrian trajectory original node graph, wherein the original node graph contains the spatial information and the temporal information of the pedestrian trajectories;

S2: fusing, discarding, and amplifying information in the pedestrian trajectory original node graph, selecting the information that has a great influence on the pedestrian trajectories, and generating a pedestrian trajectory final node graph, wherein the fusing, discarding, and amplifying are performed by a graph channel attention mechanism;

S3: extracting the spatio-temporal features of the pedestrian trajectory final node graph with a spatio-temporal graph convolutional neural network, constructing an original spatio-temporal feature map of the pedestrian trajectories from those features, and screening the important spatio-temporal features with a graph channel attention mechanism to form a new spatio-temporal feature map;

S4: inputting the new spatio-temporal feature map into a predictor, which outputs predicted pedestrian trajectories within a preset time, wherein the predictor is a time extrapolation neural network and the predicted pedestrian trajectories comprise a plurality of different results;

S5: assigning a weight to each predicted pedestrian trajectory and taking the trajectory with the maximum weight as the final prediction result, wherein the weights are assigned by a time channel attention mechanism.

Further, the representation of the pedestrian track original node map is as follows:

$G_t = (V_t, E_t)$, where $V_t$ denotes the node information of the pedestrian trajectory graph at frame t and $E_t$ denotes the edge information of the pedestrian trajectory graph at frame t; the edges represent the interactions between nodes.

Further, when an edge exists between two nodes, a weight value is assigned to that edge. The weight value represents the strength of the interaction between the nodes and is generated by a kernel function based on the Euclidean distance between node i and node j in the pedestrian trajectory graph of frame t.
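
As a minimal sketch of how such edge weights could be computed, the following Python assumes an inverse-distance kernel: the weight is 1/d for two nodes at Euclidean distance d > 0 and 0 otherwise (including a node paired with itself). The kernel form, the function name `edge_weights`, and the sample coordinates are illustrative assumptions; the text only states that the kernel is built on the Euclidean distance between node i and node j at frame t.

```python
import math

def edge_weights(positions):
    """Build a weighted adjacency matrix for one frame.

    positions: list of (x, y) pedestrian coordinates at frame t.
    Assumption: inverse-distance kernel, a[i][j] = 1 / ||v_i - v_j||_2,
    and 0 when the distance is zero (which covers i == j).
    """
    n = len(positions)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(positions[i], positions[j])
            if d > 0.0:
                a[i][j] = 1.0 / d
    return a

# Three pedestrians in one frame (hypothetical coordinates, in metres).
frame = [(0.0, 0.0), (3.0, 4.0), (0.0, 2.0)]
A_t = edge_weights(frame)
```

With this choice, nearby pedestrians receive large weights and distant pedestrians small ones, which matches the stated role of the weight as a measure of interaction strength.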

Further, the graph channel attention mechanism in S2 uses a graph convolution channel attention module to screen the structural information of the pedestrian trajectory original node graph; the module focuses on the relationships between channels, learns to identify and retain important channel features, and suppresses unimportant ones.

Further, before the spatio-temporal graph convolutional neural network in S3 extracts the spatio-temporal features of the pedestrian trajectory final node graph, the adjacency matrix $A_t$ must be normalized, where $A_t$ is the set of edge weights and $\Lambda_t$ is a diagonal matrix.
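
One common way to normalize an adjacency matrix with a diagonal matrix is the symmetric form used in graph convolutional networks. The sketch below assumes that form: self-loops are added, the diagonal matrix holds the row sums of the result, and each entry is divided by the square root of the product of the two corresponding row sums. The exact normalization used by the method is not recoverable from the text, so the formula and the name `normalize_adjacency` are assumptions.

```python
import math

def normalize_adjacency(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}.

    Assumptions: self-loops are added before normalizing, and the
    diagonal matrix D holds the row sums of (A + I).
    """
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # diagonal entries of D
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

# Two pedestrians connected by an edge of weight 1 (hypothetical values).
A_t = [[0.0, 1.0], [1.0, 0.0]]
A_norm = normalize_adjacency(A_t)
```

Normalizing in this way keeps the scale of aggregated features roughly independent of how many neighbors a node has.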

Further, the spatio-temporal graph convolutional neural network in S3 defines a convolution operation on a planar graph or feature map, in which σ denotes the activation function, k the kernel size, p the sampling function, and l the layer number. Based on that expression, the convolution operation on the pedestrian trajectory final node graph includes a normalization term and is taken over the neighbor set of each node, $B(v^i) = \{ v^j \mid d(v^i, v^j) \le D \}$.
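
To illustrate how a neighbor set of the form $B(v^i) = \{ v^j \mid d(v^i, v^j) \le D \}$ drives a graph convolution, here is a small sketch for scalar node features. It rests on several assumptions that the text does not fix: d is taken as the Euclidean distance, the normalization term is the size of the neighbor set, a single shared scalar weight stands in for the learned weight function, and ReLU stands in for the activation σ.

```python
import math

def neighbor_set(positions, i, max_dist):
    """Indices j with d(v_i, v_j) <= D; node i is included because d = 0."""
    return [j for j, p in enumerate(positions)
            if math.dist(positions[i], p) <= max_dist]

def graph_conv_node(features, positions, i, weight, max_dist):
    """Scalar graph convolution at node i.

    Assumptions: normalization by |B(v_i)|, one shared scalar weight,
    and ReLU as the activation function.
    """
    nbrs = neighbor_set(positions, i, max_dist)
    total = sum(weight * features[j] for j in nbrs)
    return max(0.0, total / len(nbrs))

positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]  # hypothetical coordinates
features = [2.0, 4.0, 100.0]
out0 = graph_conv_node(features, positions, 0, weight=0.5, max_dist=2.0)
```

The third pedestrian lies outside the distance threshold, so its large feature value does not affect the output for node 0.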

Further, the predictor in S4 performs a convolution operation along the time dimension of the new spatio-temporal feature map.

Further, the time channel attention mechanism in S5 considers not only each channel itself but also its adjacent channels, and computes the channel weights with a one-dimensional convolution. The number of adjacent channels is related to the channel size, and an exponential-family function maps between the two, namely $C = \phi(k) \approx 2^{\gamma k - b}$, where k denotes the number of adjacent channels and C denotes the channel size; k is then determined adaptively from C by inverting this mapping.
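
Inverting the mapping between C and k gives k = (log2(C) + b) / γ. Because a convolution kernel size must be a whole number and is usually odd, one plausible way to apply the inverse is to truncate and then move to the next odd integer, as sketched below. The rounding rule and the default values γ = 2 and b = 1 are assumptions borrowed from common channel-attention practice, not values stated in the text.

```python
import math

def adaptive_kernel_size(channels, gamma=2.0, b=1.0):
    """Choose the number of adjacent channels k from the channel size C.

    Inverts C = 2^(gamma * k - b), truncates to an integer, and bumps
    even results up to the next odd number (assumed rounding rule).
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

k_small = adaptive_kernel_size(64)
k_large = adaptive_kernel_size(256)
```

Under these assumed defaults, larger channel sizes yield wider neighborhoods, so the attention span grows slowly (logarithmically) with C.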

Further, assuming that in the plane the random variable describing the position at which a pedestrian appears at a given time follows a bivariate normal distribution, the loss function of the attention-mechanism graph neural network is defined over that distribution, where V contains all trainable parameters of the graph neural network and the distribution is described by its mean (the position at which the pedestrian is likely to appear at that time), its variance, and its correlation.
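
A loss commonly paired with a bivariate normal output is the negative log-likelihood of the observed positions under the predicted distribution. The sketch below assumes that form and sums it over time steps; the parameter names are hypothetical, and `sigma_x` and `sigma_y` are standard deviations (square roots of the variances). Whether the method uses exactly this loss cannot be confirmed from the text.

```python
import math

def bivariate_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-density of (x, y) under a bivariate normal.

    sigma_x, sigma_y are standard deviations (> 0); rho is the
    correlation coefficient with -1 < rho < 1.
    """
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    log_norm = math.log(2.0 * math.pi * sigma_x * sigma_y
                        * math.sqrt(one_minus_rho2))
    return log_norm + z / (2.0 * one_minus_rho2)

def trajectory_loss(targets, params):
    """Sum the per-step negative log-likelihood over a trajectory.

    targets: list of observed (x, y) positions.
    params: list of (mu_x, mu_y, sigma_x, sigma_y, rho), one per step.
    """
    return sum(bivariate_nll(x, y, *p) for (x, y), p in zip(targets, params))

# Two time steps with unit variance and no correlation (hypothetical values).
loss = trajectory_loss([(0.0, 0.0), (1.0, 0.0)],
                       [(0.0, 0.0, 1.0, 1.0, 0.0)] * 2)
```

Minimizing this quantity pushes the predicted mean toward the observed position while penalizing overconfident (too small) variances.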

Advantageous effects

The pedestrian trajectory prediction method based on the attention-mechanism graph neural network attends to spatial correlation and temporal correlation simultaneously. On the one hand, important spatio-temporal features are screened out by the graph channel attention mechanism; on the other hand, weights are assigned to all candidate trajectories by the time channel attention mechanism. By focusing more on the features that influence the final result, the method maximizes the effective information and thereby improves pedestrian trajectory prediction accuracy.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart illustrating steps of a pedestrian trajectory prediction method of a neural network based on an attention mechanism according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a neural network based on an attention mechanism in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the operating principle of the graph channel convolution attention mechanism (GCA) in an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating the operation of the time-channel convolution attention mechanism (TCA) in an embodiment of the present invention;

FIG. 5 is a schematic diagram of a pedestrian trajectory prediction result of the neural network based on attention mechanism in the embodiment of the invention when pedestrians are in a state of walking in opposite directions;

FIG. 6 is a schematic diagram of a pedestrian trajectory prediction result of the neural network based on attention mechanism in a state where pedestrians are traveling in the same direction in the embodiment of the present invention.

Detailed Description

The technical scheme of the invention will be explained in detail below by combining with the attached drawings in the embodiment of the invention, and the scheme of the invention is better described through the embodiment.

The embodiment provides a pedestrian trajectory prediction method based on a graph neural network of an attention mechanism, which comprises the following steps of:

The pedestrian trajectory information used in this embodiment comes from the public human trajectory datasets UCY and ETH. UCY contains three scenes, named UNIV, ZARA1, and ZARA2; ETH contains two scenes, named ETH and HOTEL.

The Average Displacement Error (ADE) and the Final Displacement Error (FDE) are used to evaluate network performance.
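
The two metrics can be sketched as follows for a single pedestrian: the average displacement error is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and the final displacement error is that distance at the last time step. The function names and sample coordinates are illustrative assumptions.

```python
import math

def average_displacement_error(pred, truth):
    """Mean Euclidean distance over all predicted time steps."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(pred)

def final_displacement_error(pred, truth):
    """Euclidean distance at the final predicted time step."""
    return math.dist(pred[-1], truth[-1])

# Hypothetical 3-step prediction and ground truth for one pedestrian.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(pred, truth)
fde = final_displacement_error(pred, truth)
```

In this example the prediction drifts steadily from the ground truth, so the final error is larger than the average error.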

Referring to fig. 1, the pedestrian trajectory prediction method of the neural network based on attention mechanism in this embodiment includes the following specific steps:

S1: extracting motion features from the pedestrian trajectory sequences with a long short-term memory (LSTM) neural network to generate the corresponding pedestrian trajectory original node graph;

Specifically, the training set of the human trajectory dataset is input into the LSTM neural network, and the vector output by the network is used as the motion feature. At frame t, the graph embedding the spatial and temporal information of the pedestrian trajectories can be represented as $G_t = (V_t, E_t)$, where $V_t$ denotes the node information of the pedestrian trajectory graph at frame t and $E_t$ denotes the edge information of the pedestrian trajectory graph at frame t; the edges represent the interactions between nodes. When an edge exists between two nodes, a weight value is assigned to that edge, based on the Euclidean distance between node i and node j in the pedestrian trajectory graph of frame t.

S2: fusing, discarding, and amplifying information in the pedestrian trajectory original node graph with a graph channel convolution attention mechanism (GCA), selecting the information that has a great influence on the pedestrian trajectories, and generating a pedestrian trajectory final node graph;

Specifically, the graph channel attention mechanism uses a graph convolution channel attention module to screen the structural information of the pedestrian trajectory original node graph; the module focuses on the relationships between channels, learns to identify and retain important channel features, and suppresses unimportant ones.

Referring to fig. 3, the pedestrian trajectory original node graph first undergoes global mean pooling, then passes through two fully connected layers, and finally passes through an activation layer whose Sigmoid activation function applies a nonlinear activation, yielding the pedestrian trajectory final node graph.
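
The pooling, fully connected, and Sigmoid stages can be sketched in plain Python as below. Two details are assumptions rather than statements from the text: a ReLU is placed between the two fully connected layers, and the resulting per-channel gates are used to rescale the input channels. The weights `w1` and `w2` are hypothetical stand-ins for learned parameters.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def graph_channel_attention(x, w1, w2):
    """x: list of C channels, each a flat list of node features."""
    squeezed = [sum(ch) / len(ch) for ch in x]            # global mean pooling
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]  # FC 1 + ReLU (assumed)
    gates = [sigmoid(g) for g in matvec(w2, hidden)]      # FC 2 + Sigmoid
    scaled = [[g * v for v in ch] for g, ch in zip(gates, x)]  # rescale (assumed)
    return scaled, gates

x = [[1.0, 3.0], [0.0, 2.0]]   # 2 channels, 2 nodes each
w1 = [[1.0, -1.0]]             # 2 channels -> 1 hidden unit
w2 = [[2.0], [-2.0]]           # 1 hidden unit -> 2 channel gates
out, gates = graph_channel_attention(x, w1, w2)
```

Here the first channel receives a gate above 0.5 and is kept largely intact, while the second channel receives a gate below 0.5 and is attenuated, which is the amplify-or-suppress behavior the step describes.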

S3: extracting the spatio-temporal features of the pedestrian trajectory final node graph with a spatio-temporal graph convolutional neural network (STGCN), constructing an original spatio-temporal feature map of the pedestrian trajectories from those features, and screening the important spatio-temporal features with a graph channel attention mechanism to form a new spatio-temporal feature map;

Specifically, before the STGCN extracts the spatio-temporal features of the pedestrian trajectory final node graph, the adjacency matrix $A_t$ must be normalized, where $A_t$ is the set of edge weights and $\Lambda_t$ is a diagonal matrix.

The STGCN also defines a convolution operation on a planar graph or feature map, which is applied to the pedestrian trajectory final node graph.

S4: inputting the new spatio-temporal feature map into a predictor, which outputs predicted pedestrian trajectories within a preset time, wherein the predictor is a time extrapolation neural network (TECNN) and the predicted pedestrian trajectories comprise a plurality of possible results;

Specifically, the predictor performs a convolution operation along the time dimension of the new spatio-temporal feature map.
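
As a rough illustration of convolving along the time dimension, the sketch below applies a zero-padded one-dimensional convolution to a single feature tracked over four time steps. It only shows the basic operation; the actual predictor's layer count, kernel sizes, and mapping from observed to future time steps are not specified in the text, and the kernel values here are hypothetical.

```python
def temporal_conv(sequence, kernel):
    """Zero-padded ('same') 1D cross-correlation along the time axis.

    Assumes an odd kernel length so the output has the input's length.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(sequence) + [0.0] * pad
    return [sum(kernel[m] * padded[t + m] for m in range(k))
            for t in range(len(sequence))]

feature_over_time = [1.0, 2.0, 3.0, 4.0]
mixed = temporal_conv(feature_over_time, [0.25, 0.5, 0.25])
```

Each output step mixes a time step with its immediate past and future, which is how a temporal convolution lets information flow across frames.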

S5: a time channel attention mechanism assigns a weight to each predicted pedestrian trajectory, and the trajectory with the maximum weight is taken as the final prediction result;

Specifically, the channel attention mechanism considers not only each channel itself but also its adjacent channels, and computes the channel weights with a one-dimensional convolution. The number of adjacent channels is related to the channel size, and an exponential-family function maps between the two, namely $C = \phi(k) \approx 2^{\gamma k - b}$, where k denotes the number of adjacent channels and C denotes the channel size; k is then determined adaptively from C by inverting this mapping.

Referring to fig. 4, in this embodiment 5 adjacent channels around each channel are selected, a one-dimensional (1D) convolution replaces the fully connected layer, and a final activation layer applies a Sigmoid function for nonlinear activation.
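
The weighting and selection in S5 can be sketched as a one-dimensional convolution over per-channel descriptors with a window of 5 neighbors, followed by a Sigmoid and a choice of the highest-weighted candidate. The descriptor values, the kernel values, the zero padding at the edges, and the one-to-one pairing of channels with candidate trajectories are all illustrative assumptions.

```python
import math

def channel_attention_1d(descriptors, kernel):
    """Weight each channel from itself and its neighbors via a 1D convolution."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(descriptors) + [0.0] * pad
    conv = [sum(kernel[m] * padded[c + m] for m in range(k))
            for c in range(len(descriptors))]
    return [1.0 / (1.0 + math.exp(-v)) for v in conv]  # Sigmoid activation

def pick_best(candidates, weights):
    """Return the candidate with the maximum weight and its index."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return candidates[best], best

descriptors = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0]   # hypothetical pooled values
kernel = [0.1, 0.2, 0.4, 0.2, 0.1]             # window of 5 adjacent channels
weights = channel_attention_1d(descriptors, kernel)
candidates = ["traj_0", "traj_1", "traj_2", "traj_3", "traj_4", "traj_5"]
chosen, index = pick_best(candidates, weights)
```

Replacing the fully connected layer with a small shared kernel keeps the number of parameters equal to the window size, regardless of how many channels there are.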

In the plane, let the random variable describing the position at which a pedestrian appears at time t be $(x_t, y_t)$ and assume that it follows a bivariate normal distribution; the loss function of the attention-mechanism graph neural network is defined over that distribution, where V contains all trainable parameters of the graph neural network and the distribution is characterized in part by its correlation.

S6: the batch size of pedestrian trajectory sequences is set to 256, the initial learning rate (lr) to 0.01, and the number of epochs to 500; the learning rate changes to 0.002 after 300 epochs.
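
These training settings amount to a single-step learning rate schedule, which can be written as below. The function and constant names are hypothetical, and treating epochs as zero-indexed (so the drop takes effect at epoch index 300) is an assumption.

```python
BATCH_SIZE = 256
NUM_EPOCHS = 500

def learning_rate(epoch, base_lr=0.01, decayed_lr=0.002, decay_epoch=300):
    """Return the learning rate for a zero-indexed epoch."""
    return base_lr if epoch < decay_epoch else decayed_lr

# Learning rate used in each of the 500 epochs.
schedule = [learning_rate(e) for e in range(NUM_EPOCHS)]
```

The first 300 epochs run at 0.01 and the remaining 200 at 0.002.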

S7: trajectories in the human trajectory dataset are sampled every 0.4 s, and the results are compared with those of other models.

To display the prediction effect visually, the real trajectories, observed trajectories, and predicted trajectories are marked in the figures, taking the prediction results on the UCY-Zara02 dataset as an example. In this embodiment, two pedestrian walking states in the dataset are selected for trajectory prediction. Referring to fig. 5, each line indicates the trajectory of a pedestrian and its arrow indicates the walking direction; the pedestrians in the figure are walking toward each other. Trajectory 11 (black solid line) is the historical trajectory of the first pedestrian, trajectory 12 (black dotted line) is the real trajectory of the first pedestrian, and trajectory 13 (white dotted line) is the predicted trajectory of the first pedestrian output by the attention-mechanism graph neural network of the invention. Likewise, trajectory 21 (black solid line) is the historical trajectory of the second pedestrian, trajectory 22 (black dotted line) is its real trajectory, and trajectory 23 (white dotted line) is its predicted trajectory, and so on. Referring to fig. 6, the pedestrians in the figure are walking in the same direction. Analysis of the prediction results for pedestrians in these different walking states shows that the method provided by the invention can accurately predict the future trajectory of a pedestrian.

In addition, an embodiment of the present invention further provides a computer-readable storage medium that may store a program which, when executed, performs some or all of the steps of any attention-mechanism graph neural network pedestrian trajectory prediction method described in the above method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned memory includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, Read-only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

The above embodiments describe the objects, technical solutions, and advantages of the present application in further detail. It should be understood that they are only examples of the present application and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present application shall fall within the scope of the present application.

Claims

1. A pedestrian trajectory prediction method based on a graph neural network of an attention mechanism is characterized by comprising the following steps of:

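2. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 1, wherein the pedestrian trajectory original node graph is represented as $G_t = (V_t, E_t)$, where $V_t$ denotes the node information of the pedestrian trajectory graph at frame t and $E_t$ denotes the edge information of the pedestrian trajectory graph at frame t, the edges representing the interactions between nodes.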

3. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 2, wherein, when an edge exists between two nodes, a weight value is assigned to that edge.

4. The pedestrian trajectory prediction method of the graph neural network based on the attention mechanism as claimed in claim 1, wherein the graph channel attention mechanism of S2 screens the structure information of the pedestrian trajectory original node graph by using a graph convolution channel attention module, wherein the graph convolution channel attention module focuses more on the relationship between channels, identifies and screens out important channel features through a learning mechanism, and suppresses unimportant channel features.

5. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 1, wherein before the spatio-temporal graph convolutional neural network in S3 extracts the spatio-temporal features of the pedestrian trajectory final node graph, the adjacency matrix $A_t$ must be normalized, where $A_t$ is the set of edge weights and $\Lambda_t$ is a diagonal matrix.

6. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 1, wherein the spatio-temporal graph convolutional neural network in S3 defines a convolution operation on a planar graph or feature map.

7. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 1, wherein the predictor of S4 convolves the time dimension of the new spatiotemporal feature map.

8. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 1, wherein the time channel attention mechanism in S5 considers not only each channel itself but also its adjacent channels, and computes the channel weights with a one-dimensional convolution, wherein the number of adjacent channels is related to the channel size and an exponential-family function maps between the two, namely $C = \phi(k) \approx 2^{\gamma k - b}$, where k denotes the number of adjacent channels and C denotes the channel size, k being determined adaptively from C by inverting this mapping.

9. The attention mechanism-based graph neural network pedestrian trajectory prediction method of claim 1, wherein, assuming that in the plane the random variable describing the position at which a pedestrian appears at a given time follows a bivariate normal distribution, the loss function of the attention-mechanism graph neural network is defined over that distribution, where V contains all trainable parameters of the graph neural network and the distribution is characterized in part by its correlation.

10. A computer-readable storage medium storing a program that, when executed, performs the attention mechanism-based graph neural network pedestrian trajectory prediction method according to any one of claims 1 to 9.