CN115527272A - Construction method of pedestrian trajectory prediction model - Google Patents

Construction method of pedestrian trajectory prediction model

Info

Publication number
CN115527272A
CN115527272A (application CN202211253854.1A)
Authority
CN
China
Prior art keywords
pedestrian
spatial
time
matrix
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211253854.1A
Other languages
Chinese (zh)
Inventor
王斌 (Wang Bin)
段安盛 (Duan Ansheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
Original Assignee
Shanghai Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Normal University filed Critical Shanghai Normal University
Priority to CN202211253854.1A priority Critical patent/CN115527272A/en
Publication of CN115527272A publication Critical patent/CN115527272A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a construction method of a pedestrian trajectory prediction model and belongs to the field of pedestrian trajectory prediction. The construction method of the pedestrian trajectory prediction model comprises the following steps: constructing a spatio-temporal attention graph convolutional network model; inputting the temporal embedding and time-stamping it by adding a temporal position encoding vector; inputting the spatial embedding and marking the position of each pedestrian with a spatial position encoding vector; computing attention matrices using an attention mechanism; obtaining a temporal interaction graph representing temporal interactions from the temporal graph input, and a spatial interaction graph representing spatial interactions from the spatial graph input; aggregating the final temporal interaction matrix and the final spatial interaction matrix through a graph convolution network and learning the trajectory representation; and training the model on a dataset to obtain the final pedestrian trajectory prediction model.

Description

Construction method of pedestrian trajectory prediction model
Technical Field
The invention relates to the field of pedestrian trajectory prediction, in particular to a construction method of a pedestrian trajectory prediction model.
Background
Predicting pedestrian trajectories requires modeling the two key dimensions shown in fig. 1: (1) the temporal dimension, in which valid information such as position and speed in a pedestrian's past trajectory is modeled to capture temporal correlations and then predict the pedestrian's next position; (2) the spatial dimension, in which a spatial directed graph is constructed for pedestrians in the same scene at the same time to obtain the spatial interactions between them. Collisions can be avoided by predicting pedestrian trajectories through spatial interactions.
Pedestrian trajectory prediction is a key technology in autonomous driving, and it remains difficult due to the complex interactions between pedestrians and the uncertainty of each pedestrian's future actions. Past work has relied primarily on pedestrian positional relationships to model temporal dependencies or spatial interactions independently, which is not sufficient to represent real-world complexity. One of the main challenges of pedestrian trajectory prediction is the coupled modeling of temporal dependencies and spatial interactions, since the spatial and temporal dynamics of pedestrians depend closely on each other. Specifically, the prior art has either used a temporal model to summarize each pedestrian's time-varying features independently to predict the pedestrian's next position, or a spatial model to model each pedestrian's spatial interactions to predict the walking trajectory. These methods are suboptimal because the information about pedestrians in the temporal and spatial dimensions is not considered jointly.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a construction method of a pedestrian trajectory prediction model.
The purpose of the invention can be realized by the following technical scheme:
a construction method of a pedestrian trajectory prediction model comprises the following steps:
constructing a spatio-temporal attention graph convolutional network model;
inputting the temporal embedding, and time-stamping it by adding a temporal position encoding vector; inputting the spatial embedding, and marking the position of each pedestrian with a spatial position encoding vector;
computing attention matrices using an attention mechanism;
obtaining a temporal interaction graph representing temporal interactions from the temporal graph input, and a spatial interaction graph representing spatial interactions from the spatial graph input;
aggregating the final temporal interaction matrix and the final spatial interaction matrix through a graph convolution network, and learning the trajectory representation;
and training the model on the dataset to obtain the final pedestrian trajectory prediction model.
Optionally, the data set comprises an ETH data set and a UCY data set.
Optionally, the spatial graph represents the spatial interactions of all pedestrians in the scene, and the temporal graph represents the complete trajectory of each pedestrian.
Optionally, a position encoder layer gives the attention mechanism a notion of order in the sequential input/output data.
A pedestrian trajectory prediction model is constructed by the above construction method.
The invention has the beneficial effects that:
the invention relates to a pedestrian prediction model time graph embedding and space graph embedding combined with a position coding mechanism, which is constructed by the invention, so as to solve the problem that an attention module is insensitive to the position. The GCN is used to couple temporal and spatial features to prevent loss of critical spatio-temporal information.
2 the pedestrian prediction model constructed by the invention provides a new decoder, the number of layers of the decoder is less than that of TCNs convolution layers, and the defects that the traditional RNN is affected by gradient elimination and high calculation cost can be avoided.
3 embodiments of the invention the method of the invention was evaluated on a complete pedestrian data set ETH and UCY. On ETH/UCY, this example verifies that the method constructed by the present invention performed better than the most advanced pedestrian trajectory prediction method in the last 4 years, and achieved significant performance improvement (average displacement error of 10%, final displacement error of 21%). Extensive ablation studies were further conducted in the examples of the present invention to demonstrate the superiority of the STAGCN over various temporal and spatial model combinations.
Drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 illustrates two key dimensions of pedestrian trajectory prediction in the prior art;
FIG. 2 is the spatio-temporal attention graph convolutional network of the present application;
FIG. 3 is a visualization of trajectories in the present application for a pedestrian side-by-side walking, encountering and turning scenario;
FIG. 4 is a visualization of the trajectory in the pedestrian turning and stationary scenes of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In an embodiment of the invention, a spatio-temporal attention graph convolutional network (STAGCN) for pedestrian trajectory prediction is disclosed. As shown in fig. 2, an attention mechanism is used to obtain temporal dependencies and spatial interactions. Before applying the attention mechanism, position encodings are added to the temporal graph embedding and the spatial graph embedding to solve the problem that the attention mechanism is insensitive to the order of the input elements. A two-layer graph convolution network then aggregates the temporal dependencies and spatial interactions and learns the trajectory representation. The final trajectory representation is obtained by adding Gaussian noise to the trajectory representation. Given the final trajectory representation, a decoder predicts the parameters of a bivariate Gaussian distribution along the time dimension for future trajectory point prediction.
The embodiment of the invention discloses a construction method of a pedestrian trajectory prediction model, which comprises the following steps:
the formula of the model is preliminarily constructed, and N pedestrians T epsilon {1, \ 8230;, T ∈ in a scene in a period of time are assumed obs ,…,T pred }. At time step t, the position of pedestrian i is represented by a pair of two-dimensional Cartesian coordinates
Figure BDA0003888780560000041
The pedestrian position i =1,2, \ 8230;, N, at time step T =1, \ 8230;, T obs Interest in the problem of predicting future trajectories from time steps T = T obs +1tot=T pred
Given an input trajectory
Figure BDA0003888780560000042
And
Figure BDA0003888780560000043
where D represents the dimension of the 2D Cartesian coordinates, N represents all pedestrians at time step T, T obs Representing the first 8 time steps. A series of time charts are constructed to represent the complete trajectory of each pedestrian. From time step T =1 to T = T obs Each coordinate being connected to form a time graph G tem (V i ,U i ),
Figure BDA0003888780560000044
Represents G tem A node of, and
Figure BDA0003888780560000045
is the coordinate of the ith pedestrian at time step t
Figure BDA00038887805600000414
Figure BDA0003888780560000048
Represents G tem Wherein
Figure BDA0003888780560000049
Representing nodes
Figure BDA00038887805600000410
If connected, they are represented as 1, otherwise they are represented as 0. Due to time dependence, U i The element in (1) is initialized to 1 (Shi et al, 2021). For attention-driven processing of inputs, they are embedded in a higher D-dimensional space through a simple fully connected layer:
e tem =φ(G tem ,W e tem ) (1)
Where phi (.) represents a linear transformation,
Figure BDA00038887805600000411
is the embedding of the time map(s),
Figure BDA00038887805600000412
is the time map embedding weight.
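As an illustration of the temporal graph construction and equation (1), the following numpy sketch builds the all-ones edge set U_i and applies the fully connected embedding. The sizes (T_obs = 8, N = 3, embedding width 64) and the random weight matrix are assumptions for illustration, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
T_obs, N, D_in, D_emb = 8, 3, 2, 64   # assumed sizes: 8 observed steps, 3 pedestrians

# Node features of the temporal graph G_tem: the coordinates of each
# pedestrian over the observed time steps, shaped (T_obs, N, D_in).
coords = rng.normal(size=(T_obs, N, D_in))

# Edge set U_i: every element initialized to 1 (full temporal connectivity).
U = np.ones((N, T_obs, T_obs))

# Equation (1): e_tem = phi(G_tem, W_e_tem) via a simple fully connected layer.
W_e_tem = rng.normal(scale=0.1, size=(D_in, D_emb))   # random stand-in weight
e_tem = coords @ W_e_tem                              # (T_obs, N, D_emb)
```

In the patent's model this linear layer is a learned parameter; here it only demonstrates the shape of the embedding.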
The spatial graph represents the spatial interactions of all pedestrians in the scene. At time step t, the coordinates of all pedestrians are connected to form a spatial graph G_spa(V_t, U_t). V_t = {v_t^i | i = 1, …, N} is the node set of G_spa and represents all pedestrians at time step t, where v_t^i is the observed coordinate position (x_t^i, y_t^i). U_t = {u_t^(i,j) | i, j = 1, …, N} is the edge set of G_spa, where u_t^(i,j) denotes whether nodes v_t^i and v_t^j are connected (denoted 1) or disconnected (denoted 0). U_t is initialized as an upper triangular matrix filled with 1s, i.e. the current state is independent of future states (Shi et al., 2021). The spatial graph is likewise embedded into a higher D-dimensional space through a simple fully connected layer:
e_spa = φ(G_spa, W_e^spa)    (2)
where φ(·) denotes a linear transformation, e_spa is the spatial graph embedding, and W_e^spa is the spatial graph embedding weight.
Likewise, the output of the model of the invention for pedestrian i at time t is a D-dimensional final trajectory representation, which is projected back to Cartesian coordinates (x̂_t^i, ŷ_t^i).
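A minimal sketch of the stated edge-set initialization, assuming N = 4 pedestrians; the upper triangular all-ones matrix is the initialization of U_t described above:

```python
import numpy as np

N = 4  # assumed number of pedestrians in the scene
# U_t is initialized as an upper triangular matrix filled with 1s, so entries
# below the diagonal ("future" states in the ordering) start disconnected.
U_t = np.triu(np.ones((N, N)))
```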
The position encoder layer gives the attention mechanism a notion of order in the sequential input/output data. In the model of the invention, the input embedding e_tem is time-stamped by adding a temporal position encoding vector p_tem of the same dimension D at time t, and the embedding e_spa is position-marked for pedestrian i by adding a spatial position encoding vector p_spa of the same dimension D:
ê_tem = e_tem + p_tem    (3)
ê_spa = e_spa + p_spa    (4)
The position encoding vector p is defined using a broad spectrum of sine/cosine functions as follows:
p(k, 2d) = sin(k / 1000^(2d/D))    (5)
p(k, 2d+1) = cos(k / 1000^(2d/D))    (6)
where k represents the position and d the dimension. For each position k of the p-vector there is thus a corresponding sinusoid, with wavelengths spanning a range from 2π to 1000 · 2π. In other words, this allows the model to attend to the order of the sequential data using unique relative positions. In the temporal position encoding vector p_tem, k denotes the position of the pedestrian at time step k in the complete observed trajectory, and the position code encodes the order of the sequential position information using unique relative positions. In the spatial position encoding vector p_spa, k represents the position information of pedestrian k in the spatial graph, and the position code preserves the identity of each pedestrian in the scene using a unique relative position.
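Equations (5)-(6) can be sketched directly in numpy. The base 1000 follows the formulas above (the classic Transformer encoding uses 10000; the value here is taken from the text), and an even dimension D is assumed:

```python
import numpy as np

def position_encoding(num_positions: int, dim: int, base: float = 1000.0) -> np.ndarray:
    """Sinusoidal position encoding: p(k, 2d) = sin(k / base^(2d/D)) and
    p(k, 2d+1) = cos(k / base^(2d/D)) for positions k = 0..num_positions-1."""
    pe = np.zeros((num_positions, dim))
    k = np.arange(num_positions)[:, None]     # positions, shape (K, 1)
    d = np.arange(0, dim, 2)[None, :]         # even dimension indices 2d
    angle = k / np.power(base, d / dim)       # k / base^(2d / D)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe
```

The same function can produce p_tem (k indexing time steps) and p_spa (k indexing pedestrians), matching the two uses described above.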
To extract temporal dependencies and spatial interactions, an attention mechanism is first employed to compute a temporal attention matrix Att_tem and a spatial attention matrix Att_spa as follows:
Q_tem = φ(ê_tem, W_Q^tem)    (7)
K_tem = φ(ê_tem, W_K^tem)    (8)
Att_tem = softmax(Q_tem K_tem^T / √d_tem)    (9)
Q_spa = φ(ê_spa, W_Q^spa)    (10)
K_spa = φ(ê_spa, W_K^spa)    (11)
Att_spa = softmax(Q_spa K_spa^T / √d_spa)    (12)
where Q_tem, Q_spa and K_tem, K_spa are respectively the queries and keys of the self-attention mechanism, W_Q^tem, W_K^tem, W_Q^spa, W_K^spa are the weights of the linear transformations, and d_tem and d_spa are the dimensions of each query. The factors 1/√d_tem and 1/√d_spa implement a scaled dot product for numerical stability. GPU acceleration may be used to compute the temporal dependencies and spatial interactions of all pedestrians in parallel.
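A minimal numpy sketch of the scaled dot-product attention in equations (7)-(12); the weight matrices here are random placeholders standing in for the learned W_Q and W_K:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(embed: np.ndarray, W_q: np.ndarray, W_k: np.ndarray) -> np.ndarray:
    """Att = softmax(Q K^T / sqrt(d)), with Q = embed @ W_q and K = embed @ W_k."""
    Q, K = embed @ W_q, embed @ W_k
    d = Q.shape[-1]                           # dimension of each query
    return softmax(Q @ K.T / np.sqrt(d))
```

Applied to ê_tem this yields a T_obs × T_obs temporal attention matrix per pedestrian, and applied to ê_spa an N × N spatial attention matrix per time step.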
The temporal attention matrices Att_tem of all pedestrians are stacked into Â_tem, and the spatial attention matrices Att_spa from time step t = 1 to t = T_obs are stacked into Â_spa. Because the number of pedestrians varies between scenes, spatial interactions cannot be learned by convolution along the spatial channel. The stacked temporal attention matrix Â_tem is taken directly as the temporal interaction matrix A_tem, while the stacked spatial attention matrices are fused with a 1 × 1 convolution along the temporal channel to obtain the spatial interaction matrix A_spa.
From A_tem and A_spa, a temporal mask M_tem and a spatial mask M_spa are generated by element-wise thresholding with a hyperparameter α ∈ [0, 1]:
M_tem = 1(σ(A_tem) > α)    (13)
M_spa = 1(σ(A_spa) > α)    (14)
where 1(·) is the indicator function, which outputs 1 if the condition is satisfied and 0 otherwise, and σ is the Sigmoid activation function. An identity matrix D is added to the temporal mask M_tem and to the spatial mask M_spa, respectively, to ensure that the nodes are self-connected. The temporal interaction matrix A_tem is then multiplied element-wise with the self-connection-augmented temporal mask to obtain the final temporal interaction matrix F_tem, and the final spatial interaction matrix F_spa is obtained in the same way:
F_tem = (M_tem + D) ⊙ A_tem    (15)
F_spa = (M_spa + D) ⊙ A_spa    (16)
where ⊙ denotes element-wise multiplication. Thus, a temporal interaction graph G_tem(V_i, F_tem) representing temporal interactions is finally obtained from the temporal graph input, and a spatial interaction graph G_spa(V_t, F_spa) representing spatial interactions is obtained from the spatial graph input.
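Equations (13)-(16) reduce to a threshold, an identity matrix for self-connections, and an element-wise product. A minimal numpy sketch, assuming the interaction matrix is square in its last two dimensions:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def final_interaction(att: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Threshold the sigmoid-activated attention to get a binary mask,
    add the identity for self-connections, then gate element-wise:
    F = (M + D) * A with M = 1(sigma(A) > alpha)."""
    mask = (sigmoid(att) > alpha).astype(att.dtype)
    return (mask + np.eye(att.shape[-1])) * att
```

Entries whose sigmoid-activated attention falls below α are zeroed out, while self-connections (the diagonal) are always retained.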
The graph convolution network (GCN) takes as input a feature matrix representing the attributes of each node and effectively aggregates features in the neighborhood defined by the adjacency matrix. A static binary adjacency matrix is typically used for training a GCN. The entries in the adjacency matrix may, however, be continuous real-valued functions, allowing adaptive and dynamic aggregation of neighbor information. When a GCN is used to encode the states, the interactions between the nodes can easily be modulated by changing the adjacency matrix. The final temporal interaction matrix F_tem and the final spatial interaction matrix F_spa are aggregated by GCNs to learn the trajectory representation. Two GCNs are used: in one branch, F_tem is fed to the network before F_spa, and in the other branch they are fed in the reverse order. The first branch thus generates a temporal trajectory representation Traj_tem, while the other branch produces a spatial trajectory representation Traj_spa. The trajectory representation I is the sum of the final GCN outputs I_temporal and I_spatial:
I_temporal = δ(δ(F_tem × Traj_tem) × F_spa)    (17)
I_spatial = δ(δ(F_spa × Traj_spa) × F_tem)    (18)
I = I_temporal + I_spatial    (19)
where δ is the PReLU activation function.
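A shape-level sketch of equations (17)-(19) in numpy. The axis layout (trajectory embedding as (T, N, D), F_tem as (N, T, T), F_spa as (T, N, N)), the choice of contraction axes, and the use of a single shared embedding `traj` in place of Traj_tem and Traj_spa are assumptions made to make the two aggregation orders concrete; the patent does not fix the exact tensor layout:

```python
import numpy as np

def prelu(x: np.ndarray, a: float = 0.25) -> np.ndarray:
    """PReLU activation delta(.) with slope a on the negative side."""
    return np.where(x > 0, x, a * x)

def stagcn_aggregate(traj, F_tem, F_spa, a=0.25):
    """traj: (T, N, D) embedding; F_tem: (N, T, T) temporal interaction;
    F_spa: (T, N, N) spatial interaction. One branch aggregates over the time
    axis first, the other over the pedestrian axis first; their sum is I."""
    # Branch 1 (eq. 17): temporal aggregation, then spatial aggregation.
    h = prelu(np.einsum('ntu,und->tnd', F_tem, traj), a)
    I_temporal = prelu(np.einsum('tnm,tmd->tnd', F_spa, h), a)
    # Branch 2 (eq. 18): spatial aggregation, then temporal aggregation.
    g = prelu(np.einsum('tnm,tmd->tnd', F_spa, traj), a)
    I_spatial = prelu(np.einsum('ntu,und->tnd', F_tem, g), a)
    return I_temporal + I_spatial      # eq. (19)
```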
In addition, the real motion patterns of pedestrians in changing environments are learned from a large amount of real pedestrian trajectory data. Due to the uncertainty of pedestrian motion, the model of the invention is expected to be able to generate multiple reasonable and realistic trajectories. Yu et al. propose various losses to encourage the network to produce diverse future trajectories, and their method has proven effective (Yu et al., 2020). The multimodal nature of pedestrian motion is modeled following this approach (Mohamed et al., 2020). The input to the decoder is the final trajectory representation I_final, which consists of two parts: the trajectory representation I and added noise Z (as shown in fig. 2). The random Gaussian noise regularizes the model well and improves its robustness. The final trajectory representation is
I_final = Concat(I, Z)    (20)
where Concat(·) is a concatenation operation and Z is random Gaussian noise.
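Equation (20) is plain concatenation along the feature dimension. A sketch with assumed sizes (embedding width 64 and noise dimension 32, matching the configuration described later):

```python
import numpy as np

rng = np.random.default_rng(0)
T_obs, N, D = 8, 3, 64                       # assumed sizes
I = rng.normal(size=(T_obs, N, D))           # trajectory representation from the GCNs
Z = rng.normal(size=(T_obs, N, 32))          # random Gaussian noise, dimension 32
I_final = np.concatenate([I, Z], axis=-1)    # equation (20): Concat(I, Z)
```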
The future trajectory (x_t^i, y_t^i) is assumed to follow a bivariate Gaussian distribution N(μ_t^i, σ_t^i, ρ_t^i). Further, the predicted trajectory is defined as (x̂_t^i, ŷ_t^i), which follows the estimated bivariate distribution N(μ̂_t^i, σ̂_t^i, ρ̂_t^i). Given the final trajectory representation I_final, the decoder predicts the parameters of the bivariate Gaussian distribution along the time dimension. The decoder consists of a convolutional layer and a simple fully connected layer. The decoder is designed this way because it does not suffer from the gradient vanishing and high computational cost of a conventional RNN, and it is smaller than a TCN. The model of the invention is trained to minimize the negative log-likelihood, defined as follows:
L(W) = −Σ_(t=T_obs+1)^(T_pred) log P(x_t^i, y_t^i | μ̂_t^i, σ̂_t^i, ρ̂_t^i)    (21)
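A numpy sketch of the per-point bivariate Gaussian negative log-likelihood that equation (21) sums; the density is the standard bivariate normal with means μ, standard deviations σ, and correlation ρ:

```python
import numpy as np

def bivariate_nll(xy, mu, sigma, rho):
    """Negative log-likelihood of ground-truth points under the predicted
    bivariate Gaussian. xy, mu: (..., 2); sigma: (..., 2), positive;
    rho: (...,), correlation in (-1, 1)."""
    dx = (xy[..., 0] - mu[..., 0]) / sigma[..., 0]
    dy = (xy[..., 1] - mu[..., 1]) / sigma[..., 1]
    one_m_rho2 = 1.0 - rho ** 2
    z = dx ** 2 + dy ** 2 - 2.0 * rho * dx * dy
    log_p = -z / (2.0 * one_m_rho2) - np.log(
        2.0 * np.pi * sigma[..., 0] * sigma[..., 1] * np.sqrt(one_m_rho2))
    return -log_p.sum()
```

In training, the gradient of this loss with respect to the decoder's five outputs per point (μ_x, μ_y, σ_x, σ_y, ρ) would drive the optimization; here the function only evaluates the loss.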
In embodiments of the present invention, the ETH and UCY datasets are used for training and testing. The ETH and HOTEL scenes are contained in the ETH dataset, while the UCY dataset has three different scenes: UNIV, ZARA1 and ZARA2. The data attributes are the frame number, the pedestrian number, and the 2D position of the trajectory coordinates. The model is trained on four of the datasets using a leave-one-out method and tested on the remaining dataset. Similar to Social LSTM (Alahi et al., 2016), a trajectory of 8 time steps (3.2 seconds) is input and the next 12 time steps are predicted.
The standard metrics are the Average Displacement Error (ADE) and the Final Displacement Error (FDE) (Choi and Dariush, 2019; Mangalam et al.; Chai et al., 2020; Kothari et al., 2021). In short, ADE measures the average L2 distance between the predicted trajectory points and all ground-truth future trajectory points, while FDE measures the L2 distance between the final predicted destination and the final ground-truth destination.
ADE = (1 / (N · T_pred)) Σ_(i=1)^N Σ_(t=T_obs+1)^(T_pred) ||(x̂_t^i, ŷ_t^i) − (x_t^i, y_t^i)||_2    (22)
FDE = (1 / N) Σ_(i=1)^N ||(x̂_(T_pred)^i, ŷ_(T_pred)^i) − (x_(T_pred)^i, y_(T_pred)^i)||_2    (23)
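Equations (22)-(23) computed in numpy; the predicted and ground-truth trajectories are assumed laid out as (T_pred, N, 2):

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: (T_pred, N, 2) predicted and ground-truth future positions.
    ADE: mean L2 error over all pedestrians and time steps;
    FDE: mean L2 error at the final time step only."""
    dist = np.linalg.norm(pred - gt, axis=-1)   # (T_pred, N) per-point errors
    return dist.mean(), dist[-1].mean()
```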
Specifically, the coordinate dimension is 2, and the graph embedding and attention embedding dimensions are 64. The number of self-attention layers is 1. The convolutional network of the spatial encoder consists of 7 convolutional layers with a kernel size of s = 3. The structure of the graph convolution network used in the model is similar to that of Social-STGCNN (Mohamed et al., 2020); the spatial GCN and the temporal GCN each cascade 1 layer. The threshold α is empirically set to 0.5. The dimension of the random Gaussian noise Z is set to 32, and PReLU is used as the nonlinear activation δ(·). The input dimension of the decoder is 64 and its output dimension is 5. The model of the invention is trained on a GeForce RTX 3090 with the Adam optimizer for 300 epochs with a batch size of 128; the initial learning rate is set to 0.01 and decays by a factor of 0.1 every 50 epochs. The method of the invention is implemented in PyTorch.
In this example, the following existing methods are selected as baselines.
SGAN: a pedestrian trajectory prediction method that combines sequence prediction with generative adversarial networks (Gupta et al., 2018).
Sophie: a GAN-based prediction method that considers both the context information of the scene and the path histories of all agents using an attention mechanism (Sadeghian et al., 2019).
Social-BiGAT: a method that models the social interactions of pedestrians in a scene using a recurrent-adversarial architecture and introduces a graph attention network (Kosaraju et al., 2019).
RSBG: presents a new insight into group-based social interaction models, combined with a GCN, for predicting pedestrian trajectories (Sun et al., 2020).
PITF: proposes an end-to-end multi-task learning system that utilizes rich visual features about human behavioral information and its interaction with the surrounding environment (Liang et al., 2020).
Social-STGCNN: a method that models spatial interactions as a graph and uses a time-extrapolator convolutional neural network to predict future steps (Mohamed et al., 2020).
STAR: introduces a new spatial transformer and temporal transformer to capture the spatio-temporal interactions between pedestrians (Yu et al., 2020).
NMMP: presents neural motion message passing for interaction modeling, which can predict future trajectories in a variety of scenarios (Hu et al., 2020).
CARPe: Mendieta and Tabkhi propose a convolutional method for real-time pedestrian path prediction using graph isomorphism networks, combined with an agile convolutional neural network design (Mendieta and Tabkhi, 2021).
AVGCN: a new method for trajectory prediction using a graph convolution network (GCN) based on human attention (Liu et al., 2021).
The embodiments of the present invention report results on the pedestrian trajectory datasets, which are the most widely used benchmarks for the trajectory prediction task, and compare them with other state-of-the-art methods. The results are shown in Table 1 and evaluated using the ADE and FDE metrics. Compared with existing methods, the prediction model constructed by the invention is superior on the ETH and UCY datasets. Especially for the FDE metric, the prediction model constructed by the invention is 20% better on the ETH and UCY datasets than the previous best method, NMMP. For the ADE metric, its mean value over the ETH and UCY datasets exceeds NMMP by 9%.
It can be seen that models that learn spatial interactions using graph representations, such as Social-BiGAT, Social-STGCNN, and STAR-D, are superior to other methods on the UNIV sequence, which mainly contains dense crowd scenes. Interestingly, the method of the present invention outperforms all of the methods described above. STAR-D was the first model to propose coupling temporal dependencies and spatial interactions, and it is superior to methods that model spatial interactions independently, such as Social-BiGAT and Social-STGCNN. Coupling temporal dependencies with spatial interactions through a GCN achieves better performance than STAR-D. The results show that the prediction model constructed by the present invention couples temporal dependencies and spatial interactions more successfully than STAR-D.
TABLE 1 comparison of quantitative methods with baseline methods
In addition, the results of the present invention were further evaluated using Sophie's near-collision percentage (two pedestrians are judged to collide with each other if the distance between them is less than 0.1 m). Table 2 shows the average percentage of pedestrian near collisions across all frames in each UCY/ETH scene. Under this evaluation method (Sadeghian et al., 2019), the prediction model of the invention is far superior to other methods, which means that it can produce more socially and physically acceptable trajectories for each pedestrian.
TABLE 2 average percentage of human collisions in each scene
As shown in Table 3, five different variants were evaluated: (1) STAGCN w/o PE indicates that position encoding is removed from the method of the invention; (2) STAGCN w/o SD indicates that the spatial encoder is removed, so the model captures only temporal dependencies; (3) STAGCN w/o TD indicates that the temporal encoder is removed, so the model captures only spatial interactions; (4) STAGCN w/o Z indicates that the random Gaussian noise is removed; (5) STAGCN is the complete model. The results show that removing the temporal or spatial encoder from the model leads to a significant performance degradation. In particular, the results of STAGCN w/o SD show degradations of 73% in ADE and 72% in FDE, which verifies the contribution of the spatial encoder to the final performance of pedestrian trajectory prediction. Furthermore, the results of STAGCN w/o TD show decreases of 62% in ADE and 61% in FDE, indicating that the temporal encoder is also important for pedestrian trajectory prediction. The results of STAGCN w/o PE show that ADE and FDE degrade by 3% and 9%, respectively, which shows that position encoding removes permutation-type ambiguities and improves the performance of the model. The results of STAGCN w/o Z show that ADE and FDE degrade by 8% and 14%, respectively, which shows that random Gaussian noise regularizes the model better and improves its performance.
TABLE 3 ablation study
Several common interaction scenarios are visualized in fig. 3, where the point at the end of each trajectory represents the start. The method of the present invention is compared to the socialized STGCNN because it learns spatial interactions using graphical representations and learns the parameterized distribution of future trajectories.
The parallel, encounter, stop, turn, and mixed cases in Figs. 3 and 4 were chosen. For scenes in which pedestrians walk side by side, meet, or turn, the trajectory distributions predicted by the two models are visualized. Differently colored regions represent the future trajectory distributions of different pedestrians. The yellow line is the pedestrian's historical trajectory (8 frames) and the red line is the ground truth (12 frames). The blue line is the trajectory predicted by STAGCN (12 frames), and the green line is the trajectory predicted by Social-STGCNN (12 frames). The visualization shows that STAGCN's predicted trajectory (12 frames) stays close to the ground truth (12 frames), while Social-STGCNN's prediction deviates from it considerably, meaning the prediction model of the present invention is more accurate.
Scenes (a) and (e) show two people walking side by side; Social-STGCNN exhibits a deviation problem, while the trajectory predicted by the present invention is consistent with the ground truth. Scene (b) is a complex, crowded scene; Social-STGCNN suffers from overlapping trajectories, while the present model still performs well. Scene (c) shows one person moving toward two people walking side by side; Social-STGCNN exhibits trajectory bias, whereas the present model's predicted trajectory shifts to the right to avoid a collision. Scene (d) shows that the present model handles a pedestrian's turn well, while the Social-STGCNN trajectory deviates. In scene (f), one pedestrian passes a group of people standing still at the bottom; the predicted trajectory shows almost no deviation, indicating that the model captures the fact that the static pedestrians are not influenced by the other pedestrian.
In Fig. 4(a), two pedestrians walk toward each other after turning; Social-STGCNN deviates severely, while STAGCN's predicted trajectory deviates only slightly. Fig. 4(b) shows two people slowing down and stopping; Social-STGCNN produces a large trajectory error, which the present model handles well. In Fig. 4(c), Social-STGCNN is insensitive to a pedestrian who remains stationary for a period of time, whereas the present model accurately recognizes the pedestrian's static state and produces a well-predicted trajectory.
In summary, Social-STGCNN predicts overlapping trajectories that deviate from the ground truth, while STAGCN's predicted trajectories follow the ground truth more closely. In the parallel, encounter, stop, turn, and mixed cases, Social-STGCNN predicts off-track because it models spatial interactions independently of time. In contrast, STAGCN combines spatial interaction with temporal dependency when predicting trajectories, and thereby produces better predicted trajectories.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are presented in the specification only to illustrate the principles of the present invention; various changes and modifications may be made without departing from the spirit and scope of the present invention, and such changes and modifications fall within the scope of the invention as claimed.

Claims (6)

1. A construction method of a pedestrian trajectory prediction model is characterized by comprising the following steps:
constructing a spatio-temporal attention graph convolutional network model;
inputting a temporal embedding and marking time steps by adding a temporal position encoding vector; inputting a spatial embedding and marking pedestrian positions by adding a spatial position encoding vector;
calculating an attention matrix by means of an attention mechanism;
obtaining a temporal interaction graph representing temporal interactions from the temporal graph input, and obtaining a spatial interaction graph representing spatial interactions from the spatial graph input;
aggregating the final temporal interaction matrix and the final spatial interaction matrix through a graph convolutional network to learn a trajectory representation;
and training the model on a data set to obtain the final pedestrian trajectory prediction model.
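As an illustrative aid (not part of the claims), the steps of claim 1 can be sketched roughly as follows. The sinusoidal encoding, the identity query/key projections, and the dimensions are assumptions made for brevity, not details given in the patent:

```python
import numpy as np

def sinusoidal_pe(length, d_model):
    """One common choice of position encoding (assumed, not specified)."""
    pos = np.arange(length)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def attention_matrix(x):
    """Self-attention weights over the rows of x: softmax(Q K^T / sqrt(d)),
    with identity Q/K projections for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

# Temporal branch: embed one pedestrian's trajectory over T frames,
# add the temporal position encoding, compute a T x T attention matrix.
rng = np.random.default_rng(0)
T, d = 8, 16
temporal_emb = rng.standard_normal((T, d)) + sinusoidal_pe(T, d)
A_time = attention_matrix(temporal_emb)   # temporal interaction graph

# Spatial branch: embed all N pedestrians at one frame, add the spatial
# position encoding, compute an N x N attention matrix.
N = 5
spatial_emb = rng.standard_normal((N, d)) + sinusoidal_pe(N, d)
A_space = attention_matrix(spatial_emb)   # spatial interaction graph

# Graph convolution step: aggregate features with the interaction matrix
# (one GCN-style propagation; learned weights omitted).
H = A_space @ spatial_emb
```

In the claimed method these interaction matrices would be further masked and fused before aggregation, and the whole network would be trained end to end on the trajectory data set.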
2. The method of constructing a pedestrian trajectory prediction model according to claim 1, wherein the data set includes an ETH data set and a UCY data set.
3. The method of constructing a pedestrian trajectory prediction model according to claim 1, wherein the spatial map represents spatial interactions of all pedestrians in the scene, and the temporal map represents a complete trajectory of each pedestrian.
4. The method of claim 1, wherein the step of calculating the attention matrix using an attention mechanism comprises the steps of:
fusing the spatial attention matrices along the time channel with a 1 × 1 convolution to obtain a spatial interaction matrix;
element-wise multiplying the temporal interaction matrix by a temporal mask derived from the self-connection matrix, thereby obtaining the final temporal interaction matrix; the final spatial interaction matrix is obtained in the same way.
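As an illustrative aid (not part of the claims), the two operations of claim 4 can be sketched as follows, under the assumptions that the temporal mask is a lower-triangular (causal) mask derived from the self-connection matrix and that the 1 × 1 convolution over the time channel amounts to a learned weighted sum of per-frame attention matrices; neither assumption is spelled out in the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 8, 5

# Temporal side: element-wise (Hadamard) product of the raw temporal
# attention matrix with a mask derived from the self-connection matrix.
attn_time = rng.random((T, T))           # raw T x T temporal attention
mask = np.tril(np.ones((T, T)))          # assumed causal mask
A_time_final = attn_time * mask          # final temporal interaction matrix

# Spatial side: per-frame N x N attention matrices fused along the time
# channel by a 1x1 convolution, i.e. a learned weighted sum over frames.
attn_space = rng.random((T, N, N))       # one attention matrix per frame
w = rng.random(T)
w /= w.sum()                             # 1x1 conv weights over time
A_space_final = np.tensordot(w, attn_space, axes=1)  # final N x N matrix
```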
5. A prediction model constructed by the method of constructing a pedestrian trajectory prediction model according to any one of claims 1 to 4.
6. Use of the predictive model of claim 5 in an autonomous driving system.
CN202211253854.1A 2022-10-13 2022-10-13 Construction method of pedestrian trajectory prediction model Pending CN115527272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253854.1A CN115527272A (en) 2022-10-13 2022-10-13 Construction method of pedestrian trajectory prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211253854.1A CN115527272A (en) 2022-10-13 2022-10-13 Construction method of pedestrian trajectory prediction model

Publications (1)

Publication Number Publication Date
CN115527272A true CN115527272A (en) 2022-12-27

Family

ID=84701141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253854.1A Pending CN115527272A (en) 2022-10-13 2022-10-13 Construction method of pedestrian trajectory prediction model

Country Status (1)

Country Link
CN (1) CN115527272A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115829171A (en) * 2023-02-24 2023-03-21 山东科技大学 Pedestrian trajectory prediction method combining space information and social interaction characteristics



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination