CN113989326B

CN113989326B - Attention mechanism-based target track prediction method

Info

Publication number: CN113989326B
Application number: CN202111240446.8A
Authority: CN
Inventors: 罗光春; 张栗粽; 康昭; 段贵多; 刘欣; 冯科
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2023-08-25
Anticipated expiration: 2041-10-25
Also published as: CN113989326A

Abstract

The invention discloses a trajectory prediction method based on an attention mechanism, belonging to the technical field of computer vision. First, the position sequence of each target is extracted. Each target's sequence is then encoded with a long short-term memory (LSTM) network to obtain a feature representation of its trajectory. A graph attention network fuses the interaction features between target trajectories, and an attention mechanism captures the temporal features across all historical moments of each trajectory. Finally, the trajectory features fused with the interaction and temporal features are fed into an LSTM network, which decodes them to compute the predicted positions of the target. By reasoning about the relations between targets and introducing temporal features through the attention mechanism, the method improves the accuracy of trajectory prediction.

Description

Attention mechanism-based target track prediction method

Technical Field

The invention relates to the technical field of computer vision, in particular to a target track prediction method based on an attention mechanism.

Background

With the development of science and technology, a wide variety of positioning devices has emerged. This has greatly reduced the difficulty of acquiring object trajectory data, the volume and variety of collected trajectory data are growing rapidly, and much of this data has considerable research value. Storing and analyzing trajectory data plays an important role in target behavior recognition, traffic planning, urban safety, and prevention and control.

The main task of trajectory prediction is to predict a target's future trajectory points from its historical trajectory data. Predicting how a target's trajectory will evolve, and where the target will be at a specific moment, is of great significance for studying its behavior patterns and mining detailed information about it. The understanding and reasoning capabilities involved in trajectory prediction can advance the analysis and prediction of human behavior and actions, extend the success of artificial intelligence in vision and speech, and support higher-level cognition, analysis, and reasoning. Effective analysis of target trajectories is therefore a key link in bringing artificial intelligence into everyday human life.

The key challenges of trajectory prediction are the complexity of target behavior and the diversity of external factors. A target's motion is affected by its own intent, the presence and behavior of surrounding targets, the associations between targets, social rule constraints, and other factors. By the technique used, trajectory prediction methods fall into two types: those that model and predict with traditional mathematical statistical models, and those that model and predict with neural network models. Methods based on traditional statistical models work better when the input trajectory data is linear. Methods based on neural network models are better suited to nonlinear data but place higher demands on the information contained in the network's input data.

Because recurrent neural networks are well suited to processing temporal information, deep learning-based trajectory prediction methods generally use them in the network structure to extract sequence features. Besides the target trajectory sequence, a trajectory prediction model can also take images as input to help improve prediction. Other methods improve prediction by exploiting spatio-temporal features. However, methods that rely on a single feature have difficulty capturing the associations between targets, and the growth of error when predicting over long time horizons remains difficult to solve.

Disclosure of Invention

To address the difficulty existing trajectory prediction methods have in modeling target interactions effectively, and the continual growth of their prediction error, the invention provides a trajectory prediction method based on an attention mechanism.

To achieve the above objective, the invention adopts the following technical solution:

A target trajectory prediction method based on an attention mechanism comprises the following steps:

Step 1, acquire the historical trajectories of all targets in a video dataset to obtain a historical trajectory dataset of all targets;

Step 2, use a long short-term memory (LSTM) network to extract the first hidden state of each target at each historical moment t from the historical trajectory dataset;

the specific steps are: compute the historical relative position $(\Delta x_i^t, \Delta y_i^t)$ of target i at each historical moment with respect to the previous historical moment, pass it through an embedding function to obtain the position vector $e_i^t$ of target i at historical moment t, and input $e_i^t$ into the LSTM to obtain the first hidden state $m_i^t$ output by the LSTM for target i at historical moment t, where $i \in [1, N]$;

Step 3, establish the initial association relations between targets, and take the first hidden state $m_i^t$ of each target as the input of the corresponding node in a graph attention network; fuse the trajectory interaction features of target i and its adjacent targets through the graph attention network; then, based on the attention mechanism, fuse the information of every historical moment for target i to obtain the joint hidden state $h_i^{T_{obs}}$ of target i at historical moment $T_{obs}$;

the method comprises the following specific steps:

3-1, establishing an initial association relation between targets;

3-2, for target i, fuse the trajectory interaction features of its adjacent targets based on the graph attention network to obtain the second hidden state $\hat{m}_i^t$;

3-3, input the second hidden state $\hat{m}_i^t$ of target i at each historical moment into another graph attention network, i.e., introduce the attention mechanism: compute the correlation $\beta_i^t$ between the second hidden state at the last historical moment $T_{obs}$ and that at each earlier historical moment t, and then compute the joint hidden state $h_i^{T_{obs}}$ output by the LSTM for target i at the last historical moment $T_{obs}$;

Step 4, input the joint hidden state $h_i^{T_{obs}}$ of target i at historical moment $T_{obs}$ into the long short-term memory network D-LSTM for decoding to obtain the predicted relative position of target i at the first prediction moment $T_{obs}+1$; then take this predicted relative position as the historical relative position for the next prediction moment, return to step 2 to compute the updated predicted relative position at the next prediction moment, and iterate in this way to obtain the predicted relative positions at all subsequent prediction moments, finally obtaining the trajectory prediction result of target i.

The attention mechanism-based target trajectory prediction method addresses two problems of existing trajectory prediction methods: they have difficulty analyzing the associations between targets effectively, and their prediction accuracy is low. Its core idea is as follows: extract the position sequence of each target; encode each target with an LSTM network to obtain a feature representation of its trajectory; fuse the interaction features between target trajectories with a graph attention network; and capture the temporal features across the historical moments of each trajectory with an attention mechanism. Finally, the trajectory features fused with the interaction and temporal features are used as the input of an LSTM network, which decodes them to compute the predicted position of the target. By reasoning about target relations and introducing temporal features through the attention mechanism, the invention improves the accuracy of trajectory prediction.

Compared with the prior art, the method applies the attention mechanism from natural language processing to target trajectory prediction, mitigates the tendency of prediction error to grow with the prediction horizon, and makes the results and model computation more convincing from the perspective of the input features.

Drawings

Fig. 1 is a general flow chart of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

As shown in Fig. 1, the attention mechanism-based trajectory prediction method of the present invention includes the following steps:

Step 1, acquire the historical trajectories of all targets in a video dataset to obtain a historical trajectory dataset of all targets.

Step 2, use a long short-term memory (LSTM) network to extract the hidden state of every target at each historical moment t from the historical trajectory dataset. The specific steps are as follows:

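2-1, assume there are N targets in total. The position coordinates of target i at historical moment t are $(x_i^t, y_i^t)$, $t \in \{1, \dots, T_{obs}\}$, and its position at prediction moment $t_q$ is $(\hat{x}_i^{t_q}, \hat{y}_i^{t_q})$.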

2-2, compute the historical relative position of target i at each historical moment with respect to the previous historical moment:

$$\Delta x_i^t = x_i^t - x_i^{t-1}, \qquad \Delta y_i^t = y_i^t - y_i^{t-1}$$
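A minimal sketch of sub-step 2-2, assuming each relative position is the displacement from the previous historical moment. The source does not say how the first moment (which has no predecessor) is handled, so this sketch pads it with a zero displacement; the function name `relative_positions` is hypothetical.

```python
def relative_positions(track):
    """Return (dx, dy) of each historical moment relative to the previous one.

    track: list of absolute (x, y) positions for moments 1..T_obs.
    The first moment is padded with (0.0, 0.0) (an assumption).
    """
    rel = [(0.0, 0.0)]
    for (x_prev, y_prev), (x, y) in zip(track, track[1:]):
        rel.append((x - x_prev, y - y_prev))
    return rel


track = [(0.0, 0.0), (1.0, 0.5), (2.5, 1.0), (4.0, 1.0)]
print(relative_positions(track))
# → [(0.0, 0.0), (1.0, 0.5), (1.5, 0.5), (1.5, 0.0)]
```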

2-3, pass the historical relative position through the embedding function F to obtain the position vector $e_i^t$ of target i at historical moment t, then input $e_i^t$ into the LSTM to obtain the first hidden state $m_i^t$ output by the LSTM for target i at historical moment t:

$$e_i^t = F(\Delta x_i^t, \Delta y_i^t), \qquad m_i^t = \mathrm{LSTM}(m_i^{t-1}, e_i^t; W_l)$$

where F is the embedding function and $W_l$ is the weight of the LSTM cell.
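The following pure-Python sketch illustrates sub-step 2-3 for one target. The source only states that F is an embedding function and $W_l$ the LSTM weights, so two choices here are assumptions: F is taken to be a linear layer followed by ReLU, and the LSTM is the standard cell with the bias folded into `W_l`. The helper names (`embed`, `lstm_step`, `encode`) and the layer sizes are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def embed(rel_pos, W_e):
    """Hypothetical embedding F: linear map of (dx, dy) followed by ReLU."""
    return [max(0.0, z) for z in matvec(W_e, list(rel_pos))]


def lstm_step(x, h, c, W_l):
    """One standard LSTM cell step.

    W_l maps the concatenation [x, h, 1] (bias folded in) to the stacked
    pre-activations of the input, forget and output gates and the candidate.
    """
    H = len(h)
    z = matvec(W_l, x + h + [1.0])
    i = [sigmoid(v) for v in z[:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    o = [sigmoid(v) for v in z[2 * H:3 * H]]
    g = [math.tanh(v) for v in z[3 * H:]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode(rel_track, W_e, W_l, hidden_size):
    """Return the first hidden state m_i^t for every historical moment t."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    states = []
    for rel_pos in rel_track:
        h, c = lstm_step(embed(rel_pos, W_e), h, c, W_l)
        states.append(h)
    return states


rng = random.Random(0)
EMB, HID = 4, 3
W_e = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(EMB)]
W_l = [[rng.uniform(-0.5, 0.5) for _ in range(EMB + HID + 1)] for _ in range(4 * HID)]
m = encode([(0.0, 0.0), (1.0, 0.5), (1.5, 0.5), (1.5, 0.0)], W_e, W_l, HID)
print(len(m), len(m[0]))  # → 4 3
```

In a trained model, `W_e` and `W_l` would be learned; random weights are used here only to show the data flow.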

Step 3, establish the initial association relations between targets, and take the first hidden state $m_i^t$ of each target as the input of the corresponding node in the graph attention network; fuse the trajectory interaction features of target i and its adjacent targets through the graph attention network; then, based on the attention mechanism, fuse the attention correlations of every historical moment for target i to obtain the joint hidden state $h_i^{T_{obs}}$ of target i at historical moment $T_{obs}$. The specific steps are as follows:

3-1, establish the initial association relations between targets: by default, all targets in the scene are assumed to be associated with one another, and a fully connected relation graph between the targets is built according to the number of targets.
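A minimal sketch of the fully connected relation graph in sub-step 3-1, stored as a neighbor set $N_i$ per target. Targets are indexed from 0 here rather than from 1 as in the text. Whether a target counts as its own neighbor is not stated in the source; the `include_self` flag is a hypothetical parameter that defaults to including self-loops, as is common in graph attention networks.

```python
def fully_connected_neighbors(num_targets, include_self=True):
    """Return {i: N_i} for a fully connected relation graph over num_targets targets."""
    return {
        i: [j for j in range(num_targets) if include_self or j != i]
        for i in range(num_targets)
    }


print(fully_connected_neighbors(3, include_self=False))
# → {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```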

3-2, for target i, fuse the trajectory interaction features of its adjacent targets based on the graph attention network.

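3-2-1, take the first hidden state $m_i^t$ of each target as the input of the corresponding node in the graph attention network, and compute the attention coefficient between each target node pair $\{(i, j) \mid j \in N_i\}$ at each historical moment t:

$$\alpha_{ij}^t = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W m_i^t \,\|\, W m_j^t\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W m_i^t \,\|\, W m_k^t\right]\right)\right)}$$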

where $\|$ denotes concatenation, $\alpha_{ij}^t$ denotes the attention coefficient of target j to target i at historical moment t, $N_i$ denotes the set of adjacent targets of target i in the relation graph, j denotes the index of an adjacent target of target i, $m_j^t$ denotes the first hidden state output by the LSTM for target j at historical moment t, $m_k^t$ denotes the first hidden state output by the LSTM at historical moment t for any adjacent target k ($k \in N_i$) of target i, and W and a are learnable variables with no specific physical meaning.
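A minimal sketch of the attention coefficient in sub-step 3-2-1 at one historical moment t. It assumes the standard graph attention form (LeakyReLU with a negative slope of 0.2, then a softmax over $N_i$); the slope value and the function name `attention_coefficients` are assumptions, not taken from the source.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def attention_coefficients(i, m, neighbors, W, a):
    """Return {j: alpha_ij^t} for all j in N_i.

    m: first hidden states m_k^t of all targets at moment t.
    W: shared linear map; a: attention vector of length 2 * len(W).
    """
    Wm_i = matvec(W, m[i])
    scores = []
    for j in neighbors[i]:
        concat = Wm_i + matvec(W, m[j])  # [W m_i^t || W m_j^t]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors[i], exps)}


m = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # m_k^t for three targets
neighbors = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, 0.0]
print(attention_coefficients(0, m, neighbors, W, a))
```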

3-2-2, after computing the attention coefficient of each target node pair $\{(i, j) \mid j \in N_i\}$ at historical moment t, compute through the graph attention network the second hidden state $\hat{m}_i^t$ of target i at historical moment t, which fuses the trajectory interaction features of target i and its adjacent targets into the LSTM outputs:

$$\hat{m}_i^t = \sigma\left(\sum_{j \in N_i} \alpha_{ij}^t W m_j^t\right)$$

Where σ represents a nonlinear function.
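A minimal sketch of the aggregation in sub-step 3-2-2 for one target at one moment. The attention weights are passed in directly (for example, the output of a graph attention step). The source only says σ is a nonlinear function, so `tanh` as the default is an assumption, and `second_hidden_state` is a hypothetical name.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def second_hidden_state(alpha, m, W, sigma=math.tanh):
    """hat m_i^t = sigma(sum_j alpha_ij^t * W m_j^t), with sigma applied elementwise.

    alpha: {j: alpha_ij^t} for j in N_i; m: first hidden states of all targets.
    """
    acc = [0.0] * len(W)
    for j, weight in alpha.items():
        acc = [s + weight * v for s, v in zip(acc, matvec(W, m[j]))]
    return [sigma(s) for s in acc]


m = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
alpha = {0: 0.5, 1: 0.25, 2: 0.25}
print(second_hidden_state(alpha, m, W, sigma=lambda z: z))  # → [0.75, 0.5]
```

Passing the identity as `sigma` exposes the attention-weighted sum itself, which makes the weighting easy to check by hand.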

3-3, input the second hidden state $\hat{m}_i^t$ of target i at each historical moment into another graph attention network, i.e., introduce the attention mechanism, and compute the correlation $\beta_i^t$ between the second hidden state $\hat{m}_i^{T_{obs}}$ at historical moment $T_{obs}$ and the second hidden state $\hat{m}_i^t$ at each historical moment $t \in \{1, \dots, T_{obs}-1\}$:

$$s_i^t = \left\langle \hat{m}_i^{T_{obs}}, \hat{m}_i^t \right\rangle, \qquad \beta_i^t = \frac{\exp(s_i^t)}{\sum_{t'=1}^{T_{obs}-1} \exp(s_i^{t'})}$$

where $\langle \cdot, \cdot \rangle$ is the inner product operator, $s_i^t$ is an intermediate variable of the calculation with no specific meaning, and $\hat{m}_i^{T_{obs}}$ denotes the second hidden state of target i at historical moment $T_{obs}$ after fusing the trajectory interaction features of its adjacent targets.

Then compute the joint hidden state $h_i^{T_{obs}}$ output by the LSTM for target i at historical moment $T_{obs}$.
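A minimal sketch of sub-step 3-3 for one target, assuming the inner-product scores are normalized with a softmax. The source text does not preserve the formula for the joint hidden state, so `joint_hidden_state` below is a hypothetical fusion, the last second hidden state concatenated with the $\beta$-weighted sum of the earlier ones, shown only to illustrate how the correlations could be used.

```python
import math


def temporal_attention(hat_m):
    """beta_i^t for t = 1 .. T_obs - 1 (requires T_obs >= 2).

    hat_m: second hidden states of one target ordered by moment;
    the last entry is hat m_i^{T_obs}.
    """
    last = hat_m[-1]
    scores = [sum(p * q for p, q in zip(last, h)) for h in hat_m[:-1]]  # s_i^t
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_hidden_state(hat_m):
    """Hypothetical fusion: [hat m_i^{T_obs} || sum_t beta_i^t * hat m_i^t]."""
    beta = temporal_attention(hat_m)
    dim = len(hat_m[-1])
    context = [sum(b * h[k] for b, h in zip(beta, hat_m[:-1])) for k in range(dim)]
    return hat_m[-1] + context


hat_m = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # T_obs = 3
print(temporal_attention(hat_m))
print(joint_hidden_state(hat_m))
```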

Step 4, input the joint hidden state $h_i^{T_{obs}}$ of target i at historical moment $T_{obs}$ into the long short-term memory network D-LSTM for decoding to obtain the trajectory prediction result of the target. The specific steps are as follows:

4-1, fuse a noise vector into the joint hidden state $h_i^{T_{obs}}$ of target i at historical moment $T_{obs}$ and input the result into the D-LSTM for decoding, obtaining the predicted relative position $(\Delta \hat{x}_i^{T_{obs}+1}, \Delta \hat{y}_i^{T_{obs}+1})$ of target i at prediction moment $T_{obs}+1$. The calculation is as follows:

$$d_i^{T_{obs}} = \left[h_i^{T_{obs}} \,\|\, Z\right]$$

$$d_i^{T_{obs}+1} = \text{D-LSTM}\left(d_i^{T_{obs}}, e_i^{T_{obs}}; W_D\right)$$

$$\left(\Delta \hat{x}_i^{T_{obs}+1}, \Delta \hat{y}_i^{T_{obs}+1}\right) = \delta_3\left(d_i^{T_{obs}+1}\right)$$

where Z denotes the noise vector, $d_i^{T_{obs}}$ denotes the intermediate state of target i at historical moment $T_{obs}$ after fusing the noise vector, used as the initial input of the D-LSTM network, $d_i^{T_{obs}+1}$ denotes the hidden state of target i at prediction moment $T_{obs}+1$ output by the D-LSTM network, $\delta_3(\cdot)$ denotes a linear layer, $e_i^{T_{obs}}$ denotes the position vector of target i at historical moment $T_{obs}$, and $W_D$ is a learnable training parameter.
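A minimal sketch of one decoding step in 4-1. It makes several assumptions that the source does not confirm: the noise is Gaussian and fused by concatenation, $d_i^{T_{obs}}$ initializes the D-LSTM hidden state with a zero cell state, the D-LSTM is a standard LSTM cell, and $\delta_3$ is a linear layer with bias. The function `decode_first_step` and all sizes are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, W):
    """Standard LSTM cell; W maps [x, h, 1] to the stacked i, f, o, g pre-activations."""
    H = len(h)
    z = matvec(W, x + h + [1.0])
    i = [sigmoid(v) for v in z[:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    o = [sigmoid(v) for v in z[2 * H:3 * H]]
    g = [math.tanh(v) for v in z[3 * H:]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    return [ok * math.tanh(ck) for ok, ck in zip(o, c_new)], c_new


def decode_first_step(h_joint, e_last, W_D, W_out, noise_dim, rng):
    """Predict (dx, dy) at T_obs + 1 from the joint hidden state h_i^{T_obs}.

    e_last stands for the position vector e_i^{T_obs};
    W_out (last column is the bias) plays the role of delta_3.
    """
    Z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    d = h_joint + Z  # d_i^{T_obs} = [h || Z]
    d_next, _ = lstm_step(e_last, d, [0.0] * len(d), W_D)  # d_i^{T_obs+1}
    dx, dy = matvec(W_out, d_next + [1.0])  # linear layer delta_3
    return dx, dy


rng = random.Random(0)
HJ, NZ, EMB = 4, 2, 4
H = HJ + NZ
W_D = [[rng.uniform(-0.3, 0.3) for _ in range(EMB + H + 1)] for _ in range(4 * H)]
W_out = [[rng.uniform(-0.3, 0.3) for _ in range(H + 1)] for _ in range(2)]
pred = decode_first_step([0.1] * HJ, [0.5] * EMB, W_D, W_out, NZ, rng)
print(pred)
```

Sampling a different Z for the same history yields a different predicted displacement, which is how such a decoder can produce multiple plausible futures.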

4-2, after obtaining the predicted relative position of target i at prediction moment $T_{obs}+1$, take it as the historical relative position for the next prediction moment $T_{obs}+2$, i.e., update the historical moments to $\{1, 2, \dots, T_{obs}, T_{obs}+1\}$; return to step 2 to compute the predicted relative position at the next prediction moment, and iterate in this way to obtain the predicted relative positions at all subsequent prediction moments.

4-3, obtain the trajectory prediction result of target i from its predicted relative positions at all prediction moments.
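A minimal sketch of the iteration in 4-2 and 4-3. The whole model (steps 2 through 4-1) is abstracted as a hypothetical callable `predict_next` that maps the relative-position history to the next displacement, and a constant-velocity function stands in for the trained network. Converting relative positions back to a trajectory by accumulating displacements from the last observed position is an assumption about step 4-3.

```python
def rollout(track, predict_next, pred_len):
    """Predict absolute positions for moments T_obs+1 .. T_obs+pred_len.

    track: observed absolute positions for moments 1..T_obs.
    predict_next: callable mapping the relative-position history to the next (dx, dy).
    """
    rel = [(0.0, 0.0)] + [
        (x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    x, y = track[-1]
    future = []
    for _ in range(pred_len):
        dx, dy = predict_next(rel)
        rel.append((dx, dy))  # the prediction becomes history for the next moment
        x, y = x + dx, y + dy
        future.append((x, y))
    return future


def constant_velocity(rel):
    """Toy stand-in for the trained model: repeat the latest displacement."""
    return rel[-1]


print(rollout([(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)], constant_velocity, 3))
# → [(3.0, 1.5), (4.0, 2.0), (5.0, 2.5)]
```

Because each prediction is fed back as history, errors in early steps propagate to later ones; this is the error growth over long horizons that the method aims to mitigate.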

In summary, the method first extracts the historical trajectory features of each target with an LSTM network. It then sets the association relations between targets and fuses their trajectory features with a graph attention network to model target interaction. Next, for each target, it fuses the trajectory features at different moments through an attention mechanism to obtain the trajectory features used for prediction. Finally, it decodes these features with an LSTM network to obtain the final trajectory prediction result.

The invention applies the attention mechanism from natural language processing to target trajectory prediction, mitigates the tendency of prediction error to grow with the prediction horizon, and greatly reduces labor and time costs, making the results more accurate and effective. The foregoing describes merely illustrative embodiments of the present invention, and the invention is not limited thereto; any change or substitution readily conceivable by those skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims

1. A target trajectory prediction method based on an attention mechanism, characterized by comprising the following steps:

the specific steps are: compute the historical relative position of target i at each historical moment with respect to the previous historical moment, pass it through an embedding function to obtain the position vector $e_i^t$ of target i at historical moment t, and input $e_i^t$ into the LSTM to obtain the first hidden state $m_i^t$ output by the LSTM for target i at historical moment t, where $i \in [1, N]$;

Step 3, establish the initial association relations between targets, and take the first hidden state $m_i^t$ of each target as the input of the corresponding node in a graph attention network; fuse the trajectory interaction features of target i and its adjacent targets through the graph attention network; then, based on the attention mechanism, fuse the information of every historical moment for target i to obtain the joint hidden state $h_i^{T_{obs}}$ of target i at the last historical moment $T_{obs}$;

Specifically, the method comprises the following steps:

3-1, establishing an initial association relation between targets;

Step 4, input the joint hidden state $h_i^{T_{obs}}$ of target i at the last historical moment $T_{obs}$ into the long short-term memory network D-LSTM for decoding to obtain the predicted relative position of target i at the first prediction moment $T_{obs}+1$; then take this predicted relative position as the historical relative position for the next prediction moment, return to step 2 to compute the updated predicted relative position at the next prediction moment, and iterate in this way to obtain the predicted relative positions at all subsequent prediction moments, finally obtaining the trajectory prediction result of target i.