CN115048870A - Target track identification method based on residual error network and attention mechanism - Google Patents

Target track identification method based on residual error network and attention mechanism

Info

Publication number
CN115048870A
CN115048870A (application CN202210775960.XA)
Authority
CN
China
Prior art keywords
network
attention mechanism
target track
track
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210775960.XA
Other languages
Chinese (zh)
Inventor
高伟超
燕雪峰
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210775960.XA
Publication of CN115048870A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Abstract

The invention discloses a target trajectory recognition method based on a residual network and an attention mechanism, belonging to the field of target behavior recognition. The specific implementation steps are as follows: (1) acquire target trajectory data, preprocess it, and construct a target trajectory data set; (2) construct a target trajectory recognition model comprising a time-series feature extraction module based on a self-attention mechanism and a deep feature extraction module based on a residual network and a channel attention mechanism; (3) perform multiple rounds of iterative training on the model constructed in step (2) using the data set constructed in step (1); (4) recognize the behavior pattern of the target motion with the model trained in step (3). The invention makes full use of the advantages of the residual network and the attention mechanism, and solves the problem of insufficient target trajectory recognition accuracy in noisy environments.

Description

Target track identification method based on residual error network and attention mechanism
Technical Field
The invention provides a target trajectory identification method based on a residual error network and an attention mechanism, and belongs to the field of target behavior identification.
Background
With the development of detection technologies such as radar, infrared and satellite, ever more trajectory data, especially maritime navigation tracks and aerial flight tracks, are acquired by various detection devices. How to effectively identify a target's trajectory pattern from these data, and thereby assist a commander in judging the target's behavioral intent, is an urgent problem to be solved.
Although traditional target trajectory recognition algorithms such as the BP neural network, hidden Markov models, dynamic time warping (DTW) and various clustering algorithms achieve good recognition accuracy in some cases, they require a large amount of data processing and face the difficulty of selecting a suitable feature space. Moreover, in the presence of noise their recognition performance degrades sharply, and they cannot meet the real-time requirements of trajectory pattern recognition in complex environments.
Deep learning has powerful feature-expression capability and has also been applied to target trajectory recognition. However, existing deep-learning recognition algorithms based on convolutional and recurrent neural networks still suffer from poor generalization and low overall recognition accuracy. Improving the generalization and recognition accuracy of existing network models has therefore become a major challenge.
Disclosure of Invention
[ object of the invention ]: traditional target trajectory recognition algorithms require suitable features to be selected manually from a huge feature space and a great deal of preprocessing work, which is time-consuming and labor-intensive; these algorithms are also unstable, and their recognition performance is extremely poor in some noisy environments. Existing deep-learning methods based on convolutional and recurrent neural networks can extract some effective features, avoid much of the data preprocessing, and offer a certain recognition performance and stability, but their models are built too simply, so the overall recognition performance is unsatisfactory, i.e. they generalize poorly. To address these problems, a target trajectory recognition method based on a residual network and an attention mechanism is proposed. The method integrates a residual structure and attention mechanisms into a CNN network and a BiLSTM network, so that the whole network model attains good generalization capability during training, effectively improving the accuracy of target trajectory recognition.
[ technical solution ]:
In view of the above analysis, the invention aims to disclose a target trajectory recognition method based on a residual network and an attention mechanism, solving the problem of recognizing target trajectory behavior patterns in complex scenes. To achieve this aim and solve the technical problem, the invention adopts the following technical scheme:
(1) acquiring target track data, preprocessing, and constructing a target track data set:
Original target trajectory data in the detection area are obtained from detection equipment such as radar, infrared and satellite; they comprise attribute features such as time, longitude, latitude, speed and heading. Redundant trajectory points with short time intervals are removed, points with long-term zero speed or abnormal longitude/latitude are screened out, and the following max-min normalization (excluding the time attribute) is applied to the remaining trajectory-point data:
X′ = (X − X_min) / (X_max − X_min)   (1)

where X_min and X_max are the minimum and maximum values of the corresponding attribute feature. This finally yields the tensor data of the target trajectory {X_1, X_2, … X_n}, where n is the total number of trajectory points in one trajectory. Of these tensor data, 70% are set aside as the training set, 10% as the validation set, and the remaining 20% as the test set.
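As a minimal illustrative sketch of this preprocessing step (the attribute order [lon, lat, speed, heading] and all function names are assumptions, not from the patent), the max-min normalization and the 70/10/20 split described above can be written as:

```python
# Sketch of step (1): per-attribute max-min normalization (time excluded) and
# a 70/10/20 train/validation/test split. Names are illustrative assumptions.

def min_max_normalize(points):
    """points: list of trajectory points, each an equal-length attribute list.
    Scales every attribute column to [0, 1] via (x - min) / (max - min)."""
    cols = list(zip(*points))
    scaled = []
    for col in cols:
        x_min, x_max = min(col), max(col)
        span = (x_max - x_min) or 1.0   # guard against a constant column
        scaled.append([(x - x_min) / span for x in col])
    return [list(row) for row in zip(*scaled)]

def split_dataset(tracks):
    """70% training, 10% validation, 20% test, split in order
    (integer arithmetic avoids float-rounding surprises)."""
    n = len(tracks)
    n_train, n_val = 7 * n // 10, n // 10
    return tracks[:n_train], tracks[n_train:n_train + n_val], tracks[n_train + n_val:]
```

A real pipeline would also shuffle tracks before splitting; the sketch keeps them in order for determinism.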
(2) Constructing a target track recognition model:
constructing a target track identification model comprising a time sequence feature extraction module based on a self-attention mechanism and a deep layer feature extraction module based on a residual error network and a channel attention mechanism, wherein:
the time sequence feature extraction module comprises two serial BilSt networks and a self-attention mechanism network;
the deep feature extraction module comprises a serial residual error network, a convolution network and two channel attention mechanism networks;
the output of the first BilTM network is connected with the input end of the self-attention mechanism network, and the input of the second BilTM network is connected with the output end of the self-attention mechanism network; the output of the residual network is connected with the input end of the convolution network, the first channel attention mechanism network is connected with the first layer and the second layer of the convolution network, and the second channel attention mechanism network is connected with the second layer and the third layer of the convolution network.
(3) Performing multiple rounds of iterative training on the target track recognition model constructed in the step (2) by using the target track data set constructed in the step (1):
(3.1) setting the number of iterations t (t ≥ 50), initializing the learning rate lr to 0.01, and initializing the parameters of the network model;
(3.2) using the training data set train as the input of the time-series feature extraction module and the deep feature extraction module to obtain the time-series features M_1 and deep abstract features M_2 of the target trajectory data;
(3.3) fusing the time-series features M_1 and deep abstract features M_2 of the target trajectory data to obtain the fused features M;
(3.4) inputting the fused features M into a classifier to obtain the prediction label y_i for each target classification in the training sample data set train;
(3.5) using the cross-entropy loss function

L = −Σ_i t_i log(y_i)

to compute the loss between the prediction label y_i and the true label t_i;
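A hedged sketch of this loss computation, assuming the prediction y_i is a probability vector and the true label t_i is one-hot (the function name and the eps guard are illustrative additions):

```python
import math

def cross_entropy(y_pred, t_true):
    """Cross-entropy -sum_i t_i * log(y_i) between a predicted probability
    vector y_pred and a one-hot true label t_true; eps guards log(0)."""
    eps = 1e-12
    return -sum(t * math.log(p + eps) for p, t in zip(y_pred, t_true))
```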
(3.6) updating the network model parameters using stochastic gradient descent (SGD) to obtain the trained network model H_t;
(3.7) inputting the validation data set into the network H_t and judging whether the recognition accuracy acc > 95%; if so, saving the network model H_t, otherwise discarding it;
and (3.8) repeating steps (3.2) to (3.7) to obtain the final trained network model H.
(4) And (4) identifying a behavior mode of the target motion by using the target track identification model trained in the step (3):
And inputting the test sample data set test into the trained target trajectory recognition model H for forward inference, and outputting the trajectory patterns corresponding to all targets in the test data set (four trajectory behavior patterns: circling, figure-eight, straight line and arc).
Further, the timing characteristic extraction module in the step (2) specifically comprises the following components:
The time sequence feature extraction module comprises two serial BiLSTM networks and a self-attention mechanism network;
in the BiLSTM network, two layers of BiLSTM unit structures are stacked and the number of hidden units is set to 128; the network update process is expressed as follows:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)   (2)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)   (3)
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)   (4)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t   (5)
O_t = σ(W_o · [h_{t−1}, x_t] + b_o)   (6)
h_t = O_t ⊙ tanh(C_t)   (7)

where C_{t−1} and h_{t−1} are the cell state and output at the previous time step; W_f, W_i, W_c, W_o are weight matrices; σ is the sigmoid activation, with equations (2), (3) and (6) forming the forget gate, input gate and output gate whose outputs are f_t, i_t and O_t; C̃_t is the candidate state of the hidden unit; C_t is the cell state at the current time step; and h_t is the output at the current time step;
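A minimal scalar sketch of one update following equations (2)–(7); real BiLSTM layers use weight matrices over many hidden units, whereas here every weight is a hypothetical scalar so the arithmetic is easy to follow:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM update following equations (2)-(7), reduced to scalars.
    w[g] = (weight on h_prev, weight on x_t) and b[g] is the bias of gate g;
    all weights here are hypothetical scalars rather than matrices."""
    f_t = sigmoid(w["f"][0] * h_prev + w["f"][1] * x_t + b["f"])      # forget gate
    i_t = sigmoid(w["i"][0] * h_prev + w["i"][1] * x_t + b["i"])      # input gate
    c_hat = math.tanh(w["c"][0] * h_prev + w["c"][1] * x_t + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                                  # new cell state
    o_t = sigmoid(w["o"][0] * h_prev + w["o"][1] * x_t + b["o"])      # output gate
    h_t = o_t * math.tanh(c_t)                                        # new output
    return h_t, c_t
```

A bidirectional layer simply runs this recurrence once forward and once backward over the sequence and concatenates the two hidden states.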
in the self-attention mechanism network, the input data X is multiplied by the corresponding weight matrix to construct a corresponding matrix vector Q, K, V, and the specific construction formula is as follows:
Figure BDA0003727157830000034
Figure BDA0003727157830000035
Figure BDA0003727157830000036
wherein D k Represents the dimension of vector K, D v Representing the dimension of the vector V, and then calculating the final output vector Out by the following scaling dot product operations:
Figure BDA0003727157830000037
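The scaled dot-product operation above, Out = softmax(Q K^T / √D_k) V, can be sketched for a single batch element as follows; Q, K and V are assumed to be already projected from X:

```python
import math

def softmax(row):
    m = max(row)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_attention(q, k, v):
    """softmax(Q K^T / sqrt(D_k)) V for one batch element; q, k, v are
    nested lists of shape T x D, assumed already projected from X."""
    d_k = len(k[0])
    scores = [[sum(qi * ki for qi, ki in zip(qrow, krow)) / math.sqrt(d_k)
               for krow in k] for qrow in q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * vrow[j] for w, vrow in zip(wrow, v))
             for j in range(len(v[0]))] for wrow in weights]
```

When two keys are identical the attention weights over them are uniform, so the output is the average of the corresponding value rows.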
further, the deep layer feature extraction module in the step (2) specifically comprises the following components:
the deep feature extraction module comprises a serial residual error network, a convolution network and two channel attention mechanism networks;
in the residual error network, three one-dimensional convolution modules with 64 channels are used as a filter for extracting a characteristic diagram, Mish is used as an activation function, and the obtained shallow layer characteristics are smoothly transmitted to a subsequent network;
in the convolution network, one-dimensional convolutions with channel numbers of 128, 256 and 128 are adopted for sequential stacking, each layer uses LeakyReLu, PReLU and ELU as activation functions, and a channel attention mechanism network is inserted between the first layer and the second layer and between the second layer and the third layer; and finally, a Global posing layer is accessed for performing average value pooling on the track sequence, increasing the receptive field, removing redundant information and reducing the parameter calculation amount of the network model.
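A single-channel sketch of the two building blocks this module stacks, one-dimensional convolution and the final global average pooling (the real layers have 64–256 learned channels; the kernel below is a hypothetical example):

```python
def conv1d(seq, kernel, bias=0.0):
    """'Valid' one-dimensional convolution (cross-correlation) of a
    single-channel sequence with a small kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(seq) - k + 1)]

def global_average_pool(seq):
    """Collapse the length dimension to a single value, as the global
    pooling layer does at the end of the convolution network."""
    return sum(seq) / len(seq)
```

The kernel [1, 0, -1] used in the test acts as a discrete derivative, one intuition for how stacked convolutions pick up motion features from a trajectory.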
Further, the channel attention mechanism network in the step (2) is specifically configured as follows:
the channel attention mechanism network completes the compression operation of track information by adopting a global pooling layer, and enables a track sequence X with the batch size of B, the characteristic dimension of C and the length of T to belong to the E B×C×T Compressing the channel sequences to obtain a two-dimensional channel sequence X with the length of 1 1B×C×1 :
X 1 =Glob Average Pooling(X) (12)
The two full-connection layers which can carry out channel scaling are used for completing the weight assignment of the channel, and a weight characteristic diagram F is obtained 1B×C×1
F 1 =W 2 *(W 1 *X 1 ) (13)
Wherein W 1 、W 2 A weight matrix for two fully connected layers, representing a matrix multiplication;
f is to be 1 Carrying out tensor expansion along the second dimension to obtain a complete target track characteristic diagram F epsilon B×C×T
F=broadcast(F 1 ,dim=2) (14)
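For one sample, the squeeze (12), excite (13) and broadcast (14) steps can be sketched as below; the two fully connected layers are passed in as plain matrices w1 and w2 with hypothetical values, and the sketch omits any nonlinearity between them, matching the equations as written:

```python
def channel_attention(x, w1, w2):
    """Squeeze-excite for one sample. x is a C x T feature map; w1 (H x C)
    and w2 (C x H) stand in for the two fully connected layers."""
    c = len(x)
    squeezed = [sum(row) / len(row) for row in x]                     # squeeze (12)
    hidden = [sum(w1[i][j] * squeezed[j] for j in range(c))
              for i in range(len(w1))]
    weights = [sum(w2[i][j] * hidden[j] for j in range(len(hidden)))
               for i in range(c)]                                     # excite (13)
    return [[weights[i] * v for v in x[i]] for i in range(c)]         # broadcast (14)
```

With identity weight matrices each channel is simply rescaled by its own time-average, which makes the broadcast step easy to verify by hand.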
Further, the specific operations of step (3.3) and step (3.4) are as follows:
In step (3.3), the time-series features M_1 extracted by the time-series feature extraction module and the deep abstract features M_2 from the deep feature extraction module are concatenated along the channel dimension to obtain the fused features M:

M = Concat(M_1, M_2)   (15)

In step (3.4), the fused features M are first passed through a linear layer for matrix operation to obtain the final output vector N ∈ ℝ^{B×K} of the network model:

N = W_e * M   (16)

where W_e is the parameter matrix of the linear layer and K is the number of target trajectory pattern classes. The output vector N is then fed into a softmax function for probability calculation, giving the corresponding probability vector P ∈ ℝ^{B×K}:

P_i = e^{N_i} / Σ_{j=1}^{K} e^{N_j}   (17)

where N_i is the vector value for class i. Finally, the maximum of the probability vector P is taken along the first dimension to obtain the final recognition result Y ∈ ℝ^{B}, which contains the recognition result y_i of each target trajectory:

Y = max(P, dim=1)   (18)
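Equations (16)–(18) for a single sample can be sketched as follows (w_e is a hypothetical K×D weight matrix; the max-subtraction inside the softmax is a standard numerical-stability detail, an assumption not spelled out in the patent):

```python
import math

def classify(m, w_e):
    """One sample: linear layer n = W_e m, softmax over K classes, then the
    index of the largest probability. w_e is a hypothetical K x D matrix."""
    n = [sum(w * x for w, x in zip(row, m)) for row in w_e]   # linear layer (16)
    mx = max(n)                                # subtracted for numerical stability
    exps = [math.exp(v - mx) for v in n]
    total = sum(exps)
    p = [e / total for e in exps]                             # softmax (17)
    return p, max(range(len(p)), key=p.__getitem__)           # argmax over P (18)
```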
[ advantages of the invention ]:
compared with the prior art, the invention has the advantages that:
(1) The method uses a multilayer convolution module fused with a residual network structure to fully extract the deep abstract features of the target trajectory, inserts a channel attention mechanism to extract the correlation information between different features, and uses a bidirectional recurrent network based on a self-attention mechanism to extract the time-series information of the trajectory data. This effectively improves the generalization of the model, raises the overall recognition accuracy of target trajectory behavior patterns, and allows the method to adapt to target trajectory recognition scenarios in complex environments;
(2) The method needs only simple trajectory-data screening and max-min normalization, without a large amount of data preprocessing; the problem of over-large feature-space selection does not arise, the model is simple to apply, and retraining is time- and labor-saving.
[ description of drawings ]:
FIG. 1 is a schematic diagram of a target track recognition method according to the present invention
FIG. 2 is a schematic diagram of a timing feature extraction module in the model of the method of the present invention
FIG. 3 is a schematic diagram of a deep layer feature extraction module in the model according to the method of the present invention
FIG. 4 is a schematic diagram of the channel attention mechanism in the model of the method of the present invention
FIG. 5 is a schematic diagram of the global pooling layer in the method of the present invention
[ embodiments ]:
in order to describe the technology used in the present invention in detail, a specific embodiment of the method of the present invention will be described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a target trajectory identification method based on a residual error network and an attention mechanism, including the following steps:
data preprocessing: and preprocessing a target track data set acquired from detection equipment such as radar, infrared and laser to obtain a corresponding training set, a corresponding verification set and a corresponding test set.
Extracting the trajectory time-series features based on the BiLSTM network: the preprocessed trajectory data in tensor form are input into a BiLSTM network based on a self-attention mechanism to extract the time-series information in the trajectory data, and then passed through a one-dimensional global average pooling layer to remove redundant information and further extract the most important time-series features.
Extracting track deep level abstract features based on a multilayer convolutional network: and inputting the preprocessed tensor form track data into a multilayer convolution network integrating a residual error network and a channel attention mechanism, extracting deeper abstract features of the track data by using the structural characteristics of the residual error structure and the channel attention and the strong representation capability of the convolution network, then inputting the extracted abstract features into a one-dimensional global average pooling layer, removing redundant information, and further extracting the most important abstract features.
Training a target track recognition model: and performing feature splicing on the time sequence features and the deep level abstract features extracted from the track data, inputting the time sequence features and the deep level abstract features into a softmax classifier, reversely adjusting the updating direction of the gradient by calculating the cross entropy loss, learning the mode rule of the behavior of the target track, and outputting the mode type of the target track.
The target track recognition model is applied as follows: and storing the best model meeting the precision requirement in training, and inputting all subsequently detected target track data into the model after preprocessing as a test set for target track identification in a real scene.
The following describes each of the above steps in detail:
first, data preprocessing
In order to identify the behavior pattern of the target track, the invention selects a plurality of attribute quantities which are helpful to identifying the behavior pattern of the target track from the collected target track data: longitude, latitude, heading, speed.
In order to reduce the time overhead of model inference, redundant trace points are removed from the target trajectory data collected by the detection equipment: overly dense trace points collected over a period of time are selectively retained, keeping roughly one trace point out of every 5.
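A sketch of this redundancy removal, keeping one point in five; the fixed slice step (versus time-interval-based thinning) and the function name are assumptions:

```python
def downsample(points, keep_every=5):
    """Keep one trace point out of every `keep_every`, in order; the ratio 5
    follows the description above."""
    return points[::keep_every]
```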
Because of electromagnetic-noise interference, the detection equipment may have a certain detection error, so abnormal trace points are removed first: points whose speed is zero for a long time or exceeds the maximum speed of the target motion, points whose longitude and latitude fall outside the detection range of the current equipment, and points whose attribute values exceed their critical limits (heading measured from due north as 0°, range 0–360°; longitude and latitude in the range −180° to 180°; speed generally 0–800 m/s).
Carrying out normalization operation on the target track data subjected to redundancy processing and exception processing, and normalizing each attribute variable by adopting a maximum-minimum method:
X′ = (X − X_min) / (X_max − X_min)   (1)

where X_min and X_max are the minimum and maximum values of the corresponding attribute variable; all data are finally converted into tensor form. About 70% of these data are separated out as the training set, about 10% as the validation set, and the remaining 20% as the test set.
Second, trajectory time-series feature extraction based on the BiLSTM network
The target trajectory data are time-series data with temporal characteristics, so the time-series feature is an important feature of the trajectory data and requires a separate network module. The invention uses a BiLSTM network, i.e. a bidirectional long short-term memory network, to process the time-series features of the trajectory data.
The BiLSTM network is a recurrent neural network that propagates information in both directions, continuously learning the time-series features of the trajectory data through LSTM units stacked in the forward and backward directions. The invention builds a two-layer BiLSTM network with 64 hidden units per layer, followed by a global average pooling layer that pools the second-dimension information to enrich the extracted time-series features. A self-attention mechanism is also inserted between the two BiLSTM layers to extract the correlation information between different features of the trajectory sequence; the whole time-series feature extraction module is shown in figure 2.
The specific feature extraction process is as follows: about 3 minutes of preprocessed trajectory data are first converted into tensor data of size 16 × 360 × 4, where 16 is the batch size of one input, 360 the trajectory length, and 4 the four trajectory attribute features. After the two BiLSTM layers and the intermediate self-attention network, a feature vector of size 16 × 360 × 128 is output, and after the average pooling layer shown in figure 5 a 16 × 128 feature vector is obtained. One batch of trajectory data is processed at a time to obtain the corresponding time-series feature vectors.
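The final 16 × 360 × 128 to 16 × 128 reduction is an average over the time axis; on nested lists it can be sketched as:

```python
def pool_time_axis(batch):
    """Average a B x T x F nested-list tensor over its time axis T, mirroring
    the pooling layer's 16 x 360 x 128 -> 16 x 128 reduction."""
    return [[sum(step[f] for step in track) / len(track)
             for f in range(len(track[0]))]
            for track in batch]
```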
Extracting track deep level abstract features based on multilayer convolution network
In the invention, a multilayer convolution network integrating a residual network and a channel attention mechanism is built to form the trajectory-sequence deep feature extraction module, as shown in fig. 3. The residual network avoids the performance degradation that occurs when training a multilayer convolution network, and the channel attention mechanism has an inherent advantage in extracting weight information across different channel dimensions; together they help the multilayer convolution module extract the deep abstract features in the target trajectory data.
The invention builds the multilayer convolution network from one-dimensional convolutions. Six convolutional layers are built in sequence: the first three use the Mish function as activation, the last three use LeakyReLU, PReLU and ELU respectively, and the channel numbers are set to 64, 128, 256 and 128. A residual connection spans the first three layers, bypassing them to connect directly to the fourth layer. Meanwhile, the channel attention mechanism shown in fig. 4 is inserted in the last three layers. Finally, the global average pooling layer shown in fig. 5 is attached after the six convolutional layers, pooling over the last dimension.
The specific feature extraction process is as follows: the trajectory tensor data used for the time-series features in the second step are copied, the order of the last two dimensions is swapped, and the result is input into the multilayer convolution module. After the six convolutional layers and the two channel-attention layers, tensor data of size 16 × 128 × 360 are obtained, representing the extracted deep abstract features; a final global average pooling layer then yields a feature vector of size 16 × 128.
Fourth, training target track recognition model
The trajectory-data time-series features and deep abstract features extracted in the previous two steps are concatenated (concat) to complete feature fusion, yielding a batch of 16 × 256 feature vectors. These vectors are input into a Softmax classifier for classification prediction, producing 16 × 4 classification prediction vectors, which are then fed into the cross-entropy loss function

L = −Σ_i y_i log(p_i)

to compute the loss values of the network model on the training set and validation set, where y_i is the true class label, p_i the predicted classification probability, and k the number of classes. The invention uses the SGD stochastic-gradient-descent optimization algorithm to update the network parameters. After several rounds of training and adjustment, several network models with low loss values are obtained; their accuracy is computed on the corresponding validation set, the models are saved in pkl form, and finally the accuracy of these models is verified on the test set.
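The SGD parameter update used here is, per parameter, p ← p − lr·g; a minimal sketch with the patent's initial learning rate of 0.01 (the function name and flat parameter list are illustrative):

```python
def sgd_step(params, grads, lr=0.01):
    """One stochastic-gradient-descent update p <- p - lr * g per parameter;
    lr defaults to the patent's initial learning rate of 0.01."""
    return [p - lr * g for p, g in zip(params, grads)]
```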
The specific steps of model training are summarized in a table in the original filing (reproduced there only as an image).
fifth, target track recognition model application
From the trained models of the multiple rounds, the network model with the highest accuracy is selected, its pkl file is loaded, the subsequently detected target trajectory data sets are recognized, and the corresponding target trajectory behavior pattern classes, such as circling, figure-eight, straight line and arc, are output.
The above description is only a part of the embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (5)

1. A target trajectory identification method based on a residual error network and an attention mechanism is characterized by comprising the following steps:
(1) acquiring target track data, preprocessing, and constructing a target track data set:
original target trajectory data in the detection area are obtained from detection equipment such as radar, infrared and satellite; they comprise attribute features such as time, longitude, latitude, speed and heading. Redundant trajectory points with short time intervals are removed, points with long-term zero speed or abnormal longitude/latitude are screened out, and the following max-min normalization (excluding the time attribute) is applied to the remaining trajectory-point data:
X′ = (X − X_min) / (X_max − X_min)   (1)

where X_min and X_max are the minimum and maximum values of the corresponding attribute feature. This finally yields the tensor data of the target trajectory {X_1, X_2, … X_n}, where n is the total number of trajectory points in one trajectory. Of these tensor data, 70% are separated as the training set, 10% as the validation set and the remaining 20% as the test set.
(2) Constructing a target track recognition model:
the target track recognition model comprises a time sequence feature extraction module based on a self-attention mechanism and a deep layer feature extraction module based on a residual error network and a channel attention mechanism, wherein:
the time sequence feature extraction module comprises two serial BiLSTM networks and a self-attention mechanism network;
the deep feature extraction module comprises a serial residual error network, a convolution network and two channel attention mechanism networks;
the output of the first BiLSTM network is connected with the input end of the self-attention mechanism network, and the input of the second BiLSTM network is connected with the output end of the self-attention mechanism network; the output of the residual network is connected with the input end of the convolution network, the first channel attention mechanism network is connected with the first layer and the second layer of the convolution network, and the second channel attention mechanism network is connected with the second layer and the third layer of the convolution network.
(3) Performing multiple rounds of iterative training on the target track recognition model constructed in the step (2) by using the target track data set constructed in the step (1):
(3.1) setting the number of iterations t (t ≥ 50), initializing the learning rate lr to 0.01, and initializing the parameters of the network model;
(3.2) using the training data set train as the input of the time-series feature extraction module and the deep feature extraction module to obtain the time-series features M_1 and deep abstract features M_2 of the target trajectory data;
(3.3) fusing the time-series features M_1 and deep abstract features M_2 of the target trajectory data to obtain the fused features M;
(3.4) inputting the fused features M into a classifier to obtain the prediction label y_i for each target classification in the training sample data set train;
(3.5) using the cross-entropy loss function

L = −Σ_i t_i log(y_i)

to compute the loss between the prediction label y_i and the true label t_i;
(3.6) updating the network model parameters using stochastic gradient descent (SGD) to obtain the trained network model H_t;
(3.7) inputting the validation data set into the network H_t; if the recognition accuracy acc > 95%, saving the network model H_t, otherwise discarding it;
(3.8) repeating steps (3.2) to (3.7) to obtain the final trained network model H.
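Steps (3.5) and (3.6) can be sketched numerically. Below is a minimal NumPy illustration of the cross-entropy loss of equation (1) and a single stochastic-gradient-descent update; the batch size, toy logits and the gradient-on-logits shortcut are assumptions for illustration, not values or code from the patent.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y_pred, t_onehot):
    """Loss = -sum_i t_i * log(y_i), averaged over the batch (eq. (1))."""
    eps = 1e-12                            # avoid log(0)
    return -np.mean(np.sum(t_onehot * np.log(y_pred + eps), axis=1))

# toy batch: 2 samples, K = 4 track modes
logits = np.array([[2.0, 0.5, 0.1, 0.2],
                   [0.1, 0.2, 3.0, 0.3]])
labels = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0]], dtype=float)

y = softmax(logits)
loss = cross_entropy(y, labels)

# one SGD step directly on the logits (the gradient of cross-entropy
# w.r.t. the logits is softmax(z) - t), using the initial lr = 0.01
lr = 0.01
grad = (softmax(logits) - labels) / len(logits)
logits_new = logits - lr * grad
```

A full training loop would back-propagate this gradient through the fusion network rather than updating the logits directly; the point here is only the shape of equations (1) and the SGD rule.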
(4) identifying the behavior mode of the target motion using the target track identification model trained in step (3):
inputting the test sample data set test into the trained target track recognition model H for forward inference, and outputting the track modes (four track behavior modes: circling, figure-eight, straight line and arc) corresponding to all targets in the test data set.
2. The method for identifying the target track based on the residual network and the attention mechanism according to claim 1, wherein in the time sequence feature extraction module in step (2):
the time sequence feature extraction module comprises two serial BiLSTM networks and a self-attention mechanism network;
in the BiLSTM network, two layers of BiLSTM unit structures are stacked, the number of hidden units is set to 128, and the network update process is expressed as follows:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (2)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (3)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)   (4)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t   (5)
O_t = σ(W_O · [h_{t-1}, x_t] + b_O)   (6)
h_t = O_t ⊙ tanh(C_t)   (7)

where C_{t-1} and h_{t-1} denote the cell state and the output at the previous time step; W_f, W_i, W_C, W_O denote weight matrices; σ denotes the sigmoid activation of the forget gate, input gate and output gate (equations (2), (3) and (6) from top to bottom); f_t, i_t, O_t denote the outputs of the three gates; C̃_t is the candidate of the hidden unit; C_t is the cell state at the current time step; and h_t is the output at the current time step;
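Equations (2)-(7) describe one step of a standard LSTM cell. The following is a minimal NumPy sketch of a single update, with random weights as stand-ins for trained parameters; the hidden size 4 and input size 3 are toy assumptions (the patent uses 128 hidden units).

```python
import numpy as np

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM update following equations (2)-(7).
    W maps the concatenation [h_prev, x_t] to the four stacked gate pre-activations."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = len(h_prev)
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])                   # forget gate, eq. (2)
    i_t = sigmoid(z[H:2*H])                 # input gate,  eq. (3)
    C_tilde = np.tanh(z[2*H:3*H])           # candidate,   eq. (4)
    C_t = f_t * C_prev + i_t * C_tilde      # cell state,  eq. (5)
    o_t = sigmoid(z[3*H:4*H])               # output gate, eq. (6)
    h_t = o_t * np.tanh(C_t)                # output,      eq. (7)
    return h_t, C_t

rng = np.random.default_rng(0)
H, D = 4, 3                                 # hidden size, input size (toy values)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, C = np.zeros(H), np.zeros(H)
h, C = lstm_step(rng.normal(size=D), h, C, W, b)
```

A BiLSTM runs this recurrence once forward and once backward over the track sequence and concatenates the two outputs at each step.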
in the self-attention mechanism network, the input data X is multiplied by the corresponding weight matrix to construct a corresponding matrix vector Q, K, V, and the specific construction formula is as follows:
Figure FDA0003727157820000023
Figure FDA0003727157820000024
Figure FDA0003727157820000025
wherein D k Represents the dimension of vector K, D v Representing the dimension of the vector V, and then calculating the final output vector Out by the following scaling dot product operations:
Figure FDA0003727157820000026
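The scaled dot-product self-attention of equations (8)-(11) can be sketched in a few lines of NumPy; the sequence length and the dimensions D, D_k, D_v below are illustrative assumptions.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention, equations (8)-(11)."""
    Q = X @ W_q                                 # eq. (8)
    K = X @ W_k                                 # eq. (9)
    V = X @ W_v                                 # eq. (10)
    D_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(D_k)             # scaled dot products
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = e / e.sum(axis=-1, keepdims=True)       # softmax over the keys
    return A @ V                                # eq. (11)

rng = np.random.default_rng(1)
N, D, D_k, D_v = 5, 8, 6, 7                     # toy sizes
X = rng.normal(size=(N, D))
Out = self_attention(X,
                     rng.normal(size=(D, D_k)),
                     rng.normal(size=(D, D_k)),
                     rng.normal(size=(D, D_v)))
```

The division by √D_k keeps the dot products from saturating the softmax as D_k grows.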
3. The method for identifying the target track based on the residual network and the attention mechanism according to claim 1, wherein in the deep feature extraction module:
the deep feature extraction module comprises a serial residual network, a convolution network and two channel attention mechanism networks;
in the residual network, three one-dimensional convolution modules with 64 channels are used as filters for extracting feature maps, Mish is used as the activation function, and the obtained shallow features are smoothly transmitted to the subsequent network;
in the convolution network, one-dimensional convolutions with channel numbers 128, 256 and 128 are stacked in sequence, the three layers use LeakyReLU, PReLU and ELU as activation functions respectively, and a channel attention mechanism network is inserted between the first and second layers and between the second and third layers; finally, a global pooling layer is attached to perform average pooling on the track sequence, enlarging the receptive field, removing redundant information and reducing the parameter computation of the network model.
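The Mish activation used in the residual network is x · tanh(softplus(x)). A single one-dimensional convolution filter followed by Mish can be sketched as below; the kernel and the toy sequence are illustrative assumptions, not the 64-channel modules of the claim.

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), smooth and non-monotonic."""
    return x * np.tanh(np.log1p(np.exp(x)))

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation) of a single-channel sequence."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, -1.0, 2.0, 0.5, -0.5])   # toy track feature sequence
kernel = np.array([0.25, 0.5, 0.25])              # toy smoothing filter
features = mish(conv1d(x, kernel))
```

A multi-channel layer repeats this with one kernel per output channel, summing over input channels; frameworks implement it as a batched tensor operation.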
4. The method according to claim 3, wherein in the channel attention mechanism network in the deep feature extraction module:
the channel attention mechanism network adopts a global pooling layer to complete the compression of the track information, compressing a track sequence X ∈ ℝ^{B×C×T} with batch size B, feature dimension C and length T along the channels to obtain a two-dimensional channel sequence X_1 ∈ ℝ^{B×C×1} of length 1:

X_1 = GlobalAveragePooling(X)   (12)

two fully connected layers that can scale the channels complete the weight assignment of the channels, obtaining the weight feature map F_1 ∈ ℝ^{B×C×1}:

F_1 = W_2 * (W_1 * X_1)   (13)

where W_1 and W_2 are the weight matrices of the two fully connected layers, and * denotes matrix multiplication;

F_1 is expanded as a tensor along the second dimension to obtain the complete target track feature map F ∈ ℝ^{B×C×T}:

F = broadcast(F_1, dim = 2)   (14)
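Equations (12)-(14) amount to a squeeze-and-excitation step over the channel dimension. A minimal NumPy sketch, taking the equations literally (the reduction ratio r and the random weights are illustrative assumptions; practical squeeze-and-excitation blocks also apply a gating nonlinearity such as sigmoid before re-weighting X):

```python
import numpy as np

def channel_attention(X, W1, W2):
    """Channel weighting following equations (12)-(14).
    X: (B, C, T) batch of track sequences."""
    X1 = X.mean(axis=2, keepdims=True)   # eq. (12): global average pooling -> (B, C, 1)
    F1 = W2 @ (W1 @ X1)                  # eq. (13): two fully connected layers -> (B, C, 1)
    F = np.broadcast_to(F1, X.shape)     # eq. (14): expand along the length dimension
    return F

rng = np.random.default_rng(2)
B, C, T, r = 2, 8, 16, 4                 # toy sizes; r is an assumed reduction ratio
X = rng.normal(size=(B, C, T))
W1 = rng.normal(size=(C // r, C))        # squeeze C -> C/r
W2 = rng.normal(size=(C, C // r))        # excite C/r -> C
F = channel_attention(X, W1, W2)
```

The bottleneck shape (C → C/r → C) is what lets the two layers "scale the channels" with few parameters.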
5. The method for identifying the target track based on the residual network and the attention mechanism according to claim 1, wherein the specific operations of step (3.3) and step (3.4) are as follows:
in step (3.3), the time sequence features M_1 extracted by the time sequence feature extraction module and the deep abstract features M_2 extracted by the deep feature extraction module are concatenated along the channel dimension to obtain the fusion features M:

M = Concat(M_1, M_2)   (15)

in step (3.4), the fusion features M are first sent to a linear layer for matrix operation to obtain the final output vector N ∈ ℝ^{B×K} of the network model:

N = W_e * M   (16)

where W_e is the weight matrix of the linear layer and K denotes the number of target track mode classes; the output vector N is then sent to the softmax function for probability calculation, obtaining the corresponding probability value vector P ∈ ℝ^{B×K}:

P_i = e^{N_i} / Σ_{k=1}^{K} e^{N_k}   (17)

where N_i denotes the vector value for class i; finally, the maximum of the probability vector P is taken along the first dimension to obtain the final recognition result Y ∈ ℝ^{K}, which contains the recognition result y_i of each target track:

Y = max(P, dim = 1)   (18).
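The fusion-and-classification chain of equations (15)-(18) can be sketched end to end in NumPy; the feature widths, batch size and random weights below are illustrative assumptions.

```python
import numpy as np

B, K = 2, 4                                    # batch size, number of track modes
rng = np.random.default_rng(3)

M1 = rng.normal(size=(B, 6))                   # time sequence features
M2 = rng.normal(size=(B, 10))                  # deep abstract features
M = np.concatenate([M1, M2], axis=1)           # eq. (15): Concat along channels
W_e = rng.normal(size=(M.shape[1], K))
N = M @ W_e                                    # eq. (16): linear layer -> (B, K)

e = np.exp(N - N.max(axis=1, keepdims=True))   # eq. (17): softmax, stabilized
P = e / e.sum(axis=1, keepdims=True)

Y = P.argmax(axis=1)                           # eq. (18): index of the most probable mode
```

Taking the arg-max of P per sample yields the predicted track mode index; the probability values themselves can be kept as a confidence score.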
CN202210775960.XA 2022-07-02 2022-07-02 Target track identification method based on residual error network and attention mechanism Pending CN115048870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210775960.XA CN115048870A (en) 2022-07-02 2022-07-02 Target track identification method based on residual error network and attention mechanism

Publications (1)

Publication Number Publication Date
CN115048870A true CN115048870A (en) 2022-09-13

Family

ID=83165044


Country Status (1)

Country Link
CN (1) CN115048870A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935192A (en) * 2023-01-10 2023-04-07 中国民用航空飞行学院 Flight training data prediction method based on incremental online learning framework
CN117291314A (en) * 2023-11-24 2023-12-26 山东理工昊明新能源有限公司 Construction method of energy risk identification model, energy risk identification method and device
CN117291314B (en) * 2023-11-24 2024-03-05 山东理工昊明新能源有限公司 Construction method of energy risk identification model, energy risk identification method and device


Legal Events

Date Code Title Description
PB01 Publication