CN115063710A - Time sequence analysis method based on double-branch attention mechanism TCN - Google Patents

Time sequence analysis method based on double-branch attention mechanism TCN

Info

Publication number
CN115063710A
Authority
CN
China
Prior art keywords
branch
attention
global
time sequence
causal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210513520.7A
Other languages
Chinese (zh)
Inventor
张弘力
宋进
徐光洋
刘周
孙赫然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Province Jilin Xiangyun Information Technology Co., Ltd.
Original Assignee
Jilin Province Jilin Xiangyun Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Province Jilin Xiangyun Information Technology Co., Ltd.
Priority to CN202210513520.7A
Publication of CN115063710A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of time sequence analysis and discloses a time sequence analysis method based on a dual-branch attention mechanism TCN. Step 1: perform input embedding on the time-series data of an input video. Step 2: based on the input embedding of Step 1, perform feature extraction with a dual-branch attention temporal module group. Step 3: based on the feature data of Step 2, process the features with a downstream task branch for video behavior analysis. The method is used to solve the insufficient long-range dependence modeling capability of the prior art when processing time-series data.

Description

Time sequence analysis method based on double-branch attention mechanism TCN
Technical Field
The invention belongs to the technical field of time sequence analysis, and particularly relates to a time sequence analysis method based on a double-branch attention mechanism TCN.
Background
In time-series data processing, long-term dependence describes how long the state at the current time can remain affected by states at earlier times. In time-series processing tasks such as speech recognition, text translation and video behavior analysis, whether the content of the next time period can be predicted accurately is determined by how well long-term dependency relationships are preserved, so establishing effective long-term dependence is the key point. The traditional way of extracting such dependency information is an unsupervised or semi-supervised method. BERT (Bidirectional Encoder Representations from Transformers) is a neural network used for modeling long-term dependence in sequence data; thanks to its large parameter count and network scale it can retain long-distance dependency relations, whereas models with few parameters, such as Long Short-Term Memory networks (LSTM), are limited by insufficient modeling capability for long-term dependence.
The attention mechanism in the Transformer has a strong capability of retaining long-term dependencies and is therefore well suited to processing sequence data. BERT, a language model built on the Transformer, likewise achieves the best results in natural language processing tasks. However, training the BERT model consumes a large amount of computational resources and training data, and the network converges slowly, which limits its application. Nevertheless, the attention mechanism can be applied in other time-series models to capture long-term dependency information and further improve their prediction accuracy.
RNNs, with their recurrent structure, have the ability to model sequence data, but they struggle to extract long-term dependency information efficiently. The Temporal Convolutional Network (TCN), which applies convolution to modeling sequence problems, can reach or even exceed the accuracy of RNN models, but its ability to capture long-distance dependence is still insufficient, because the convolution kernel computation it adopts limits the size of the receptive field when processing time-series problems.
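The receptive-field limitation can be made concrete: in a stack of causal dilated convolutions, each layer widens the receptive field by (kernel size - 1) × dilation. A minimal sketch, assuming the common TCN convention of a fixed kernel size and a doubling dilation schedule (neither is specified at this point in the text):

```python
# Receptive field of stacked causal dilated convolutions, as in a TCN.
# With kernel size k and per-layer dilations d_1..d_L, each layer widens the
# receptive field by (k - 1) * d_i, so in total: 1 + (k - 1) * sum(d_i).
def tcn_receptive_field(kernel_size: int, dilations: list) -> int:
    return 1 + (kernel_size - 1) * sum(dilations)

# Typical schedule: dilation doubles every layer (1, 2, 4, ..., 128).
dilations = [2 ** i for i in range(8)]
rf = tcn_receptive_field(kernel_size=3, dilations=dilations)
print(rf)  # 1 + 2 * 255 = 511 timesteps
```

With 8 layers and kernel size 3 the network sees only 511 past timesteps; covering longer dependencies requires more layers or larger kernels, which is the limitation the attention branch below is meant to address.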
Therefore, when existing neural network models process time-series data, their modeling capability for long-range dependence still needs to be improved.
Disclosure of Invention
The invention provides a time sequence analysis method based on a dual-branch attention mechanism TCN, which is used to solve the insufficient long-distance dependence modeling capability of the prior art when processing time-series data.
The invention provides an electronic device.
The invention provides a computer-readable storage medium.
The invention is realized by the following technical scheme:
a timing analysis method based on a dual-branch attention mechanism (TCN) is characterized by comprising the following steps of:
step 1: performing input embedding processing on time series data of an input video;
and 2, step: based on the input embedding of the step 1, performing feature extraction by using a double-branch attention time sequence module group;
and step 3: and (3) processing by adopting a downstream task branch of video behavior analysis based on the characteristic data of the step (2).
In the method, the input time-series data in Step 1 may also be audio, text or images;
the time-series data of time length t is embedded into a sparse space of dimension c.
In the method, the dual-branch attention temporal module group in Step 2 performs feature extraction on the time-series data, and each temporal module maps the input sequence from its input dimension into a higher-dimensional space;
the dual-branch attention temporal module group comprises 4 temporal modules, each of which contains a causal dilated residual connection branch and a global attention branch.
In the method, the dimension of the data changes during processing by the dual-branch attention temporal modules as follows: the input time-series data of the network model is denoted {x_1, x_2, x_3, ..., x_n} and written X_{1×n}. First, each node of the input data is embedded into a c-dimensional space, with the embedding result denoted X_{c×n}; the data is then processed by the temporal modules, and the dimension change is expressed as
X_{1×n} → X_{c1×n} → X_{c2×n} → X_{c3×n} → X_{c4×n} = X_out
In the method, each temporal module fuses the feature extraction results of the sparse causal residual branch and the global attention branch, expressed as
X_temporal_block = X_causal_dilated + X_global_residual
where X_temporal_block denotes the output data of the temporal module, X_causal_dilated denotes the output data of the sparse causal branch, and X_global_residual denotes the output data of the global attention branch.
In the method, the causal dilated residual connection branch consists of 2 causal dilated convolution layers and 2 residual connections. The causal dilated convolution component preserves the structural settings of the original TCN; each residual connection consists of a linear layer, a softmax and a normalization layer, and is used to extract the similarity and dependency between the outputs of the causal convolution layers. The computation of this branch is expressed as
X_causal_dilated = f_linear1(X_cd1) + f_linear2(X_cd2) + X_cd2
where X_causal_dilated denotes the output data of the sparse causal residual branch, f_linear1 and f_linear2 denote the residual connection computations, and X_cd1, X_cd2 denote the output data of the sparse causal convolution layers.
In the method, the global attention branch works as follows.
First, features are extracted from the time-series data by a single 1-D convolution layer, and the feature extraction result of the convolution layer is position-encoded; since positional information in the time series is key to predicting subsequent states, the global positional information of the sequence is incorporated during encoding.
Second, the position-encoded features are fed into a multi-head attention layer to extract the global dependencies in the time-series data, and a residual connection containing a 1-D convolution operation is added to improve the training convergence speed of the global attention.
The computation of this branch is expressed as
X_global_residual = X_global_attention + X_conv1d
where X_global_residual denotes the output of the global attention branch, X_global_attention denotes the output of the global attention layer, and X_conv1d denotes the output of the 1-D convolution operation.
In the method, the downstream task branch in Step 3 adopts different branch processing according to the task.
The invention has the following beneficial effects:
By introducing convolution into time-series tasks, the invention resolves the problems that an RNN cannot be parallelized at scale, since it processes only one sequence element at a time, and that it is memory-intensive, since all intermediate results must be stored; this improves the convergence speed of model training.
By introducing an attention mechanism into the TCN, the invention effectively remedies the TCN's insufficient long-term dependence modeling capability caused by its limited receptive field, and positional encoding allows sequence information to be used more effectively.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of an attention mechanism TCN network model architecture of the present invention.
FIG. 3 is a diagram of the effect of recognizing human behavior in video according to the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
A time sequence analysis method based on a dual-branch attention mechanism TCN comprises the following steps:
Step 1: perform input embedding on the time-series data of an input video;
Step 2: based on the input embedding of Step 1, perform feature extraction with a dual-branch attention temporal module group;
Step 3: based on the feature data of Step 2, process the features with a downstream task branch for video behavior analysis.
In the method, the input time-series data in Step 1 may also be audio, text or images;
the time-series data of time length t is embedded into a sparse space of dimension c.
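The embedding step can be sketched as a per-node linear projection from X_{1×n} to X_{c×n}; the (c × 1) weight matrix below is a hypothetical stand-in, since the text does not fix the form of the embedding layer:

```python
import numpy as np

# Input embedding sketch: each scalar node of a length-n series X_{1×n} is
# mapped into a c-dimensional space, producing X_{c×n}.
rng = np.random.default_rng(0)
n, c = 16, 8                  # sequence length (time length t) and embedding dim
x = rng.normal(size=(1, n))   # X_{1×n}
W = rng.normal(size=(c, 1))   # hypothetical learned embedding weights
b = np.zeros((c, 1))

x_embedded = W @ x + b        # X_{c×n}
print(x_embedded.shape)       # (8, 16)
```

In the patent's notation this realizes the first transition X_{1×n} → X_{c×n} before the temporal modules are applied.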
In the method, the dual-branch attention temporal module group in Step 2 performs feature extraction on the time-series data, and each temporal module maps the input sequence from its input dimension into a higher-dimensional space;
the dual-branch attention temporal module group comprises 4 temporal modules, each of which contains a causal dilated residual connection branch and a global attention branch.
In the method, the dimension of the data changes during processing by the dual-branch attention temporal modules as follows: the input time-series data of the network model is denoted {x_1, x_2, x_3, ..., x_n} and written X_{1×n}. First, each node of the input data is embedded into a c-dimensional space, with the embedding result denoted X_{c×n}; the data is then processed by the temporal modules, and the dimension change is expressed as
X_{1×n} → X_{c1×n} → X_{c2×n} → X_{c3×n} → X_{c4×n} = X_out
In the method, each temporal module fuses the feature extraction results of the sparse causal residual branch and the global attention branch, expressed as
X_temporal_block = X_causal_dilated + X_global_residual
where X_temporal_block denotes the output data of the temporal module, X_causal_dilated denotes the output data of the sparse causal branch, and X_global_residual denotes the output data of the global attention branch.
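The fusion above is an elementwise sum of two branch outputs of identical shape (c, n); a minimal sketch with placeholder values:

```python
import numpy as np

# Feature fusion in one temporal module:
# X_temporal_block = X_causal_dilated + X_global_residual, both of shape (c, n).
c, n = 8, 16
x_causal_dilated = np.ones((c, n))          # placeholder branch output
x_global_residual = 0.5 * np.ones((c, n))   # placeholder branch output

x_temporal_block = x_causal_dilated + x_global_residual
print(x_temporal_block.shape)               # (8, 16)
```

Because both branches preserve the (c, n) layout, the sum requires no reshaping; the fused result feeds the next temporal module.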
In the method, the causal dilated residual connection branch consists of 2 causal dilated convolution layers and 2 residual connections. The causal dilated convolution component preserves the structural settings of the original TCN; each residual connection consists of a linear layer, a softmax and a normalization layer, and is used to extract the similarity and dependency between the outputs of the causal convolution layers. The computation of this branch is expressed as
X_causal_dilated = f_linear1(X_cd1) + f_linear2(X_cd2) + X_cd2
where X_causal_dilated denotes the output data of the sparse causal residual branch, f_linear1 and f_linear2 denote the residual connection computations, and X_cd1, X_cd2 denote the output data of the sparse causal convolution layers.
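A sketch of this branch under stated assumptions: causal dilated convolution is implemented by left-padding, the residual connections f_linear1 and f_linear2 are reduced to plain linear maps (the patent's version also includes softmax and normalization), and all weights are random placeholders rather than trained parameters:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal dilated convolution of x (shape (c, n)) with per-channel
    kernel w (shape (c, k)): left-pad by (k - 1) * dilation so the output at
    position t never depends on future inputs."""
    c, n = x.shape
    k = w.shape[1]
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0)))
    out = np.zeros((c, n))
    for j in range(k):
        out += w[:, j:j + 1] * xp[:, j * dilation : j * dilation + n]
    return out

rng = np.random.default_rng(1)
c, n, k = 4, 12, 3
x = rng.normal(size=(c, n))
w1 = rng.normal(size=(c, k))
w2 = rng.normal(size=(c, k))
L1 = rng.normal(size=(c, c))   # stand-in for f_linear1
L2 = rng.normal(size=(c, c))   # stand-in for f_linear2

x_cd1 = causal_dilated_conv(x, w1, dilation=1)
x_cd2 = causal_dilated_conv(x_cd1, w2, dilation=2)
x_causal_dilated = L1 @ x_cd1 + L2 @ x_cd2 + x_cd2   # branch formula
print(x_causal_dilated.shape)  # (4, 12)
```

The left-padding is what makes the convolution causal: perturbing a future timestep of the input cannot change any earlier output.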
In the method, the global attention branch takes global semantic similarity information into account and emphasizes the meaning of input entities at different positions by adding positional information. It works as follows.
First, features are extracted from the time-series data by a single 1-D convolution layer, and the feature extraction result of the convolution layer is position-encoded; since positional information in the time series is key to predicting subsequent states, the global positional information of the sequence is incorporated during encoding.
Second, the position-encoded features are fed into a multi-head attention layer to extract the global dependencies in the time-series data, and a residual connection containing a 1-D convolution operation is added to improve the training convergence speed of the global attention.
The computation of this branch is expressed as
X_global_residual = X_global_attention + X_conv1d
where X_global_residual denotes the output of the global attention branch, X_global_attention denotes the output of the global attention layer, and X_conv1d denotes the output of the 1-D convolution operation.
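A sketch of the branch under stated assumptions: a channel-mixing matrix stands in for the 1-D convolution, sinusoidal positional encoding is assumed (the text does not specify the encoding), and a single attention head replaces the multi-head layer for brevity; all weights are random placeholders:

```python
import numpy as np

def positional_encoding(c, n):
    """Sinusoidal positional encoding of shape (c, n), assumed here."""
    pos = np.arange(n)[None, :]                     # (1, n) time positions
    i = np.arange(c)[:, None]                       # (c, 1) channel indices
    angle = pos / np.power(10000.0, (2 * (i // 2)) / c)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
c, n = 8, 16
x = rng.normal(size=(c, n))

Wc = rng.normal(size=(c, c)) / np.sqrt(c)           # stand-in for the 1-D conv
x_conv1d = Wc @ x
h = x_conv1d + positional_encoding(c, n)            # add positional information

Wq, Wk, Wv = (rng.normal(size=(c, c)) / np.sqrt(c) for _ in range(3))
q, k_, v = Wq @ h, Wk @ h, Wv @ h                   # (c, n) each
attn = softmax((q.T @ k_) / np.sqrt(c), axis=-1)    # (n, n) pairwise similarity
x_global_attention = v @ attn.T                     # (c, n)

x_global_residual = x_global_attention + x_conv1d   # branch formula
print(x_global_residual.shape)                      # (8, 16)
```

The (n, n) attention map lets every timestep attend to every other one, which is how this branch supplies the global dependencies the causal convolutions cannot reach.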
In the method, the downstream task branch of Step 3 adopts different branch processing according to the task. Besides the downstream task branch for video behavior analysis, corresponding downstream task branches can be adopted for other tasks, such as speech recognition, text translation and image classification.
An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing the steps of the above method when executing the program stored in the memory.
A computer-readable storage medium stores a computer program which, when executed by a processor, carries out the above method steps.
Take the recognition of human behavior in video as an example: given a piece of video, for instance one in which a person is making a salad, the goal is to identify and analyze the behaviors occurring during the process and their corresponding start and stop times. The input video can be regarded as time-series data formed by a series of image frames. The input time-series data is first embedded; the dual-branch attention temporal module group then performs feature extraction on the embedding result and models the long-range dependencies between image frames, so as to predict the action class label for each image frame. On this basis, the downstream task branch combines the sequence information to determine the start and stop time of each action, and the returned recognition result is the name of every action in the video together with its start and stop times. The effect is shown in FIG. 3.
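The final step, recovering action names with start and stop times from per-frame labels, can be sketched by grouping consecutive identical labels (the label names here are illustrative, not from the patent):

```python
# Downstream sketch for behavior recognition: given the per-frame action labels
# predicted by the network, recover each action's name and start/stop frame by
# grouping runs of consecutive identical labels.
def frames_to_segments(labels):
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close a segment at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i - 1))
            start = i
    return segments

frame_labels = ["cut_tomato"] * 3 + ["mix_salad"] * 2 + ["cut_tomato"] * 2
print(frames_to_segments(frame_labels))
# [('cut_tomato', 0, 2), ('mix_salad', 3, 4), ('cut_tomato', 5, 6)]
```

Frame indices convert to wall-clock start/stop times by dividing by the video frame rate.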

Claims (10)

1. A time sequence analysis method based on a dual-branch attention mechanism TCN, characterized by comprising the following steps:
Step 1: perform input embedding on the time-series data of an input video;
Step 2: based on the input embedding of Step 1, perform feature extraction with a dual-branch attention temporal module group;
Step 3: based on the feature data of Step 2, process the features with a downstream task branch for video behavior analysis.
2. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 1, wherein: in Step 1, the input time-series data may also be audio, text or images;
the time-series data of time length t is embedded into a sparse space of dimension c.
3. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 1, wherein: the dual-branch attention temporal module group in Step 2 performs feature extraction on the time-series data, and each temporal module maps the input sequence from its input dimension into a higher-dimensional space;
the dual-branch attention temporal module group comprises 4 temporal modules, each of which contains a causal dilated residual connection branch and a global attention branch.
4. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 3, wherein: the dimension of the data changes during processing by the dual-branch attention temporal modules as follows: the input time-series data of the network model is denoted {x_1, x_2, x_3, ..., x_n} and written X_{1×n}; first, each node of the input data is embedded into a c-dimensional space, with the embedding result denoted X_{c×n}; the data is then processed by the temporal modules, and the dimension change is expressed as
X_{1×n} → X_{c1×n} → X_{c2×n} → X_{c3×n} → X_{c4×n} = X_out
5. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 4, wherein: the temporal module fuses the feature extraction results of the sparse causal residual branch and the global attention branch, expressed as
X_temporal_block = X_causal_dilated + X_global_residual
where X_temporal_block denotes the output data of the temporal module, X_causal_dilated denotes the output data of the sparse causal branch, and X_global_residual denotes the output data of the global attention branch.
6. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 3, wherein: the causal dilated residual connection branch consists of 2 causal dilated convolution layers and 2 residual connections; the causal dilated convolution component preserves the structural settings of the original TCN; each residual connection consists of a linear layer, a softmax and a normalization layer, and is used to extract the similarity and dependency between the outputs of the causal convolution layers; the computation of this branch is expressed as
X_causal_dilated = f_linear1(X_cd1) + f_linear2(X_cd2) + X_cd2
where X_causal_dilated denotes the output data of the sparse causal residual branch, f_linear1 and f_linear2 denote the residual connection computations, and X_cd1, X_cd2 denote the output data of the sparse causal convolution layers.
7. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 3, wherein the global attention branch works as follows:
first, features are extracted from the time-series data by a single 1-D convolution layer, and the feature extraction result of the convolution layer is position-encoded; since positional information in the time series is key to predicting subsequent states, the global positional information of the sequence is incorporated during encoding;
second, the position-encoded features are fed into a multi-head attention layer to extract the global dependencies in the time-series data, and a residual connection containing a 1-D convolution operation is added to improve the training convergence speed of the global attention;
the computation of this branch is expressed as
X_global_residual = X_global_attention + X_conv1d
where X_global_residual denotes the output of the global attention branch, X_global_attention denotes the output of the global attention layer, and X_conv1d denotes the output of the 1-D convolution operation.
8. The time sequence analysis method based on the dual-branch attention mechanism TCN according to claim 3, wherein: the downstream task branch of Step 3 adopts different branch processing according to different tasks.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing the method steps of any one of claims 1 to 8 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored therein which, when executed by a processor, carries out the method steps of any one of claims 1 to 8.
CN202210513520.7A 2022-05-12 2022-05-12 Time sequence analysis method based on double-branch attention mechanism TCN Pending CN115063710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210513520.7A CN115063710A (en) 2022-05-12 2022-05-12 Time sequence analysis method based on double-branch attention mechanism TCN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210513520.7A CN115063710A (en) 2022-05-12 2022-05-12 Time sequence analysis method based on double-branch attention mechanism TCN

Publications (1)

Publication Number Publication Date
CN115063710A true CN115063710A (en) 2022-09-16

Family

ID=83199257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210513520.7A Pending CN115063710A (en) 2022-05-12 2022-05-12 Time sequence analysis method based on double-branch attention mechanism TCN

Country Status (1)

Country Link
CN (1) CN115063710A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227365A (en) * 2023-05-06 2023-06-06 成都理工大学 Landslide displacement prediction method based on improved VMD-TCN
CN116227365B (en) * 2023-05-06 2023-07-07 成都理工大学 Landslide displacement prediction method based on improved VMD-TCN

Similar Documents

Publication Publication Date Title
CN111091839B (en) Voice awakening method and device, storage medium and intelligent device
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN112580328A (en) Event information extraction method and device, storage medium and electronic equipment
CN113515951A (en) Story description generation method based on knowledge enhanced attention network and group-level semantics
CN111738169A (en) Handwriting formula recognition method based on end-to-end network model
CN112883231B (en) Short video popularity prediction method, system, electronic equipment and storage medium
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN115577678B (en) Method, system, medium, equipment and terminal for identifying causal relationship of document-level event
CN111666931B (en) Mixed convolution text image recognition method, device, equipment and storage medium
CN114708436B (en) Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN115063710A (en) Time sequence analysis method based on double-branch attention mechanism TCN
CN116824694A (en) Action recognition system and method based on time sequence aggregation and gate control transducer
CN112507059B (en) Event extraction method and device in public opinion monitoring in financial field and computer equipment
CN117132923A (en) Video classification method, device, electronic equipment and storage medium
CN117095460A (en) Self-supervision group behavior recognition method and system based on long-short time relation predictive coding
CN117494762A (en) Training method of student model, material processing method, device and electronic equipment
CN114511813B (en) Video semantic description method and device
CN116129881A (en) Voice task processing method and device, electronic equipment and storage medium
CN113344060B (en) Text classification model training method, litigation state classification method and device
CN111476131B (en) Video processing method and device
CN114065210A (en) Vulnerability detection method based on improved time convolution network
CN111325068A (en) Video description method and device based on convolutional neural network
CN117373121B (en) Gesture interaction method and related equipment in intelligent cabin environment
CN113792163B (en) Multimedia recommendation method and device, electronic equipment and storage medium
CN111353282B (en) Model training, text rewriting method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination