WO2021255515A1 - Multi-convolutional attention unit for multivariable time series analysis - Google Patents

Multi-convolutional attention unit for multivariable time series analysis

Info

Publication number
WO2021255515A1
Authority
WO
WIPO (PCT)
Prior art keywords
attention
block
convolutional
dimensional
bidimensional
Prior art date
Application number
PCT/IB2020/061239
Other languages
French (fr)
Inventor
Rui Jorge PEREIRA GONÇALVES
Fernando Manuel FERREIRA LOBO PEREIRA
Original Assignee
Universidade Do Porto
Priority date
2020-06-15
Filing date
2020-11-27
Publication date
2021-12-23
Application filed by Universidade Do Porto filed Critical Universidade Do Porto
Publication of WO2021255515A1 publication Critical patent/WO2021255515A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 - Combinations of networks

Abstract

The present invention relates to attention mechanisms applicable to perform Multivariable Time-Series analysis using Recurrent Neural Networks. A multi-convolutional attention unit is developed which is able to generate one independent attention vector per variable of the Multivariable Time-Series input data (1), using one-dimensional convolutional operations to capture the importance of a time-step inside a sub-pattern. For that purpose, the attention unit comprises a splitting block (2), an attention block (3), a concatenation block (4) and a scaling block (5).

Description

DESCRIPTION
MULTI-CONVOLUTIONAL ATTENTION UNIT FOR MULTIVARIABLE TIME
SERIES ANALYSIS
FIELD OF THE INVENTION
The present invention is enclosed in the field of Recurrent Neural Networks. In particular, the present invention relates to attention mechanisms applicable to perform Multivariable Time-Series analysis using Recurrent Neural Networks.
PRIOR ART
Attention is a mechanism to be combined with Recurrent Neural Networks (RNN), allowing the network to focus on certain parts of the input sequence when predicting a certain part of the output sequence, enabling easier and higher-quality learning. The combination of attention mechanisms with RNNs has improved performance in many tasks, making attention an integral part of modern RNNs.
Attention was originally introduced for machine translation tasks, but it has spread into many other application areas. On its basis, attention can be seen as a residual block that multiplies the result with its own input h_i and then reconnects to the main Neural Network (NN) pipeline with a weighted, scaled sequence. These scaling parameters are called attention weights α_i and the results are called context weights c_i for each value i of the sequence; all together they form the context vector c of sequence size n. This operation is given by:
c_i = α_i · h_i ,   i = 1, …, n
The computation of the attention vector α is given by applying a softmax activation function to the input sequence x^l on layer l:

α_i = exp(x_i^l) / Σ_{j=1}^{n} exp(x_j^l) ,   i = 1, …, n
This means that the input values of the sequence will compete with each other to receive attention. Since the sum of all values obtained from the softmax activation is 1, the scaling values in the attention vector α will lie in the range [0,1]. This mechanism is called 'soft attention' because it is a fully differentiable, deterministic mechanism that can be plugged into an existing system, and the gradients are propagated through the attention mechanism at the same time they are propagated through the rest of the NN. 'Soft attention' follows the regular and easier backpropagation method to compute the gradient, and its accuracy is subject to the assumption that the weighted average is a good representation of the area of attention.
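For illustration only (this sketch is not part of the original description), a minimal NumPy example of the soft-attention scaling defined by the two formulas above, for a single sequence:

```python
import numpy as np

def soft_attention(h):
    """Minimal soft-attention sketch for one sequence of n values.

    Returns the attention weights alpha (non-negative, summing to 1)
    and the context vector c obtained by element-wise scaling of h.
    """
    e = np.exp(h - h.max())      # softmax, shifted for numerical stability
    alpha = e / e.sum()
    c = alpha * h                # residual-style reconnection: weighted, scaled sequence
    return alpha, c

alpha, c = soft_attention(np.array([0.2, 1.5, -0.3, 0.8]))
print(alpha.sum())               # 1.0 -- the values compete for attention
```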
The attention mechanism can be applied before or after the recurrent layers. If attention is applied directly to the input, before entering an RNN, it is called attention before; otherwise, if it is applied to an RNN output sequence, it is called attention after.
In the case of Multivariate Time-Series (MTS) input data, a bidimensional dense layer is used to perform attention, with permutation operations applied before and after this layer, so that the attention mechanism is applied between the values inside each sequence and not between the time steps of all sequences.
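As an illustrative sketch only, one common way such a permute-dense attention layer is coded in Keras; the dimensions and layer choices below are assumptions made for the example, not part of the description:

```python
from tensorflow.keras import layers, Input, Model

time_steps, variables = 72, 7     # assumed example dimensions

x = Input(shape=(time_steps, variables))
p = layers.Permute((2, 1))(x)                           # (variables, time_steps)
a = layers.Dense(time_steps, activation="softmax")(p)   # attention over time, per variable
a = layers.Permute((2, 1))(a)                           # back to (time_steps, variables)
c = layers.Multiply()([x, a])                           # scaled (context) sequence
dense_attention = Model(x, c)
dense_attention.summary()
```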
Solutions exist in the art, such as patent application US2018060665A1, which discloses a system and method for a Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. It proposes decoding the encoded hidden states to generate a predicting model, the decoding including adaptively prioritizing encoded hidden states using temporal attention. The system and method further include generating predictions of future events using the predicting model based on the data sequences. It is further considered to generate signals for initiating an action on devices based on the predictions. However, that document is silent on the computation of a sub-pattern associated with each time-series segment, and on the calculation of a bidimensional feature map.
Document US9830709B2 describes a method for video analysis with a convolutional attention recurrent neural network. This method includes generating a current multi-dimensional attention map, which indicates areas of interest in a first frame from a sequence of spatiotemporal data. The method further includes receiving a multi-dimensional feature map and convolving the current multi-dimensional attention map and the multi-dimensional feature map to obtain a multi-dimensional hidden state and a next multi-dimensional attention map. The method identifies a class of interest in the first frame based on the multi-dimensional hidden state and training data. However, the proposed method does not consider, or even suggest, computing each time-series variable individually, nor does it use a softmax activation function to perform time-series segment competition.
Document CN110717577A relates to the field of data mining, in particular to a time series prediction model construction method that pays attention to regional information similarity. The invention mainly focuses, through an attention mechanism, on the similarity between the information of the previous time-step region and the information of the current time-step region, and uses the weighted fusion vector of all previous time steps in a Long Short-Term Memory neural network. The prediction of the time sequence value is performed by combining the attention mechanism of the Long Short-Term Memory neural network with the attention area information. However, this document does not mention or suggest the applicability of attention mechanisms to MTS analysis.
In conclusion, the state of the art appears to be silent on the adaptations required for an attention mechanism of an RNN architecture applied to the specific case of MTS problems using convolutional multi-head paths of attention, one per variable, to achieve a more accurate analysis.
The present solution is intended to innovatively overcome such issues.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a multi-convolutional attention unit to be applied in performing MTS analysis using an RNN architecture. This unit is able to generate one independent attention vector α per variable of the MTS input data, using one-dimensional convolutional operations to capture the importance of a time-step inside a sub-pattern. A plurality of sub-patterns can be analysed using stacked convolutional layers inside the attention unit.
Another object of the present invention is a processing system adapted to perform MTS analysis, which comprises the attention unit now developed.
DESCRIPTION OF FIGURES
Figure 1 - block diagram representation of an embodiment of the Multi-Convolutional Attention Unit developed, wherein the reference signs represent:
1 - MTS bidimensional input data;
2 - Splitting block;
3 - Attention block;
4 - Concatenation block;
5 - Scaling block.
Figures 2 and 3 - block diagram representations of two embodiments of a processing system configured to perform MTS analysis, wherein the reference signs represent:
1 - MTS bidimensional input data;
2 - Splitting block;
3 - Attention block;
4 - Concatenation block;
5 - Scaling block;
6 - RNN;
7 - Dense layer;
Figure 2 represents the embodiment of the processing system where the Attention Unit is applied before the RNN, and Figure 3 represents the embodiment of the processing system where the Attention Unit is applied after the RNN.
DETAILED DESCRIPTION
The more general and advantageous configurations of the present invention are described in the Summary of the Invention. Such configurations are detailed below in accordance with other advantageous and/or preferred embodiments of implementation of the present invention.
A multi-convolutional attention unit is described, specially developed for performing MTS analysis using RNN architectures. As can be seen from Figure 1, the MTS bidimensional input data (1) feeds a splitting block (2) in order to extract an individual time-series sequence for each MTS variable present in the MTS bidimensional input data (1). An attention block (3), comprising at least one convolution filter, is programmed to create a one-dimensional vector 'path' with one-dimensional convolutional layers for each sequence, and the results are concatenated using a concatenation block (4), in order to generate a bidimensional feature map of attention weights α. In the scaling block (5), said bidimensional feature map is then multiplied by the MTS bidimensional input data (1), in order to obtain the bidimensional context map c. For that purpose, a skip connection can be implemented.
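For illustration only, a minimal Keras sketch of one possible implementation of this unit; the use of tf.keras, the Lambda-based split, the kernel size and the example dimensions are assumptions made for the sketch, not limitations of the description:

```python
from tensorflow.keras import layers, Input, Model

time_steps, variables = 72, 7   # assumed example dimensions
kernel_size = 12                # assumed sub-pattern length

h = Input(shape=(time_steps, variables))                 # MTS bidimensional input data (1)

# Splitting block (2): one sequence per variable, each of shape (time_steps, 1)
paths = [layers.Lambda(lambda t, i=i: t[:, :, i:i + 1])(h) for i in range(variables)]

# Attention block (3): one one-dimensional convolutional 'path' per variable,
# with a softmax over the time axis acting as the soft-attention mechanism
attn = []
for p in paths:
    z = layers.Conv1D(filters=1, kernel_size=kernel_size, padding="same")(p)
    attn.append(layers.Softmax(axis=1)(z))                # time steps of this variable compete

# Concatenation block (4): bidimensional feature map a of size time_steps x variables
a = layers.Concatenate(axis=-1)(attn)

# Scaling block (5): skip connection multiplying the input by a, giving the context map c
c = layers.Multiply()([h, a])

attention_unit = Model(h, c, name="multi_convolutional_attention_unit")
attention_unit.summary()
```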
Figure 2 illustrates only one convolution filter per time-series sequence for each variable of the MTS bidimensional input data (1), in the particular case where the multi-convolutional attention unit is applied before the RNN (6). Figure 3 illustrates only one convolution filter per recursive-cell-generated sequence, in the case where the multi-convolutional attention unit is applied after the RNN (6). It is important that, before the concatenation operation performed by the concatenation block (4), each path outputted by the attention block (3) returns a one-dimensional vector with size n - the number of time steps. These one-dimensional vectors are concatenated with each other, resulting in a bidimensional feature map of attention weights, α, with size: Time Steps x Variables.
The bidimensional feature map is compatible for multiplication, in the scaling block (5), with h - the MTS bidimensional input data (1) - to obtain the bidimensional context map.
The multi-convolutional attention unit is also developed considering the case where multiple small sub-sequence patterns are processed at the same time. To achieve this purpose, the attention block (3) comprises a plurality of convolution filters, and the one-dimensional convolutional layers outputted from each filter are stacked into multichannel one-dimensional convolution layers. In this case, the last convolution filter inside the attention block (3) of the attention unit will return the one-dimensional vector which is the input of the concatenation block (4). An alternative way to force the attention block to output a one-dimensional vector for each path/attention "block" is to use an Average Pooling one-dimensional layer, to average all the previously obtained channels into one dimension. This maintains Σ_i α_i = 1, which is important for competitive weighting, as long as the last convolution filter with many channels uses the softmax activation as the soft-attention mechanism. In summary, the last single-channel one-dimensional vector outputted from the attention block (3) must use the softmax activation so that the values in the resulting vector, per variable, compete with each other, have scaling factors in the [0,1] range and sum to 1.
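A sketch of such a multi-filter attention 'path' (again illustrative, assuming tf.keras; the filter count and kernel size are example values, and the channel-averaging alternative is only indicated in a comment):

```python
from tensorflow.keras import layers

def attention_path(p, kernel_size=12, n_filters=16):
    """One attention 'path' with stacked one-dimensional convolutions (sub-pattern analysis).

    p: tensor of shape (batch, time_steps, 1) holding a single variable's sequence.
    Returns a (batch, time_steps, 1) vector of attention weights that sum to 1 over time.
    """
    z = layers.Conv1D(n_filters, kernel_size, padding="same", activation="relu")(p)
    z = layers.Conv1D(n_filters, kernel_size, padding="same", activation="relu")(z)
    # The last filter collapses the channels back into a single one-dimensional vector;
    # the alternative sketched in the description is to average the channels instead,
    # e.g. with a Lambda such as: layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))
    z = layers.Conv1D(1, kernel_size, padding="same")(z)
    return layers.Softmax(axis=1)(z)   # soft attention: weights in [0,1], summing to 1 over time
```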
With the multi-convolutional attention unit developed, instead of processing individual steps it is possible to process segments, giving attention to a time-step according to its neighbours' values, i.e. a sub-pattern in the time-series. The importance of each segment will compete with all the others in the same traditional way, using the softmax activation as the soft-attention mechanism. Since each original sequence/time-series variable of the MTS bidimensional input data (1) will be scaled individually, each time-series variable is processed individually. Thus, a split operation, performed by the splitting block (2), is used to apply the attention block (3) to each individual variable of the MTS bidimensional input data (1). Before scaling the inputs, with a matrix multiplication provided by the scaling block (5), all the obtained one-dimensional attention vectors are concatenated by the concatenation block (4), resulting in a compatible bidimensional map.
In this way, it is possible to create one independent attention vector α per variable of the MTS bidimensional input data (1), using one-dimensional convolutional operations to capture the importance of a time-step inside a sub-pattern. A plurality of sub-patterns can be analysed using stacked convolutional layers inside the attention block (3).
EMBODIMENTS
A multi-convolutional attention unit is developed for performing classification analysis of MTS bidimensional input data (1).
Said attention unit comprises a splitting block (2), an attention block (3), a concatenation block (4) and a scaling block (5).
The splitting block (2) comprises processing means programmed to split a data input into a plurality of output sequences. The split operation is applied to create an attention "block" for each individual variable of the MTS bidimensional input data (1).
In its turn, the attention block (3) comprises processing means adapted to implement a one-dimensional convolution layer. It includes at least one filter and a softmax activation function, as a soft-attention mechanism. More particularly, the function of the attention block (3) is to apply the one-dimensional convolution layer to the output sequences generated by the splitting block (2). The importance of each segment will compete with all the others using the softmax activation. Since each original sequence/time-series variable of the MTS bidimensional input data (1) will be scaled individually, each time-series variable is processed individually.
The concatenation block (4) is used to concatenate the 1-dimensional vectors outputted by the attention block (3) in order to create a bidimensional feature map of attention weights, a.
Finally, the scaling block (5) comprises processing means adapted to multiply the MTS bidimensional input data (1) with the bidimensional feature map, to generate a context map, c. This can be performed by implementing a skip connection of the MTS bidimensional input data (1). When dealing with MTS bidimensional input data (1), a bidimensional dense layer is used for attention and, in this way, the importance of each time-step is extracted taking into consideration the neighbouring context values in the sequence.
In one embodiment, the attention unit is applied before an RNN (6). In this particular case, the splitting block (2) is programmed to split the MTS bidimensional input data (1) into a time-series sequence per individual variable. In its turn, the attention block (3) is configured to apply the one-dimensional convolution layer to the time-series sequence of each individual variable outputted from the splitting block (2), to generate a one-dimensional vector 'path' of size n - time-steps - per variable. The concatenation block (4) then concatenates the one-dimensional vectors outputted by the attention block (3) and related to each individual variable, to generate a bidimensional feature map of attention weights, α, with size time-steps x variables.
Alternatively, in another embodiment, the attention unit is applied after a recursive neural network layer. In this particular case, the splitting block (2) is programmed to split the output of an RNN (6), used to process the MTS bidimensional input data (1), into N recursive-cell-generated sequences. In its turn, the attention block (3) is configured to apply the one-dimensional convolution layer to each recursive-cell-generated sequence outputted from the splitting block (2), to generate a one-dimensional vector 'path' of size n - time-steps - per number of recursive cells. The concatenation block (4) concatenates the one-dimensional vectors outputted by the attention block (3) for each recursive cell, to generate a bidimensional feature map of attention weights, α, with size time-steps x number of recursive cells.
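Purely as an illustration of how these two embodiments can be wired together, a compact Keras sketch is given below; the helper function, the LSTM cell count, the class count and the dense head are assumptions made for the example, not part of the description:

```python
from tensorflow.keras import layers, Input, Model

def multi_conv_attention(x, n_paths, kernel_size=12):
    """Compact sketch of the attention unit applied to a (time_steps, n_paths) tensor."""
    paths = [layers.Lambda(lambda t, i=i: t[:, :, i:i + 1])(x) for i in range(n_paths)]
    attn = [layers.Softmax(axis=1)(layers.Conv1D(1, kernel_size, padding="same")(p))
            for p in paths]
    a = layers.Concatenate(axis=-1)(attn)      # attention weights, time_steps x n_paths
    return layers.Multiply()([x, a])           # context map

time_steps, variables, cells, classes = 72, 7, 64, 5   # assumed example sizes

# Embodiment of Figure 2: attention unit applied before the RNN (6)
x1 = Input(shape=(time_steps, variables))
c1 = multi_conv_attention(x1, n_paths=variables)
h1 = layers.LSTM(cells)(c1)
model_before = Model(x1, layers.Dense(classes, activation="softmax")(h1))

# Embodiment of Figure 3: attention unit applied after the RNN (6)
x2 = Input(shape=(time_steps, variables))
s2 = layers.LSTM(cells, return_sequences=True)(x2)     # one sequence per recursive cell
c2 = multi_conv_attention(s2, n_paths=cells)
model_after = Model(x2, layers.Dense(classes, activation="softmax")(layers.Flatten()(c2)))
```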
In another embodiment of the attention unit, the processing means of the splitting block (2) are programmed to execute a Keras Lambda function.
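An illustrative assumption of what such a Lambda-based split can look like in practice (the shapes below are example values only):

```python
import tensorflow as tf
from tensorflow.keras import layers

mts = tf.random.normal((32, 72, 7))   # (batch, time_steps, variables), example shapes
split = [layers.Lambda(lambda t, i=i: t[:, :, i:i + 1])(mts) for i in range(7)]
print([s.shape for s in split])        # seven tensors of shape (32, 72, 1)
```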
In another embodiment of the attention unit, the one-dimensional convolutional layer of the attention block (3) comprises two or more filters. In this situation, the attention block (3) is further configured to stack multi-'path' one-dimensional convolutional layers. In this way, many sub-patterns can be analysed using stacked convolutional layers inside the attention block (3). These vectors, concatenated with each other, result in a bidimensional feature map of attention weights. More particularly, the attention block (3) may apply stacked one-dimensional convolutional operations using ReLU activation. This method constructs one independent attention vector per variable of the MTS bidimensional input data (1), using one-dimensional convolutional operations to capture the importance of a time-step inside a sub-pattern.
As an alternative way to force the output vector of the attention block (3) to have one dimension only, in one embodiment of the attention unit the attention block (3) may also apply an Average Pooling one-dimensional layer to average the previous one-dimensional convolutional layers of the 'path'.
A processing system adapted to perform MTS analysis is also described. For that purpose, the processing system comprises the multi-convolutional attention unit developed. The processing system further comprises processing means adapted to implement an RNN (6) architecture, such as a Long Short-Term Memory.
In one embodiment of the processing system, the multi-convolutional attention unit is applied before the RNN (6) architecture. Alternatively, in another embodiment of the processing system, the multi-convolutional attention unit is applied after the RNN (6) architecture.
In another embodiment, the processing system further comprises a dense function, in order to be able to use dense/fully-connected layers as a way to process each time-step in relation to all values in the sequence.
As will be clear to one skilled in the art, the present invention should not be limited to the embodiments described herein, and a number of changes are possible which remain within the scope of the present invention. Of course, the preferred embodiments shown above are combinable in the different possible forms, the repetition of all such combinations being herein avoided.
EXPERIMENTAL RESULTS
As an example, we present the results from a case study related to the individual household electric power consumption. This dataset is provided by the UCI machine learning repository [1].
The focus is on MTS classification analysis, so results are compared between Deep Learning methodologies using accuracy and categorical cross-entropy metrics. The target value is the average level of the global house active power consumption for the next 12 hours, in five classes, based on the last 72 hours. A sliding window of 12 hours is used (i.e. 66% overlap). The five classes to predict are levels from very low (level 0) to very high (level 4).
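A hypothetical preprocessing sketch of this windowing, assuming the dataset has been resampled to hourly values and that the class levels are obtained by quantile discretisation (the discretisation rule is an assumption for illustration, not stated in the text):

```python
import numpy as np

def make_windows(data, target_col=0, in_hours=72, out_hours=12, stride=12, n_classes=5):
    """Slide a 72-hour input window over hourly MTS data and label each window
    with the class of the average target value over the following 12 hours.

    data : array of shape (hours, variables); target_col holds global active power.
    Returns X of shape (samples, in_hours, variables) and integer labels y in 0..4.
    """
    X, avg = [], []
    for s in range(0, len(data) - in_hours - out_hours + 1, stride):
        X.append(data[s:s + in_hours])
        avg.append(data[s + in_hours:s + in_hours + out_hours, target_col].mean())
    X, avg = np.asarray(X), np.asarray(avg)
    edges = np.quantile(avg, np.linspace(0, 1, n_classes + 1)[1:-1])  # 4 class boundaries
    y = np.digitize(avg, edges)            # levels 0 (very low) ... 4 (very high)
    return X, y
```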
Simple Long Short-Term Memory: Accuracy: 37.70% (Table 1)

Long Short-Term Memory with standard attention: Accuracy: 40.70% (Table 2)

Long Short-Term Memory with the Multi-convolutional attention block of the invention: Accuracy: 42.06% (Table 3)

[Tables 1 to 3 (detailed per-model results) are not reproduced in this text extraction.]
REFERENCES
[1] - Alice Berard and Georges Hebrail. Individual household electric power consumption Data Set, November 2010. http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

Claims

1. Multi-convolutional attention unit for performing analysis of a multivariable time series bidimensional input data (1) characterised by comprising:
A splitting block (2), comprising processing means programmed to split an input data into a plurality of output sequences;
An attention block (3), comprising processing means adapted to implement a one-dimensional convolution layer, comprising at least one filter and a softmax activation function; the attention block (3) being configured to apply the one-dimensional convolution layer to the output sequences generated by the splitting block (2);
A concatenation block (4), configured to concatenate the 1-dimensional vectors outputted by the attention block (3), to generate a bidimensional feature map of attention weights, a;
A scaling block (5) configured to multiply the multivariable time series bidimensional input data (1) with the bidimensional feature map to generate a context map, c.
2. Multi-convolutional attention unit according to claim 1, wherein the multi-convolutional attention unit is applied before a recursive neural network (6), and: the splitting block (2) is programmed to split a multivariable time series bidimensional input data (1) into a time-series sequence per individual variable; the attention block (3) is configured to apply the one-dimensional convolution layer to the time-series sequence of each individual variable outputted from the splitting block (2), to generate a one-dimensional vector 'path' of size n - time-steps - per variable; the concatenation block (4) is configured to concatenate the 1-dimensional vectors outputted by the attention block (3) of each individual variable, to generate a bidimensional feature map of attention weights, a, with size time-steps x variables.
3. Multi-convolutional attention unit according to claim 1, wherein the multi-convolutional attention unit is applied after a recursive neural network (6) and: the splitting block (2) is programmed to split the output of the recursive neural network (6), used to process the multivariable time series bidimensional input data (1), into N recursive cells generated sequences; the attention block (3) is configured to apply the one-dimensional convolution layer to each recursive cell generated sequence outputted from the splitting block (2), to generate a one-dimensional vector 'path' of size n - time-steps - per number of recursive cells; the concatenation block (4) is configured to concatenate the 1-dimensional vectors outputted by the attention block (3) for each recursive cell, to generate a bidimensional feature map of attention weights, a, with size time-steps x number of recursive cells.
4. Multi-convolutional attention unit according to any of the previous claims, wherein the processing means of the splitting block (2) are programmed to execute a Keras Lambda function.
5. Multi-convolutional attention unit according to any of the previous claims, wherein the one-dimensional convolutional layer comprises two or more filters; the attention block (3) being further configured to stack multi 'path' one-dimensional convolutional layers.
6. Multi-convolutional attention unit according to claim 5, wherein the attention block (3) is further configured to apply stacked one-dimensional convolutional operations using ReLU activation.
7. Multi-convolutional attention unit according to claims 5 or 6, wherein the attention block (3) is further configured to apply an Average Pooling one-dimensional layer to average the previous one-dimensional convolutional layers of the 'path'.
8. Processing system for performing multivariable time series analysis, comprising: processing means adapted to implement a recursive neural network (6); the multi-convolutional attention unit of claims 1 to 7.
9. Processing system according to claim 8, wherein the multi-convolutional attention unit is applied before the recursive neural network (6).
10. Processing system according to claim 8, wherein the multi-convolutional attention unit is applied after the recursive neural network (6).
11. Processing system according to any of the previous claims 8 to 10 further comprising a dense function.
12. Processing system according to any of the previous claims 8 to 11, wherein the recursive neural network (6) is Long Short-Term Memory.
Application PCT/IB2020/061239, priority date 2020-06-15, filing date 2020-11-27: Multi-convolutional attention unit for multivariable time series analysis, WO2021255515A1 (en)

Applications Claiming Priority (2)

Application Number: PT116494, Priority Date: 2020-06-15
Application Number: PT11649420, Priority Date: 2020-06-15

Publications (1)

Publication Number: WO2021255515A1 (en), Publication Date: 2021-12-23

Family

ID: 74046005

Family Applications (1)

Application Number: PCT/IB2020/061239 (WO2021255515A1, en), Priority Date: 2020-06-15, Filing Date: 2020-11-27, Title: Multi-convolutional attention unit for multivariable time series analysis

Country Status (1)

Country: WO (1), Link: WO2021255515A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9830709B2 (en) | 2016-03-11 | 2017-11-28 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks
US20180060665A1 (en) | 2016-08-29 | 2018-03-01 | NEC Laboratories America, Inc. | Dual Stage Attention Based Recurrent Neural Network for Time Series Prediction
CN110717577A (en) | 2019-09-09 | 2020-01-21 | Guangdong University of Technology | Time series prediction model construction method for noting regional information similarity

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
LIN HUIWEI ET AL.: "A Multivariate Time Series Classification Method Based on Self-attention", in Genetic and Evolutionary Computing (Proceedings of the Thirteenth International Conference on Genetic and Evolutionary Computing, 1-3 November 2019, Qingdao, China), vol. 1107, Springer, Berlin, ISSN 2194-5357, pages 491-499, XP055789152, DOI: 10.1007/978-981-15-3308-2_54 *

ALICE BERARD; GEORGES HEBRAIL: Individual household electric power consumption Data Set, November 2010 (2010-11-01), retrieved from the Internet: <URL: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption>

GONÇALVES RUI ET AL.: "Deep learning in exchange markets", Information Economics and Policy, North-Holland, Amsterdam, NL, vol. 47, 29 May 2019, pages 38-51, ISSN 0167-6245, XP085738616, DOI: 10.1016/J.INFOECOPOL.2019.05.002 *

SHIH SHUN-YAO ET AL.: "Temporal pattern attention for multivariate time series forecasting", Machine Learning, Kluwer Academic Publishers, Boston, US, vol. 108, no. 8-9, 11 June 2019, pages 1421-1441, ISSN 0885-6125, XP037163104, DOI: 10.1007/S10994-019-05815-0 *

YUAN YE ET AL.: "MuVAN: A Multi-view Attention Network for Multivariate Temporal Data", 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 17 November 2018, pages 717-726, XP033485614, DOI: 10.1109/ICDM.2018.00087 *

Legal Events

Code | Description
121 | EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20829345; Country of ref document: EP; Kind code of ref document: A1)
NENP | Non-entry into the national phase (Ref country code: DE)
122 | EP: PCT application non-entry in European phase (Ref document number: 20829345; Country of ref document: EP; Kind code of ref document: A1)