CN115510757A - Design method for long-time sequence prediction based on gated convolution and time attention mechanism - Google Patents

Design method for long-time sequence prediction based on gated convolution and time attention mechanism

Info

Publication number
CN115510757A
CN115510757A (application number CN202211250328.XA)
Authority
CN
China
Prior art keywords
time
model
long
layer
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211250328.XA
Other languages
Chinese (zh)
Inventor
郑洪源
卢灿尧
陆梦俊
翟象平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202211250328.XA priority Critical patent/CN115510757A/en
Publication of CN115510757A publication Critical patent/CN115510757A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

For long-sequence time-series prediction, the invention provides a model that can effectively and accurately capture the long-range dependency coupling the output and the input of a time series. The model is called GCTAM (Gated Convolution Temporal Attention Mechanism) and improves the Informer with Gated Convolution and a Temporal Attention Mechanism. Two main improvements are made: (1) the proposed gated convolution makes good use of temporal information and automatically routes results based on it; (2) the proposed temporal attention mechanism effectively filters low-frequency noise. Experiments on multiple real datasets show that the method improves the expressive power of the Informer model, enhances its noise-filtering capability, and maintains high prediction accuracy in long-time-sequence prediction.

Description

Design method for long-time sequence prediction based on gated convolution and time attention mechanism
Technical Field
The invention relates to the field of long-time-sequence prediction based on gated convolution and a temporal attention mechanism. The proposed GCTAM model is mainly used to improve the expressive power of the Informer model, enhance its noise-filtering capability, and improve its prediction accuracy for long time sequences.
Background
Long-time-sequence prediction is an important research topic in fields such as weather, energy consumption, financial indexes, retail, medical monitoring, anomaly detection and traffic prediction. In recent years, with the continuous development of deep learning, the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and Transformer have shown good predictive performance and have been successfully applied in many large-scale real-world applications, including long-sequence time-series forecasting.
Existing RNN methods are still limited on long time series: as the sequence length increases, the gradients produced by the RNN become smaller (vanishing gradients) or larger (exploding gradients), so the long-term dependencies in time-series data cannot be learned well. With the rise of modern deep learning, LSTM employs gated structures to control the information flow and mitigate vanishing or exploding gradients. The gated structure captures long-term memory well, but still does not completely solve the vanishing-gradient problem. Subsequently, CNNs were applied to time-series prediction. CNNs have great potential for sequence modeling, even outperform RNNs on many tasks, avoid the common problems of RNNs such as exploding/vanishing gradients and poor long-term memory, and support parallel computation more efficiently than RNNs. However, compared with the Transformer, which can capture dependencies of arbitrary length, there is still much room for improvement.
The Transformer-based Informer solves the vanishing-gradient and memory-constraint problems in long-sequence time-series prediction, but it only performs coarse-grained feature extraction on the time series and does not explicitly use temporal information. When the time series is too long, all parameters are shared, so the generated noise is output directly through the fully connected layer. There has been prior work on improving the expressive power of models and enabling explicit analysis of the features of time-series samples. The Microsoft AI Cognitive Services team proposed dynamic convolution: compared with traditional static convolution (a single convolution kernel per layer), dynamically aggregating multiple convolution kernels according to attention not only significantly improves expressive power but also keeps the computational cost low. It is friendly to efficient CNNs and can easily be integrated into existing CNN architectures.
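As an illustration of the dynamic convolution idea described above, a minimal PyTorch-style sketch is given below; the module name DynamicConv1d, the number of candidate kernels and the attention branch are assumptions made for exposition, not the implementation of the cited work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    # K candidate kernels are aggregated with input-dependent attention
    # weights before the convolution is applied (one aggregated kernel per
    # sample), instead of using a single static kernel per layer.
    def __init__(self, channels, kernel_size=3, num_kernels=4):
        super().__init__()
        self.num_kernels = num_kernels
        self.kernel_size = kernel_size
        # candidate kernels, shape (K, C_out, C_in, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, channels, channels, kernel_size) * 0.02)
        # attention over kernels computed from a pooled summary of the input
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(channels, num_kernels))

    def forward(self, x):                                   # x: (B, C, L)
        B, C, L = x.shape
        a = F.softmax(self.attn(x), dim=-1)                 # (B, K) kernel attention
        w = torch.einsum('bk,kois->bois', a, self.weight)   # per-sample aggregated kernel
        x = x.reshape(1, B * C, L)                          # grouped-conv trick:
        w = w.reshape(B * C, C, self.kernel_size)           # one group per sample
        out = F.conv1d(x, w, padding=self.kernel_size // 2, groups=B)
        return out.reshape(B, C, L)

For example, DynamicConv1d(channels=64) can be applied to a tensor of shape (batch, 64, length) in place of an ordinary nn.Conv1d with the same channel width.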
Disclosure of Invention
The invention aims to: conventional deep-network time-series prediction models suffer from exploding and vanishing gradients, cannot learn the long-term dependencies of time-series data well, and, when the time series is too long, output the generated noise directly through the fully connected layer. Therefore, how to deal with vanishing or exploding gradients, capture long-term memory, filter noise and improve prediction accuracy in long-term sequence prediction through the design of a deep neural network model becomes the main technical problem.
In order to solve the technical problem, the invention provides a design method for long-time-sequence prediction based on gated convolution and a temporal attention mechanism, which can improve the expressive power of the model, enhance its noise-filtering capability and improve its prediction accuracy.
The technical scheme is as follows: in order to achieve the technical effects, the technical scheme provided by the invention is as follows:
a design method of long time sequence prediction based on a gated convolution and time attention mechanism is characterized in that,
(1) A gating network mechanism is implemented to capture the long-term memory that is ignored in the Informer distilling layer; time embedding is introduced into the gating, temporal features are classified and automatically routed in a fine-grained manner, and the more relevant features in the time series are highlighted and encoded together.
(2) A Gated Convolutional Network combining Mixture-of-Experts (MoE) and Dynamic Convolution is added to the distilling layer of the original Informer model.
(3) A temporal attention mechanism based on dual attention gates (Dual Attention Gates) is proposed. The mechanism filters noise from the output of the model's fully connected layer and improves the prediction accuracy of the model.
Further, the primary objective of the Informer in step (1) is to solve the problem of continuous prediction over long sequences: it must not only complete the characterization of long-sequence input but also establish the link between long-sequence output and long-sequence input. As an improved Transformer, the Informer keeps the encoder-decoder structure but replaces self-attention with ProbSparse self-attention, a new attention mechanism that reduces the amount of computation. In addition, a self-attention distilling technique is adopted, which reduces the dimension and the number of network parameters and thus the space complexity. In the long-sequence prediction problem, global information such as hierarchical timestamps (week, month, year) and agnostic timestamps (holidays, events) needs to be obtained. Therefore, compared with the Transformer, the Informer also improves the data embedding: value embedding, position embedding and time embedding are summed together as the input of the encoder or decoder. After a gating network mechanism is added to the Informer, temporal features can be classified and automatically routed in a fine-grained manner, the more relevant features in the time series are highlighted and encoded together, and the expressive power of the model is improved.
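As an illustration of the data embedding described above, a minimal sketch of summing value, position and time embeddings is given below; the concrete layer choices (a Conv1d value embedding, a learned positional embedding, a linear projection of the time features) and the number of time features are assumptions made for exposition.

import torch
import torch.nn as nn

class DataEmbedding(nn.Module):
    # Value embedding + position embedding + time embedding, summed and
    # used as the input of the encoder or decoder.
    def __init__(self, c_in, d_model, n_time_feats=4, max_len=5000):
        super().__init__()
        self.value_emb = nn.Conv1d(c_in, d_model, kernel_size=3, padding=1)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.time_emb = nn.Linear(n_time_feats, d_model)  # e.g. month, day, weekday, hour

    def forward(self, x, x_mark):              # x: (B, L, c_in), x_mark: (B, L, n_time_feats)
        B, L, _ = x.shape
        val = self.value_emb(x.transpose(1, 2)).transpose(1, 2)           # (B, L, d_model)
        pos = self.pos_emb(torch.arange(L, device=x.device))[None, :, :]  # (1, L, d_model)
        tim = self.time_emb(x_mark)                                       # (B, L, d_model)
        return val + pos + tim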
Further, the gated convolutional network in step (2) is a model of the gated convolutional network that combines MoE and Dynamic Convolution.
Further, the original MoE model can be expressed as equation (1):

y = Σ_{i=1}^{n} g(x)_i f_i(x)    (1)

where g(x)_i represents the weight assigned to the i-th expert by the gating network, i.e., the corresponding gating score; f_i, i = 1, ..., n denote the n expert networks, and g denotes the gating network that integrates the results of the several experts. The multiple expert networks in MoE each fit the part of the training data they are good at fitting, which is somewhat equivalent to a local interpolation method: different subsets of the data are fitted by different local models (experts), and the gating acts as a controller of the weights.
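A minimal sketch of equation (1) follows; the linear experts and the softmax gate are illustrative assumptions, since the form of the expert networks f_i and of the gating network g is not fixed above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    # y = sum_i g(x)_i * f_i(x): n expert networks combined by a gating network.
    def __init__(self, d_in, d_out, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_in, d_out) for _ in range(n_experts)])
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x):                                  # x: (B, d_in)
        g = F.softmax(self.gate(x), dim=-1)                # (B, n) weights g(x)_i
        f = torch.stack([e(x) for e in self.experts], 1)   # (B, n, d_out) outputs f_i(x)
        return (g.unsqueeze(-1) * f).sum(dim=1)            # weighted sum over experts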
Further, the gated convolution model proposed by the present invention can be expressed as equations (2)-(5):

X_{j+1}^t = MaxPool( ELU( DynRoute( [X_j^t]_AB ) ) )    (2)

where DynRoute(·) replaces the original Conv1d(·), and [·]_AB denotes the operation inside the Attention Block; in the original Informer, Conv1d(·) performs a one-dimensional convolution (kernel width 3) along the time dimension with an ELU activation function, and MaxPool is a max-pooling layer with stride 2; after stacking one layer, X^t is thus down-sampled to half its length, the distilling proceeding from the j-th layer to the (j+1)-th layer.

U_k = Conv1d_k( [X_j^t]_AB ),  k = 1, ..., K    (3)

where U_k is the output of the k-th one-dimensional convolutional layer and K is the number of parallel convolutional layers.

a = g(E)    (4)

where a is the vector of attention coefficients of the experts, g is the gating network and E is the introduced temporal embedding layer (temporal embedding).

DynRoute( [X_j^t]_AB ) = Σ_{k=1}^{K} a_k U_k    (5)

where DynRoute gives the result of dynamic routing, i.e., the output of each one-dimensional convolutional layer is multiplied by the attention coefficient of the corresponding expert and then summed.
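A minimal sketch of the gated-convolution distilling step in equations (2)-(5) follows; the way the gate maps the temporal embedding E to the expert coefficients a, and all layer sizes, are assumptions made for exposition.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvDistill(nn.Module):
    # K parallel Conv1d "experts" produce outputs U_k (eq. 3); a time-aware
    # gate derived from the temporal embedding E yields coefficients a (eq. 4);
    # DynRoute returns sum_k a_k * U_k (eq. 5); ELU and stride-2 max pooling
    # then halve the sequence length as in the distilling operation (eq. 2).
    def __init__(self, d_model, d_time, n_experts=4, kernel_size=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
             for _ in range(n_experts)])
        self.gate = nn.Linear(d_time, n_experts)      # time-aware gate over experts
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x, time_emb):                   # x: (B, L, d_model), time_emb: (B, L, d_time)
        h = x.transpose(1, 2)                         # (B, d_model, L)
        U = torch.stack([conv(h) for conv in self.convs], dim=1)   # (B, K, d_model, L)
        a = F.softmax(self.gate(time_emb.mean(dim=1)), dim=-1)     # (B, K)
        routed = (a[:, :, None, None] * U).sum(dim=1)              # DynRoute output
        out = self.pool(F.elu(routed))                # MaxPool(ELU(DynRoute(.)))
        return out.transpose(1, 2)                    # length roughly halved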
Furthermore, in step (3), dual attention gates (Dual Attention Gates) are added after the original fully connected layer of the Informer, and a temporal attention mechanism is used to re-encode the output of the decoding layer (decoder), which filters the output of the model and improves its performance. The temporal attention mechanism proposed by the invention is shown in equations (6)-(8):

α = tanh( W_α h + b_α )    (6)

β = Tanhshrink( W_β h + b_β )    (7)

[equation (8), defining the model output ŷ, not reproduced]

where h is the output vector of the decoding layer; α and β are attention parameters learned through fully connected layers, and W_α, b_α, W_β, b_β are parameters that the model learns and updates during training. The attention parameters α and β are computed from the output vector through the activation functions tanh(·) and Tanhshrink(·); based on them, key information in the time series can be captured more effectively, thereby filtering out noise in the fully connected layer. ŷ in equation (8) is the output of the fully connected layer and the output of the whole model.
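A minimal sketch of the dual attention gates in equations (6)-(7) follows; because equation (8) is not reproduced above, the way α and β re-encode the decoder output before the final projection is an assumption made for exposition.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionGate(nn.Module):
    # alpha = tanh(W_a h + b_a), beta = Tanhshrink(W_b h + b_b); the gated
    # combination below (an assumed re-weighting of the decoder output h)
    # is then passed through the final fully connected layer.
    def __init__(self, d_model, d_out):
        super().__init__()
        self.w_alpha = nn.Linear(d_model, d_model)    # W_alpha, b_alpha
        self.w_beta = nn.Linear(d_model, d_model)     # W_beta, b_beta
        self.proj = nn.Linear(d_model, d_out)         # final fully connected layer

    def forward(self, h):                             # h: (B, L, d_model) decoder output
        alpha = torch.tanh(self.w_alpha(h))           # eq. (6)
        beta = F.tanhshrink(self.w_beta(h))           # eq. (7)
        gated = alpha * h + beta * h                  # assumed form of the re-encoding
        return self.proj(gated)                       # filtered model output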
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram illustrating the overall architecture of the original Informer network in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a Gated Convolution (Gated Convolution) network proposed in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the overall network model in the embodiment of the present invention;
FIG. 5 is a single variable long-sequence time series prediction result graph for 4 data sets (5 cases) in an embodiment of the present invention;
FIG. 6 is a graph of the multi-variable long-sequence time-series prediction results for 4 data sets (5 cases) in the example of the present invention;
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The embodiment of the invention relates to a design method of long time sequence prediction based on a gated convolution and a time attention mechanism, which comprises the following steps as shown in figure 1:
(1) Acquiring data;
(2) Preprocessing data and constructing a data set of time series characteristics;
(3) Constructing a design method of a long-time sequence prediction model based on a gating convolution and time attention mechanism;
(4) Judging whether the noise filtering effect of the model is good or not and whether the prediction effect is accurate or not;
in step (1), the data acquisition for this experiment included 3 real datasets and 2 common reference datasets collected for the LSTF. 5 public data sets ECL, weather, ETTm1, ETTh2, and ETTh1, respectively.
In the step (2), the prediction processing is performed on the obtained data,
1) Where ETT (Electric Transformer Temperature) is the division of the data set into three according to the time granularity of the data, being on the order of 1 hour { ETTh1, ETTh2} and 15 minutes of ETTm1. The data is power transformer operation data collected by Beijing aerospace university, and comprises two sites and operation data of two continuous years, wherein the training set, the verification set and the test set are respectively data of 12 months, 4 months and 4 months;
2) Wherein an ECL (electric comfort Load) data set collects power consumption (Kwh) of 321 clients, converts data units into hourly power consumption due to data loss, and sets 'MT 320' as a target value, wherein a training set, a verification set and a test set are data of 15 months, 3 months and 4 months respectively
3) The Weather data set collects Weather data of 1600 areas in the united states in hours between 2010 and 2013 and 4 years. Each data point includes a target value and 11 climate characteristics. Wherein the training set, the verification set and the test set are respectively data of 28 months, 10 months and 10 months;
in step (3), the invention constructs a prediction model of long-time sequences based on gated convolution and time attention mechanism.
1) Based on the network architecture of the original Informer model (shown in fig. 2), a gated convolution network combining MoE and Dynamic Convolution is added to the distilling layer of the Informer model. The gated convolution model is shown in fig. 3. The gating network mechanism captures the long-term memory that is ignored in the Informer distilling layer, introduces Temporal Embedding into the gating, classifies and automatically routes temporal features in a fine-grained manner, highlights the more relevant features in the time series, and encodes them together.
2) The dual attention gates are added after the original fully connected layer of the Informer, and the output of the decoding layer is re-encoded using temporal attention, which filters the model output and improves model performance. The overall improved network structure is shown in fig. 4.
3) The invention compares against five time-series prediction methods: ARIMA, Prophet, LSTMa, LSTNet and DeepAR; to better explore the effect of GCTAM in long-sequence time-series prediction, the Transformer-based Informer and its variants Reformer and LogTrans (LogSparse self-attention) are also used.
4) For hyper-parameter tuning, the invention uses the Adam optimizer with an initial learning rate of 1e-4, decayed by a factor of two every epoch. The total number of epochs is 8 with an appropriate early-stopping strategy, and the batch size is set to 32 according to the recommended setting.
5) The input to each data set is normalized to zero mean. Under the LSTF setting, the prediction window size is gradually expanded, namely {1d,2d,7d,14d,30d,40d } in { ETTh, ECL, weather } and {6h,12h,24h,72h,168h } in ETTm.
6) The evaluation metrics of the model are MSE and MAE; the whole test set is rolled over with stride 1 for each prediction window (averaging over variables in the multivariate case), as in the sketch below.
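A minimal sketch of this rolling-window evaluation protocol is given below; the function name, the model_predict callable and the window arguments are placeholders, not part of the invention.

import numpy as np

def rolling_window_scores(series, model_predict, input_len, pred_len):
    # Roll over the whole test set with stride 1, predict each window, and
    # average MSE and MAE over all windows (and over variables in the
    # multivariate case).
    mse, mae, n = 0.0, 0.0, 0
    for start in range(len(series) - input_len - pred_len + 1):      # stride 1
        x = series[start: start + input_len]
        y_true = series[start + input_len: start + input_len + pred_len]
        y_pred = model_predict(x)                                    # (pred_len, n_vars)
        mse += np.mean((y_pred - y_true) ** 2)
        mae += np.mean(np.abs(y_pred - y_true))
        n += 1
    return mse / n, mae / n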
In step (4), the invention applies the model to univariate and multivariate long-sequence time series and performs verification on the datasets in step (2), respectively. The experimental results are shown in fig. 5 and fig. 6, which summarize the univariate/multivariate evaluation results of GCTAM compared with all the other models on the 5 datasets.
1) Univariate time-series prediction: each method performs prediction of a single variable over long time series. As can be seen from fig. 5: (1) the model GCTAM significantly improves the prediction accuracy on all datasets, and the prediction error rises smoothly and slowly as the prediction length increases, indicating that GCTAM succeeds in improving the prediction capability for the LSTF problem; (2) GCTAM reduces the MSE by 18.5%, 9.1% and 10.5% on average, outperforming its baseline model Informer, and is also superior to the other Transformer-based methods LogTrans and Reformer.
2) Multivariate time-series prediction: by adjusting the fully connected layer (FCN), the GCTAM proposed by the invention can easily be converted from univariate to multivariate prediction. From fig. 6 it is observed that the proposed model GCTAM performs better than the other methods on MSE, reducing the MSE by 3.0%, 3.1% and 5.6% on average compared with the baseline model Informer and outperforming the other models.
The GCTAM model provided by the invention enhances the noise-filtering capability of the model, maintains high prediction accuracy in long-time-sequence prediction, and improves the prediction capability for the LSTF problem.

Claims (2)

1. A design method of long time sequence prediction based on a gated convolution and a time attention mechanism is characterized by comprising the following steps:
(1) On the basis of the Informer model's ability to perform continuous prediction of long time sequences, the characterization of long-sequence input is completed and the link between long-sequence output and long-sequence input is established; Gated Convolution is added to the Informer network model, a gating network mechanism is realized to capture the long-term memory overlooked in the Informer distilling layer, temporal embedding is introduced into the gating, temporal features are classified and automatically routed in a fine-grained manner, and the more relevant features in the time series are highlighted and encoded together;
(2) A gated convolutional network combining Mixture-of-Experts (MoE) and Dynamic Convolution is used as the improved Informer distilling-layer structure:
1) The original MoE model can be expressed as equation (1):

y = Σ_{i=1}^{n} g(x)_i f_i(x)    (1)

where g(x)_i represents the weight assigned to the i-th expert by the gating network, i.e., the corresponding gating score; f_i, i = 1, ..., n denote the n expert networks, and g denotes the gating network that integrates the results of the several experts. The multiple expert networks in MoE each fit the part of the training data they are good at fitting, with different subsets of the data fitted by different local models (experts), the gating acting as a controller of the weights.
2) The distilling-layer structure in the Informer model is improved: a plurality of parallel one-dimensional convolutional layers (Conv1d) replace the original single convolutional layer, and the output of the Attention Block is gated through a time-aware gate; meanwhile, temporal information is introduced into the time-aware gate, i.e., temporal embedding replaces the original embedding. The gated convolution model proposed by the invention is shown in equations (2)-(5):

X_{j+1}^t = MaxPool( ELU( DynRoute( [X_j^t]_AB ) ) )    (2)

where DynRoute(·) replaces the original Conv1d(·), and [·]_AB denotes the operation inside the Attention Block; in the original Informer, Conv1d(·) performs a one-dimensional convolution (kernel width 3) along the time dimension with an ELU activation function, and MaxPool is a max-pooling layer with stride 2; after stacking one layer, X^t is down-sampled to half its length, the distilling proceeding from the j-th layer to the (j+1)-th layer;

U_k = Conv1d_k( [X_j^t]_AB ),  k = 1, ..., K    (3)

where U_k is the output of the k-th one-dimensional convolutional layer and K is the number of parallel convolutional layers;

a = g(E)    (4)

where a is the vector of attention coefficients of the experts, g is the gating network and E is the introduced temporal embedding layer (temporal embedding);

DynRoute( [X_j^t]_AB ) = Σ_{k=1}^{K} a_k U_k    (5)

where DynRoute gives the result of dynamic routing, i.e., the output of each one-dimensional convolutional layer is multiplied by the attention coefficient of the corresponding expert and then summed.
(3) Dual attention gates (Dual Attention Gates) are added after the original fully connected layer of the Informer, and a temporal attention mechanism is used to re-encode the output of the decoding layer (decoder), which filters the output of the model and improves its performance. The temporal attention mechanism proposed by the invention is shown in equations (6)-(8):

α = tanh( W_α h + b_α )    (6)

β = Tanhshrink( W_β h + b_β )    (7)

[equation (8), defining the model output ŷ, not reproduced]

where h is the output vector of the decoding layer; α and β are attention parameters learned through fully connected layers, and W_α, b_α, W_β, b_β are parameters that the model learns and updates during training. The attention parameters α and β are computed from the output vector through the activation functions tanh(·) and Tanhshrink(·); based on them, key information in the time series can be captured more effectively, thereby filtering out noise in the fully connected layer. ŷ in equation (8) is the output of the fully connected layer and the output of the whole model.
2. The design method for long-time-sequence prediction based on gated convolution and a temporal attention mechanism as claimed in claim 1, wherein a Gating Network mechanism is proposed to capture the long-term memory originally ignored in the Informer distilling layer, time embedding is introduced into the gating to classify and automatically route the temporal features in a fine-grained manner, the more relevant features in the time series are highlighted and encoded together, and a temporal attention mechanism based on dual attention gates is proposed to filter the output of the model's fully connected layer, thereby improving the expressive power of the Informer model, enhancing the noise-filtering capability of the model and improving the prediction accuracy of the model.
CN202211250328.XA 2022-10-12 2022-10-12 Design method for long-time sequence prediction based on gated convolution and time attention mechanism Pending CN115510757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211250328.XA CN115510757A (en) 2022-10-12 2022-10-12 Design method for long-time sequence prediction based on gated convolution and time attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211250328.XA CN115510757A (en) 2022-10-12 2022-10-12 Design method for long-time sequence prediction based on gated convolution and time attention mechanism

Publications (1)

Publication Number Publication Date
CN115510757A true CN115510757A (en) 2022-12-23

Family

ID=84509898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211250328.XA Pending CN115510757A (en) 2022-10-12 2022-10-12 Design method for long-time sequence prediction based on gated convolution and time attention mechanism

Country Status (1)

Country Link
CN (1) CN115510757A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245141A (en) * 2023-01-13 2023-06-09 清华大学 Transfer learning architecture, method, electronic device and storage medium


Similar Documents

Publication Publication Date Title
CN112241814B (en) Traffic prediction method based on reinforced space-time diagram neural network
Lin et al. An efficient deep reinforcement learning model for urban traffic control
CN111612243B (en) Traffic speed prediction method, system and storage medium
CN111210633A (en) Short-term traffic flow prediction method based on deep learning
CN113554466B (en) Short-term electricity consumption prediction model construction method, prediction method and device
CN108876044B (en) Online content popularity prediction method based on knowledge-enhanced neural network
CN112949828A (en) Graph convolution neural network traffic prediction method and system based on graph learning
CN113905391A (en) Ensemble learning network traffic prediction method, system, device, terminal, and medium
CN113298191B (en) User behavior identification method based on personalized semi-supervised online federal learning
CN112396234A (en) User side load probability prediction method based on time domain convolutional neural network
CN113112791A (en) Traffic flow prediction method based on sliding window long-and-short term memory network
CN114036850A (en) Runoff prediction method based on VECGM
CN114970774A (en) Intelligent transformer fault prediction method and device
Li et al. Multi-task spatio-temporal augmented net for industry equipment remaining useful life prediction
CN115755219A (en) Flood forecast error real-time correction method and system based on STGCN
CN115510757A (en) Design method for long-time sequence prediction based on gated convolution and time attention mechanism
CN114282443A (en) Residual service life prediction method based on MLP-LSTM supervised joint model
CN114444561A (en) PM2.5 prediction method based on CNNs-GRU fusion deep learning model
CN116596151A (en) Traffic flow prediction method and computing device based on time-space diagram attention
CN113673774A (en) Aero-engine remaining life prediction method based on self-encoder and time sequence convolution network
Wang et al. Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Tu et al. Longer time span air pollution prediction: The attention and autoencoder hybrid learning model
Kim et al. A daily tourism demand prediction framework based on multi-head attention CNN: The case of the foreign entrant in South Korea
CN116525135B (en) Method for predicting epidemic situation development situation by space-time model based on meteorological factors
CN116258260A (en) Probability power load prediction method based on gating double convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination