CN114219027A - Lightweight time series prediction method based on discrete wavelet transform - Google Patents

Lightweight time series prediction method based on discrete wavelet transform

Info

Publication number
CN114219027A
Authority
CN
China
Prior art keywords
sequence
discrete
prediction
data
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111536500.3A
Other languages
Chinese (zh)
Inventor
樊谨
王则昊
吉玉祥
汪森
孙丹枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202111536500.3A
Publication of CN114219027A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a lightweight time series prediction method based on discrete wavelet transform. A waveform decomposition module decomposes the input sequence into a low-frequency component and a high-frequency component, each half the length of the input sequence; the two components are then predicted separately by a discrete feature extraction method based on a discrete network that extracts features in a layered, parallel manner. To address the high computational complexity of the attention mechanism, the discrete network adopts a discrete attention mechanism that computes attention values block by block, thereby reducing the computational complexity of the model. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and the smaller model size makes it more competitive on devices with limited resources.

Description

Lightweight time series prediction method based on discrete wavelet transform
Technical Field
The invention belongs to the field of time series prediction, and particularly relates to a lightweight time series prediction method based on discrete wavelet transform.
Background
In recent years, time series prediction technology has been widely used in various fields such as equipment health prediction systems, weather prediction, and stock prediction. Time series prediction is an important branch of the field of time series analysis. In general, a time series prediction method continuously learns from and analyzes historical time series in order to extract the features that determine how the series changes, and on this basis predicts the trend of the time series over a period of time in the future.
As research on the time series prediction problem deepens and excellent methods continue to emerge, the demands placed on new methods keep growing. These demands include higher prediction accuracy, longer prediction sequences, the shift from univariate to multivariate time series, and the requirement that the model scale be reduced as far as possible so that the model can be widely applied.
In recent years, more and more time series prediction methods have focused on improving prediction accuracy and increasing the prediction sequence length. As the requirements of the time series prediction problem grow, many methods become increasingly weak at learning long-distance dependencies in a time series, and further breakthroughs are difficult to achieve. This changed with the introduction of the Attention-based (AT) Transformer method: its breakthrough ability to extract dependencies between elements that are far apart brought a new field of view. The Transformer method has been applied to the time series prediction problem in more and more approaches and has made good progress. However, the Transformer has high computational complexity and a large model size, so it places heavy demands on memory and cannot be used directly for longer prediction requirements. As a result, more and more Transformer variants aimed at improving the computational complexity of the Transformer have been proposed, which make better results possible in longer time series prediction. Among these variant models, the discrete feature extraction method (Sepformer) brings a considerable improvement.
The discrete feature extraction method (Sepformer) adopts a discrete network (Separate Network) that extracts global features and local features in a layered, parallel manner, which improves the accuracy of the whole model. To address the high computational complexity of the Self-Attention mechanism, a discrete attention (Separate Attention) mechanism is adopted to calculate attention values block by block, reducing the computational complexity of the model to O(C). Compared with existing methods, this method improves the accuracy of multivariate time series prediction, reduces computational complexity, and increases the maximum prediction length. However, it still has a relatively large model scale and low resource utilization.
Disclosure of Invention
The technical problem to be solved by the invention is to reduce the memory footprint of the model as much as possible while guaranteeing prediction accuracy, so that the model achieves a balance (trade-off) among the various technical requirements. The invention provides a lightweight time series prediction method based on discrete wavelet transform which, as verified by testing, retains to the greatest extent the high accuracy, low computational complexity and long-sequence prediction capability of the discrete feature extraction method, while further reducing the model scale and improving resource utilization.
The technical scheme adopted by the invention is as follows: a waveform decomposition module decomposes the input sequence to obtain a low-frequency component and a high-frequency component, each half the length of the input sequence; the two components are then predicted separately by the discrete feature extraction method (Sepformer), which is based on a discrete network (Separate Network) that extracts features in a layered, parallel manner. To address the high computational complexity of the Self-Attention mechanism, a discrete attention (Separate Attention) mechanism is adopted to calculate attention values block by block, thereby reducing the computational complexity of the model. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and the smaller model size makes it more competitive on devices with limited resources.
A lightweight time series prediction method based on discrete wavelet transform comprises the following steps:
Step 1: preprocess the data to obtain a training data set and a verification data set.
Step 2: with the aid of the training data set obtained in step 1, randomly select 32 groups of training data each time as equipment conditions allow, input the historical sequence and the starting sequence of each group of data into two Waveform Decomposition modules respectively, and decompose the input sequence into a low-frequency component (approximation coefficients) and a high-frequency component (detail coefficients).
Step 3: input the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules (Sepformers) respectively for feature extraction. Each discrete feature extraction module comprises two Encoders and one Decoder; the corresponding input component is fed into the discrete network (Separate Network) in the encoders to extract global features and local features, finally yielding two groups of global and local features corresponding to the two components.
Step 4: perform dimension alignment on the two groups of features obtained in step 3 in the hidden layer after the encoders, and splice the dimension-aligned features, finally obtaining two groups of global features and local features corresponding to the high-frequency and low-frequency components.
Step 5: input the two groups of features obtained in step 4 into the corresponding Decoders of the respective discrete feature extraction modules, and reconstruct the global features and local features of each layer through the discrete network (Separate Network) in the decoders to generate the prediction sequences corresponding to the high-frequency and low-frequency components.
Step 6: apply the inverse process of wavelet decomposition to the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a Waveform Reconstruction module, recombining the high- and low-frequency components to obtain the final generated prediction sequence.
Step 7: according to the generated prediction sequence obtained in step 6, calculate the error between the generated prediction sequence and the real sequence using the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and perform back propagation with the Adam optimizer to update the network parameters.
Step 8: using the model with the network parameters updated in step 7 and the verification data set obtained in step 1, select 32 groups of verification data as input and execute steps 2 to 7, with the training data in step 2 replaced by the selected 32 groups of verification data. Prediction sequences based on the verification data are finally generated.
Step 9: calculate the Mean Square Error (MSE) between the prediction sequences generated from the verification data in step 8 and the corresponding real sequences, compute the MSE of every group of data and take the average, finally obtaining the mean square error of the model on the verification data set.
Step 10: repeat step 2 to step 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model finishes training.
Step 11: input the sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence to complete the prediction.
Further, the specific method in step 1 is as follows:
and selecting a proper public time sequence data set, and grouping and segmenting to adapt to the requirement of the model on the data format. Firstly, setting the historical sequence length, the predicted sequence length and the starting sequence length in each group of data according to requirements, wherein the three lengths respectively correspond to three parts in each group of data: historical sequence, predicted sequence, and starting sequence. And grouping by adopting a sliding window mechanism, wherein the window length is the sum of the historical sequence length and the predicted sequence length, and the window moves by one bit each time, namely, only one bit of difference exists between two adjacent groups of data. After completion of the data packet, 70% of the group data was intercepted as the training data set and 30% of the group data was intercepted as the validation data set.
Further, the starting sequence length is less than or equal to the history sequence length in length, and the starting sequence is identical to the rear part of the history sequence in value. The historical sequence and the predicted sequence are connected in position in tandem, and the length of each group of data is the sum of the length of the historical sequence and the length of the predicted sequence.
Further, the waveform decomposition module is based on the principle of Discrete Wavelet Transform (DWT), and the formula is as follows:
f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_u(0,k)\,u_{0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=0}^{J-1}\sum_{k} W_v(j,k)\,v_{j,k}(x)

W_u(0,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,u_{0,k}(x)

W_v(j,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,v_{j,k}(x)

subject to x = 0,1,2,\dots,M-1; j = 0,1,2,\dots,J-1; k = 0,1,2,\dots,2^{j}-1
where u(x) is the Scaling Function and v(x) is the Wavelet Function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, respectively, which represent the low-frequency component and the high-frequency component; M is the sequence length; j and k are used to control the scaling of the scaling function.
Further, the discrete network adopts a Waveform Extraction module (WE) and a discrete Attention mechanism module (SA) to extract global features and local features layer by layer. The waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which yields the global trend of the input sequence, and the global trend is then subtracted from the input sequence to obtain the local fluctuation of the input sequence.
Further, the overall formula of the waveform extraction module is as follows:
[The formulas of the waveform extraction module are given as images in the original publication.]
In these formulas, the global trend and the local fluctuation of the waveform are the two outputs of WE; they serve as the inputs from which the discrete attention mechanism module extracts the global features and the local features. The remaining symbol denotes the input sequence of the l-th layer WE, and the connection symbol concatenates the different blocks. The AvgPool function is a mean pooling function: it sets a sliding window, slides one unit at a time, averages all the elements within the window, and assigns the resulting value to the current unit. The input is partitioned into blocks before being fed into AvgPool, with B_i denoting the i-th block.
Furthermore, the discrete Attention mechanism module firstly divides the input sequence into blocks (Block, B) with the same length, then extracts features through a shared Attention mechanism module (AT), then performs dimension transformation through a Feed-Forward Network (FFN), shortens the length of each Block in proportion, and finally splices and outputs the blocks. The calculation formula of the discrete Attention mechanism (AT) is as follows:
[The calculation formulas of the discrete attention mechanism are given as images in the original publication.]
In these formulas, the input is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents a block obtained from the input sequence; the weight matrices are the learnable weight matrices of Q, K and V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively. Q, K and V denote the query matrix, the key matrix and the value matrix obtained after the blocks are subjected to a linear transformation. The attention mechanism is defined as:
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V
where d_{model} represents the feature dimension.
Further, the overall function expression of the discrete network is as follows:
[The overall function expression of the discrete network is given as an image in the original publication.]
where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
The invention has the beneficial effects that:
according to the method, a Waveform Decomposition module (Waveform Decomposition) and a Waveform Reconstruction module (Waveform Reconstruction) based on discrete wavelet change are used for decomposing and reconstructing a time sequence, the Waveform Decomposition module decomposes an input sequence into a low-frequency component and a high-frequency component, the lengths of the two components are half of the length of the input sequence, then a discrete feature extraction module (Sepormer) is used for carrying out feature extraction, and the Waveform Reconstruction module is used for reconstructing the predicted components to generate a final predicted sequence. The invention greatly reduces the scale of the model and improves the resource utilization rate.
In the multivariate time series prediction, the problems of prediction precision, prediction sequence length, fitting ability to local fine fluctuation and the like are all important factors influencing the prediction effect. The invention adopts the waveform decomposition and waveform reconstruction module based on discrete wavelet transform to decompose the input sequence, thereby reducing the scale of the model and improving the resource utilization rate. By adopting a mechanism of extracting global features and local features of the multivariate time sequence in a layered parallel manner, the prediction precision is improved, the fitting capability of the local fine fluctuation of the multivariate time sequence is improved by utilizing the local features, the prediction length of the model is increased, and the effect of the model on the prediction of the multivariate time sequence is greatly improved.
Drawings
Fig. 1 is a schematic view of the overall structure of the embodiment of the present invention.
Fig. 2 is a detailed structural schematic diagram of an embodiment of the present invention.
FIG. 3 is a diagram of a discrete feature extraction module (Sepformer) according to an embodiment of the present invention.
Fig. 4 is a structural diagram of a discrete Network (Separate Network) according to an embodiment of the present invention.
Fig. 5 is a block diagram of a discrete Attention mechanism (Separate Attention) of an embodiment of the present invention.
FIG. 6 is a model diagram of the discrete waveform decomposition method (SWformer) and the Mini-discrete waveform decomposition method (Mini-SWformer), which discards high-frequency components to further reduce the model size.
Fig. 7 is a comparison of the discrete waveform decomposition method and the Mini discrete waveform decomposition method with six existing methods in terms of Mean Square Error (MSE) under five public data sets.
FIG. 8 is a comparison of the GPU memory usage of the SWformer and Mini-SWformer of the present invention with that of Informer, which has a relatively small model size, under the same conditions.
Detailed Description
The invention is further described with reference to the accompanying drawings and specific implementation steps:
a lightweight time series prediction method based on discrete wavelet transform comprises the following steps:
step 1: and (4) preprocessing data. And selecting a proper public time sequence data set, and grouping and segmenting to adapt to the requirement of the model on the data format. Firstly, setting the historical sequence length, the predicted sequence length and the starting sequence length in each group of data according to requirements, wherein the three lengths respectively correspond to three parts in each group of data: historical sequence, predicted sequence, and starting sequence. In length, the starting sequence length is less than or equal to the history sequence length, and in value, the starting sequence is the same as the latter part of the history sequence. The historical sequence and the predicted sequence are connected in position in tandem, and the length of each group of data is the sum of the length of the historical sequence and the length of the predicted sequence. And grouping by adopting a sliding window mechanism, wherein the window length is the sum of the historical sequence length and the predicted sequence length, and the window moves by one bit each time, namely, only one bit of difference exists between two adjacent groups of data. After completion of the data packet, 70% of the group data was intercepted as the training data set and 30% of the group data was intercepted as the validation data set.
Fig. 1 shows the overall structure of the present invention. The data processing and division part sits at the entrance of the structure and is responsible for the initial processing of the raw data into the data structure required by the prediction model. Fig. 2 is a detailed structural schematic diagram of an embodiment of the present invention.
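As an illustrative sketch of the data preparation described in step 1 (the function name, the toy series and the concrete lengths are assumptions for illustration, not values fixed by the patent), the grouping with a stride-1 sliding window and the 70%/30% split can be written as follows:

    import numpy as np

    def make_dataset(series, hist_len, pred_len, start_len):
        """Group a time series with a stride-1 sliding window.

        Each group holds (history, start, target): the start sequence is the
        last start_len points of the history, and the target (prediction)
        sequence immediately follows the history.
        """
        assert start_len <= hist_len
        window = hist_len + pred_len                 # total window length
        groups = []
        for i in range(len(series) - window + 1):    # window moves one step at a time
            hist = series[i:i + hist_len]
            target = series[i + hist_len:i + window]
            start = hist[-start_len:]                # start sequence = tail of the history
            groups.append((hist, start, target))
        split = int(0.7 * len(groups))               # 70% training / 30% verification
        return groups[:split], groups[split:]

    # usage with a toy sine series
    data = np.sin(np.arange(0, 200, 0.1)).astype(np.float32)
    train_set, valid_set = make_dataset(data, hist_len=96, pred_len=48, start_len=48)
    print(len(train_set), len(valid_set))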
Step 2: with the aid of the training data set obtained in step 1, 32 sets of training data are randomly selected each time when the device conditions allow, the history sequence and the start sequence in each set of data are respectively input into two Waveform Decomposition (Waveform Decomposition) modules, and the input sequence is decomposed into a low frequency component (approximate coefficient) and a high frequency component (detail coefficient). The waveform decomposition module is based on Discrete Wavelet Transform (DWT) principle, and the formula is as follows:
f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_u(0,k)\,u_{0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=0}^{J-1}\sum_{k} W_v(j,k)\,v_{j,k}(x)

W_u(0,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,u_{0,k}(x)

W_v(j,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,v_{j,k}(x)

subject to x = 0,1,2,\dots,M-1; j = 0,1,2,\dots,J-1; k = 0,1,2,\dots,2^{j}-1
where u(x) is the Scaling Function and v(x) is the Wavelet Function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, respectively, which represent the low-frequency component and the high-frequency component; M is the sequence length; j and k are used to control the scaling of the scaling function.
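For illustration, a single-level decomposition of this kind can be sketched with the PyWavelets library; the Haar wavelet chosen here is an assumption, and any orthogonal wavelet that halves the sequence length would serve the same purpose:

    import numpy as np
    import pywt

    x = np.sin(np.arange(96) * 0.2)        # toy input sequence of length 96

    # single-level DWT: cA = approximation (low-frequency) coefficients,
    # cD = detail (high-frequency) coefficients, each half the input length
    cA, cD = pywt.dwt(x, 'haar')
    print(x.shape, cA.shape, cD.shape)     # (96,) (48,) (48,)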
Step 3: input the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules (Sepformers) respectively for feature extraction. Each discrete feature extraction module comprises two Encoders and one Decoder; the corresponding input component is fed into the discrete network (Separate Network) in the encoders to extract global features and local features, finally yielding two groups of global and local features corresponding to the two components.
Fig. 3 shows the overall structure of the discrete feature extraction module (Sepformer), which includes two Encoders and one Decoder. The core module of both the encoder and the decoder is the discrete network (Separate Network, SN).
Fig. 4 shows the overall structure of the discrete network (Separate Network). The discrete network adopts a Waveform Extraction module (WE) and a discrete Attention mechanism module (SA) to extract global features and local features layer by layer. The waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which yields the global trend of the input sequence, and the global trend is then subtracted from the input sequence to obtain the local fluctuation of the input sequence. The overall formulas of the waveform extraction module are as follows:
[The formulas of the waveform extraction module are given as images in the original publication.]
In these formulas, the global trend and the local fluctuation of the waveform are the two outputs of WE; they serve as the inputs from which the discrete attention mechanism module extracts the global features and the local features. The remaining symbol denotes the input sequence of the l-th layer WE, and the connection symbol concatenates the different blocks. The AvgPool function is a mean pooling function: it sets a sliding window, slides one unit at a time, averages all the elements within the window, and assigns the resulting value to the current unit. The input is partitioned into blocks before being fed into AvgPool, with B_i denoting the i-th block.
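A minimal PyTorch sketch of this trend/fluctuation split is given below; the kernel size, the stride-1 padding scheme and the module name are assumptions made for illustration rather than parameters specified by the patent:

    import torch
    import torch.nn as nn

    class WaveformExtraction(nn.Module):
        """Split a sequence into a global trend and a local fluctuation."""

        def __init__(self, kernel_size: int = 5):
            super().__init__()
            # stride 1 with symmetric padding keeps the sequence length unchanged
            self.pool = nn.AvgPool1d(kernel_size, stride=1,
                                     padding=kernel_size // 2,
                                     count_include_pad=False)

        def forward(self, x: torch.Tensor):
            # x: (batch, length, features)
            trend = self.pool(x.transpose(1, 2)).transpose(1, 2)   # global trend
            fluctuation = x - trend                                # local fluctuation
            return trend, fluctuation

    x = torch.randn(8, 48, 7)                # batch of 8 sequences with 7 variables
    trend, fluct = WaveformExtraction(5)(x)
    print(trend.shape, fluct.shape)          # torch.Size([8, 48, 7]) twice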
As shown in fig. 5, a discrete Attention mechanism module (SA) is shown, which is used for feature extraction. The discrete Attention mechanism module firstly divides an input sequence into blocks (Block, B) with the same length, then extracts features through a shared Attention mechanism module (AT), then carries out dimension transformation through a Feed-Forward Network (FFN), shortens the length of each Block according to a proportion, and finally splices and outputs the blocks. The calculation formula of the discrete Attention mechanism (AT) is as follows:
[The calculation formulas of the discrete attention mechanism are given as images in the original publication.]
In these formulas, the input is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents a block obtained from the input sequence; the weight matrices are the learnable weight matrices of Q, K and V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively. Q, K and V denote the query matrix, the key matrix and the value matrix obtained after the blocks are subjected to a linear transformation. The attention mechanism is defined as:
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V
where d_{model} represents the feature dimension.
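The block-wise idea can be illustrated with the following PyTorch sketch; the equal block length, the shared projection layers and the feed-forward layer that halves each block are assumptions for illustration and do not reproduce the exact SA module:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlockAttentionSketch(nn.Module):
        """Attention computed independently inside equal-length blocks."""

        def __init__(self, d_model: int, block_len: int, shrink: int = 2):
            super().__init__()
            self.block_len = block_len
            self.d_model = d_model
            self.q = nn.Linear(d_model, d_model)     # shared projections for every block
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            # feed-forward layer applied along the time axis to shorten each block
            self.ffn = nn.Linear(block_len, block_len // shrink)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, length, d_model); length assumed divisible by block_len
            b, L, d = x.shape
            blocks = x.view(b, L // self.block_len, self.block_len, d)
            q, k, v = self.q(blocks), self.k(blocks), self.v(blocks)
            # scores are computed within each block only, never across blocks
            scores = q @ k.transpose(-2, -1) / self.d_model ** 0.5
            out = F.softmax(scores, dim=-1) @ v
            out = self.ffn(out.transpose(-2, -1)).transpose(-2, -1)  # shorten each block
            return out.reshape(b, -1, d)             # splice the shortened blocks

    x = torch.randn(8, 48, 64)
    y = BlockAttentionSketch(d_model=64, block_len=12)(x)
    print(y.shape)                                   # torch.Size([8, 24, 64])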
The discrete network overall function expression is as follows:
[The overall function expression of the discrete network is given as an image in the original publication.]
where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
Step 4: perform dimension alignment on the two groups of features obtained in step 3 in the hidden layer after the encoders, and splice the dimension-aligned features, finally obtaining two groups of global features and local features corresponding to the high-frequency and low-frequency components.
As shown in fig. 3, the global features and local features output by the True Encoder and the Pred Encoder are spliced respectively: the two features output by the True Encoder are first passed through a Feed-Forward Network (FFN) so that their dimension matches that of the Pred Encoder, and the corresponding features are then spliced to obtain the global features and the local features.
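A minimal sketch of this alignment-and-splicing step is shown below, assuming the True Encoder features are projected to the Pred Encoder's hidden size and the two feature groups are spliced along the time axis; the sizes are placeholders:

    import torch
    import torch.nn as nn

    d_true, d_pred = 32, 64                    # assumed hidden sizes of the two encoders
    align = nn.Linear(d_true, d_pred)          # FFN used for dimension alignment

    true_feat = torch.randn(8, 24, d_true)     # feature from the True Encoder
    pred_feat = torch.randn(8, 24, d_pred)     # feature from the Pred Encoder

    aligned = align(true_feat)                         # now (8, 24, 64)
    merged = torch.cat([aligned, pred_feat], dim=1)    # splice along the time axis
    print(merged.shape)                                # torch.Size([8, 48, 64])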
Step 5: input the two groups of features obtained in step 4 into the corresponding Decoders of the respective discrete feature extraction modules, and reconstruct the global features and local features of each layer through the discrete network (Separate Network) in the decoders to generate the prediction sequences corresponding to the high-frequency and low-frequency components.
Step 6: apply the inverse process of wavelet decomposition to the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a Waveform Reconstruction module, recombining the high- and low-frequency components to obtain the final generated prediction sequence.
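Assuming the same Haar wavelet used in the decomposition sketch above, the recombination can be illustrated with pywt.idwt, which merges the two predicted components into a sequence of twice their length:

    import numpy as np
    import pywt

    pred_low = np.random.randn(24)     # predicted approximation (low-frequency) component
    pred_high = np.random.randn(24)    # predicted detail (high-frequency) component

    # inverse DWT recombines the two components into the final prediction sequence
    pred_sequence = pywt.idwt(pred_low, pred_high, 'haar')
    print(pred_sequence.shape)         # (48,)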
Step 7: according to the generated prediction sequence obtained in step 6, calculate the error between the generated prediction sequence and the real sequence using the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and perform back propagation with the Adam optimizer to update the network parameters. The Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas are as follows:
MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|

where y is the predicted value, \hat{y} is the true value, and n represents the length of the sequence.
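One training iteration of the kind described in step 7 can be sketched as follows; the linear model stands in for the full network, the batch of 32 groups is synthetic, and back-propagating the MSE while merely tracking the MAE is an assumption, since the patent states both error formulas without fixing the loss:

    import torch
    import torch.nn as nn

    model = nn.Linear(96, 48)                   # placeholder for the full prediction model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    mse_loss, mae_loss = nn.MSELoss(), nn.L1Loss()

    history = torch.randn(32, 96)               # 32 randomly selected groups of training data
    target = torch.randn(32, 48)                # corresponding real (prediction) sequences

    prediction = model(history)                 # generated prediction sequence
    mse = mse_loss(prediction, target)
    mae = mae_loss(prediction, target)

    optimizer.zero_grad()
    mse.backward()                              # back propagation
    optimizer.step()                            # Adam updates the network parameters
    print(float(mse), float(mae))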
Step 8: using the model with the network parameters updated in step 7 and the verification data set obtained in step 1, select 32 groups of verification data as input and execute steps 2 to 7, with the training data in step 2 replaced by the selected 32 groups of verification data. Prediction sequences based on the verification data are finally generated.
Step 9: calculate the Mean Square Error (MSE) between the prediction sequences generated from the verification data in step 8 and the corresponding real sequences, compute the MSE of every group of data and take the average, finally obtaining the mean square error of the model on the verification data set.
Step 10: repeat step 2 to step 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model finishes training.
Step 11: input the sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence to complete the prediction.
Fig. 6 shows the two methods of the present invention: the discrete waveform decomposition method (SWformer) and the Mini discrete waveform decomposition method (Mini-SWformer). The high-frequency components carry relatively little of the information in time series data, so appropriately reducing the high-frequency components can reduce the amount of computation of the model to some extent and thereby reduce the model scale. On this theoretical basis, the Mini discrete waveform decomposition method deletes the decomposed high-frequency component and its entire branch from the discrete waveform decomposition method, further reducing the scale of the model.
Fig. 7 shows the results of the two methods of the present invention and existing methods such as Informer, LogTrans, Reformer, LSTMa and LSTnet under the same experimental conditions on five data sets (ETTh1, ETTh2, ETTm1, Weather and ECL), measured by Mean Square Error (MSE) and Mean Absolute Error (MAE). The results of the best-performing model under each experimental condition are shown in bold in the table. From the table in fig. 7 it can be seen that the discrete waveform decomposition method (SWformer) and the Mini discrete waveform decomposition method (Mini-SWformer) improve greatly on the other methods. Compared with the Informer method, the MSE of the discrete feature extraction method is reduced by 22.53% on average, the MSE of the discrete waveform decomposition method is reduced by 19.29% on average, and the MSE of the Mini discrete waveform decomposition method is reduced by 16.54% on average.
Fig. 8 shows how the memory usage of the discrete waveform decomposition method (SWformer), the Mini discrete waveform decomposition method (Mini-SWformer) and Informer changes as the prediction sequence length increases under the same experimental conditions. It can be seen that the advantage of the discrete waveform decomposition method and the Mini discrete waveform decomposition method in memory usage grows as the prediction sequence becomes longer. Compared with Informer, the discrete waveform decomposition method reduces memory usage by 52.62% on average, and the Mini discrete waveform decomposition method by 68.02% on average.

Claims (8)

1. A lightweight time series prediction method based on discrete wavelet transform is characterized by comprising the following steps:
step 1: preprocessing data to obtain a training data set and a verification data set;
step 2: with the help of the training data set obtained in step 1, randomly selecting 32 groups of training data each time as equipment conditions allow, respectively inputting the historical sequence and the starting sequence of each group of data into two waveform decomposition modules, and decomposing the input sequence into a low-frequency component and a high-frequency component;
step 3: respectively inputting the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules for feature extraction; each discrete feature extraction module comprises two encoders and one decoder, and the corresponding input component is input into the discrete network in the encoders to extract global features and local features, finally obtaining two groups of global and local features corresponding to the two components;
step 4: respectively performing dimension alignment on the two groups of features obtained in step 3 in the hidden layer after the encoders, and splicing the dimension-aligned features, finally obtaining two groups of global features and local features corresponding to the high-frequency and low-frequency components;
step 5: respectively inputting the two groups of features obtained in step 4 into the corresponding decoders of the respective discrete feature extraction modules, and reconstructing the global features and local features of each layer through the discrete network in the decoders to generate prediction sequences corresponding to the high-frequency and low-frequency components;
step 6: performing the inverse process of wavelet decomposition on the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a waveform reconstruction module, and recombining the high- and low-frequency components to obtain the final generated prediction sequence;
step 7: according to the generated prediction sequence obtained in step 6, calculating the error between the generated prediction sequence and the real sequence through the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and performing back propagation through an Adam optimizer to update the network parameters;
step 8: selecting 32 groups of verification data as input by means of the model with the network parameters updated in step 7 and the verification data set obtained in step 1, and executing steps 2 to 7, wherein the training data in step 2 is replaced by the selected 32 groups of verification data; finally generating prediction sequences based on the verification data;
step 9: calculating the Mean Square Error (MSE) between the prediction sequences generated from the verification data in step 8 and the corresponding real sequences, computing the MSE of all the groups of data and taking the average, finally obtaining the mean square error of the model on the verification data set;
step 10: repeating step 2 to step 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model finishes training;
step 11: inputting the input sequence given by the prediction task into the trained model obtained in step 10, performing sequence prediction, and outputting the resulting prediction sequence to complete the prediction.
2. The discrete wavelet transform-based lightweight time series prediction method according to claim 1, wherein the specific method in step 1 is as follows:
selecting a suitable public time series data set, and grouping and segmenting it to fit the data format required by the model; firstly, setting the historical sequence length, the prediction sequence length and the starting sequence length of each group of data as required, wherein the three lengths respectively correspond to the three parts of each group of data: the historical sequence, the prediction sequence and the starting sequence; grouping by adopting a sliding-window mechanism, wherein the window length is the sum of the historical sequence length and the prediction sequence length and the window moves one step at a time, i.e., two adjacent groups of data differ by only one position; after the data grouping is completed, 70% of the groups are taken as the training data set and 30% as the verification data set.
3. The discrete wavelet transform-based lightweight time series prediction method of claim 2, wherein in length, the starting sequence length is less than or equal to the historical sequence length, and in value, the starting sequence is the same as the rear part of the historical sequence; the historical sequence and the prediction sequence are adjacent in position, and the length of each group of data is the sum of the historical sequence length and the prediction sequence length.
4. The discrete wavelet transform-based lightweight time series prediction method according to claim 1, wherein said waveform decomposition module is based on the principle of discrete wavelet transform, and the formula is as follows:
f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_u(0,k)\,u_{0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=0}^{J-1}\sum_{k} W_v(j,k)\,v_{j,k}(x)

W_u(0,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,u_{0,k}(x)

W_v(j,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,v_{j,k}(x)

subject to x = 0,1,2,\dots,M-1; j = 0,1,2,\dots,J-1; k = 0,1,2,\dots,2^{j}-1
where u(x) is the scale function and v(x) is the wavelet function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, respectively, representing the low-frequency component and the high-frequency component; M is the sequence length; j and k are used to control the scaling of the scaling function.
5. The discrete wavelet transform-based lightweight time series prediction method as claimed in claim 1, wherein the discrete network adopts a waveform extraction module and a discrete attention mechanism module to extract global features and local features layer by layer; the waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which yields the global trend of the input sequence, and the global trend is then subtracted from the input sequence to obtain the local fluctuation of the input sequence.
6. The discrete wavelet transform-based lightweight time series prediction method according to claim 5, wherein the overall formula of the waveform extraction module is as follows:
[The formulas of the waveform extraction module are given as images in the original publication.]
In these formulas, the global trend and the local fluctuation of the waveform are the two outputs of WE; they serve as the inputs from which the discrete attention mechanism module extracts the global features and the local features. The remaining symbol denotes the input sequence of the l-th layer WE, and the connection symbol concatenates the different blocks. The AvgPool function is a mean pooling function: it sets a sliding window, slides one unit at a time, averages all the elements within the window, and assigns the resulting value to the current unit. The input is partitioned into blocks before being fed into AvgPool, with B_i denoting the i-th block.
7. The discrete wavelet transform-based lightweight time series prediction method is characterized in that a discrete Attention mechanism module firstly divides an input sequence into blocks (Block, B) with the same length, then extracts features through a shared Attention mechanism module (Attention, AT), then performs dimension transformation through a Feed-Forward Network (FFN), shortens the length of each Block in proportion, and finally splices and outputs the blocks; the calculation formula of the discrete Attention mechanism (AT) is as follows:
[The calculation formulas of the discrete attention mechanism are given as images in the original publication.]
In these formulas, the input is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents a block obtained from the input sequence; the weight matrices are the learnable weight matrices of Q, K and V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively; Q, K and V denote the query matrix, the key matrix and the value matrix obtained after the blocks are subjected to a linear transformation; the attention mechanism is defined as:
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V
where d_{model} represents the feature dimension.
8. The discrete wavelet transform-based lightweight time series prediction method according to claim 7, wherein the overall function expression of the discrete network is as follows:
[The overall function expression of the discrete network is given as an image in the original publication.]
where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
CN202111536500.3A 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform Pending CN114219027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536500.3A CN114219027A (en) 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536500.3A CN114219027A (en) 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform

Publications (1)

Publication Number Publication Date
CN114219027A true CN114219027A (en) 2022-03-22

Family

ID=80702457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536500.3A Pending CN114219027A (en) 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform

Country Status (1)

Country Link
CN (1) CN114219027A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19980074795A (en) * 1997-03-27 1998-11-05 윤종용 Image Coding Method Using Global-Based Color Correlation
CN110826803A (en) * 2019-11-06 2020-02-21 广东电力交易中心有限责任公司 Electricity price prediction method and device for electric power spot market
CN112862875A (en) * 2021-01-18 2021-05-28 中国科学院自动化研究所 Rain removing method, system and equipment for rain chart based on selective mechanism attention mechanism

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114345A (en) * 2022-04-02 2022-09-27 腾讯科技(深圳)有限公司 Feature representation extraction method, device, equipment, storage medium and program product
CN115114345B (en) * 2022-04-02 2024-04-09 腾讯科技(深圳)有限公司 Feature representation extraction method, device, equipment, storage medium and program product
CN115293244A (en) * 2022-07-15 2022-11-04 北京航空航天大学 Smart grid false data injection attack detection method based on signal processing and data reduction
CN115293244B (en) * 2022-07-15 2023-08-15 北京航空航天大学 Smart grid false data injection attack detection method based on signal processing and data reduction

Similar Documents

Publication Publication Date Title
CN109214575B (en) Ultrashort-term wind power prediction method based on small-wavelength short-term memory network
CN112863180B (en) Traffic speed prediction method, device, electronic equipment and computer readable medium
CN114239718B (en) High-precision long-term time sequence prediction method based on multi-element time sequence data analysis
CN112364975A (en) Terminal operation state prediction method and system based on graph neural network
CN114219027A (en) Lightweight time series prediction method based on discrete wavelet transform
CN107292446B (en) Hybrid wind speed prediction method based on component relevance wavelet decomposition
CN113747163B (en) Image coding and decoding method and compression method based on context recombination modeling
CN112767959B (en) Voice enhancement method, device, equipment and medium
CN116362398A (en) Power load prediction method based on modal decomposition and reconstruction and LSTM-MLR hybrid model
CN112434891A (en) Method for predicting solar irradiance time sequence based on WCNN-ALSTM
CN109583588B (en) Short-term wind speed prediction method and system
CN117709556B (en) Photovoltaic power generation short-term prediction method, system, medium and equipment
CN111141879B (en) Deep learning air quality monitoring method, device and equipment
Kozat et al. Universal switching linear least squares prediction
CN108491958B (en) Short-time bus passenger flow chord invariant prediction method
CN117852686A (en) Power load prediction method based on multi-element self-encoder
CN102299766B (en) Joint optimization method for dimensionality reduction and quantification of communication signal for object state estimation
CN115713155A (en) Traffic sequence prediction method based on multivariate time sequence data analysis
CN113949880A (en) Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method
Sun et al. KSVD-based multiple description image coding
CN110705044A (en) Simulation verification method for simulating wind power generation system based on deep learning
CN111476408A (en) Power communication equipment state prediction method and system
CN112446516A (en) Travel prediction method and device
Bai et al. A NN-GM (1, 1) model-based analysis of network traffic forecasting
CN118379876A (en) Vehicle track generation method and device based on diffusion model and light federal learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination