CN114219027A - Lightweight time series prediction method based on discrete wavelet transform - Google Patents
- Publication number: CN114219027A (application CN202111536500.3A)
- Authority: CN (China)
- Prior art keywords: sequence, discrete, data, length, input
- Prior art date: 2021-12-15
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/211 — Pattern recognition; analysing; design or setup of recognition systems or techniques; selection of the most significant subset of features
- G06N3/044 — Neural networks; architecture; recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
Description
Technical Field

The invention belongs to the field of time series forecasting, and in particular relates to a lightweight time series forecasting method based on the discrete wavelet transform.
Background

In recent years, time series forecasting has been widely applied in fields such as equipment health monitoring, weather forecasting, and stock prediction. Time series forecasting is an important branch of time series analysis: a forecasting method learns from and analyzes historical sequences to extract the features that govern how the series changes, and on the basis of those features predicts the trend of the series over a future horizon.

As research on time series forecasting has deepened and excellent new methods have kept appearing, the demands placed on new methods have grown steadily: higher prediction accuracy, longer prediction sequences, a shift from univariate to multivariate series, and models that are as small as possible so they can be applied more widely.

In recent years, more and more forecasting methods have focused on improving accuracy and extending the prediction length. As the requirements rose, many methods proved increasingly unable to learn long-range dependencies within a series, and further breakthroughs became difficult. The Transformer, built on the attention mechanism (AT), changed this picture: it brought a breakthrough improvement in extracting dependencies between two elements far apart in a sequence, and more and more methods have applied it to time series forecasting with good progress. However, the Transformer has high computational complexity and a large model size, making it very memory-hungry and preventing its direct use for longer prediction requirements. Consequently, more and more Transformer variants have been proposed to improve its computational complexity so that it achieves better results on longer sequences. Among these variants, the discrete feature extraction method (Sepformer) offers a considerable improvement.

The discrete feature extraction method (Sepformer) uses a Separate Network that extracts global and local features hierarchically and in parallel, improving the accuracy of the whole model. To address the high computational complexity of the self-attention mechanism, the Separate Network adopts a Separate Attention mechanism that computes attention values blockwise, reducing the model's computational complexity to O(C). The method improves the accuracy of multivariate time series forecasting, lowers computational complexity compared with existing methods, and increases the maximum prediction length; however, it still has a large model size and low resource utilization.
Summary of the Invention

The technical problem addressed by the invention is to reduce the model's memory footprint as far as possible while preserving prediction accuracy, so that the model reaches a trade-off among the various technical requirements. The invention provides a lightweight time series forecasting method based on the discrete wavelet transform which, in testing, retains to a great extent the high accuracy, low computational complexity, and long-sequence forecasting ability of the discrete feature extraction method, while further reducing model size and improving resource utilization.

The technical scheme of the invention is as follows: a waveform decomposition module decomposes the input sequence into a low-frequency component and a high-frequency component, each half the length of the input; the two components are then predicted separately with the discrete feature extraction method (Sepformer), which is built on a Separate Network that extracts features hierarchically and in parallel. To address the high computational complexity of the self-attention mechanism, the Separate Network adopts a Separate Attention mechanism that computes attention values blockwise, reducing the model's computational complexity. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and the smaller model size makes it more competitive on resource-constrained devices.
A lightweight time series forecasting method based on the discrete wavelet transform comprises the following steps:

Step 1: Preprocess the data to obtain a training set and a validation set.

Step 2: From the training set obtained in step 1, randomly select 32 groups of training data at a time (hardware permitting), and feed the historical sequence and the starting sequence of each group into two Waveform Decomposition modules, which decompose each input sequence into a low-frequency component (approximate coefficients) and a high-frequency component (detail coefficients).

Step 3: Feed the low- and high-frequency components obtained in step 2 into two discrete feature extraction modules (Sepformer) for feature extraction. Each module contains two encoders and one decoder; the corresponding component is passed to the Separate Network inside the encoders, which extracts global and local features, finally yielding two sets of global and local features, one set per component.

Step 4: For the two feature sets obtained in step 3, align their dimensions in the hidden layers after the encoders, then concatenate the aligned features, finally obtaining the global and local features corresponding to the high- and low-frequency components.

Step 5: Feed the two feature sets obtained in step 4 into the decoder of the corresponding discrete feature extraction module; the Separate Network inside the decoder reconstructs the global features together with the local features of each layer, generating prediction sequences corresponding to the high- and low-frequency components.

Step 6: Pass the two predicted component sequences obtained in step 5 through the Waveform Reconstruction module, which performs the inverse of the wavelet decomposition to recombine the high- and low-frequency components into the final generated prediction sequence.

Step 7: Using the prediction sequence generated in step 6, compute the error between the generated sequence and the true sequence with the mean squared error (MSE) and mean absolute error (MAE) formulas, then backpropagate with the Adam optimizer to update the network parameters.

Step 8: Using the model with the parameters updated in step 7 and the validation set obtained in step 1, select 32 groups of validation data as input and execute steps 2 through 7, with the 32 selected validation groups taking the place of the training data in step 2. This finally yields prediction sequences generated from the validation data.

Step 9: Compute the mean squared error (MSE) between each prediction sequence generated in step 8 and its true sequence, then average the MSE over all groups, finally obtaining the average validation MSE.

Step 10: Repeat steps 2 through 9. When the MSE obtained in step 9 no longer decreases, the model's performance can no longer improve; the network parameters are then final and training ends.

Step 11: Feed the input sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence to complete the forecast.
Further, the specific method of step 1 is as follows:

Select a suitable public time series dataset and group and split it to match the data format required by the model. First set, according to the task requirements, the historical sequence length, the prediction sequence length, and the starting sequence length; these three lengths correspond to the three parts of each data group: the historical sequence, the prediction sequence, and the starting sequence. Grouping uses a sliding-window mechanism whose window length is the sum of the historical and prediction sequence lengths; the window advances one position at a time, so two adjacent groups differ in only one position. After grouping, 70% of the groups are taken as the training set and 30% as the validation set. A minimal sketch of this grouping appears below.
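To make the grouping concrete, the following is a minimal sketch of the sliding-window preprocessing, assuming NumPy and placeholder data; the window lengths in the usage line (96/48/48) are illustrative, not values prescribed by the invention.

```python
import numpy as np

def make_windows(series, hist_len, pred_len, start_len, train_ratio=0.7):
    """Slide a window of length hist_len + pred_len over the series one
    position at a time, splitting each window into the three parts of a
    data group: historical, starting, and prediction sequences."""
    assert start_len <= hist_len
    total = hist_len + pred_len
    groups = []
    for i in range(len(series) - total + 1):
        window = series[i:i + total]
        history = window[:hist_len]          # historical sequence
        target = window[hist_len:]           # prediction sequence (ground truth)
        start = history[-start_len:]         # starting sequence = tail of history
        groups.append((history, start, target))
    split = int(len(groups) * train_ratio)   # 70% training / 30% validation
    return groups[:split], groups[split:]

# Usage with placeholder data standing in for a public dataset such as ETTh1:
train_set, val_set = make_windows(np.random.randn(1000, 7), 96, 48, 48)
```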
Further, in length, the starting sequence is no longer than the historical sequence; in value, the starting sequence is identical to the tail of the historical sequence. The historical sequence and the prediction sequence are adjacent in time, and the length of each data group equals the sum of the historical and prediction sequence lengths.
Further, the waveform decomposition module is based on the discrete wavelet transform (DWT), whose coefficient formulas are

$$W_{u}(0,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,u_{0,k}(x)$$

$$W_{v}(j,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,v_{j,k}(x)$$

subject to $x=0,1,2,\ldots,M-1$; $j=0,1,2,\ldots,J-1$; $k=0,1,2,\ldots,2^{j}-1$,

where $f(x)$ is the input sequence, $u(x)$ is the scaling function and $v(x)$ is the wavelet function; $W_{u}(0,k)$ and $W_{v}(j,k)$ are the approximate and detail coefficients, which represent the low-frequency and high-frequency components; $M$ is the sequence length; and $j$ and $k$ control the scaling of the basis functions.
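For illustration, this one-level decomposition and its inverse can be sketched with the PyWavelets library. The patent does not name the mother wavelet, so the Haar wavelet is an assumption here, chosen because it splits an even-length input into two components of exactly half the length, as the waveform decomposition module requires.

```python
import numpy as np
import pywt  # PyWavelets

x = np.random.randn(96, 7)  # (length, variables) placeholder input sequence

# Waveform Decomposition: one level of DWT along the time axis.
# cA holds the low-frequency (approximate) coefficients, cD the
# high-frequency (detail) coefficients; each is half the length of x.
cA, cD = pywt.dwt(x, 'haar', axis=0)
assert cA.shape[0] == x.shape[0] // 2

# Waveform Reconstruction (step 6): the inverse DWT recombines the two
# predicted components into the final prediction sequence.
x_rec = pywt.idwt(cA, cD, 'haar', axis=0)
assert np.allclose(x_rec, x)
```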
Further, the Separate Network uses a Waveform Extraction (WE) module and a Separate Attention (SA) module to extract global features and local features layer by layer. The waveform extraction module decomposes its input: a sliding window traverses the entire input sequence and takes the mean within each window, yielding the global trend of the input; subtracting this global trend from the input yields its local fluctuation.
Further, the overall formulas of the waveform extraction module are

$$X_{\mathrm{global}}^{l}=\mathrm{AvgPool}\!\left(X_{1}^{l}\right)\,\Vert\;\mathrm{AvgPool}\!\left(X_{2}^{l}\right)\,\Vert\cdots\Vert\;\mathrm{AvgPool}\!\left(X_{n}^{l}\right)$$

$$X_{\mathrm{local}}^{l}=X^{l}-X_{\mathrm{global}}^{l}$$

where $X_{\mathrm{global}}^{l}$ and $X_{\mathrm{local}}^{l}$ denote the global trend and local fluctuation of the waveform, which serve as the inputs from which the Separate Attention module extracts global and local features; $X^{l}$ is the input sequence of the $l$-th WE layer; $\Vert$ is the concatenation symbol joining the blocks; and AvgPool is a mean-pooling function that sets a sliding window, slides one unit at a time, averages all elements inside the window, and assigns the result to the current unit. $X^{l}$ is split into blocks before being fed to AvgPool, with $X_{i}^{l}$ denoting the $i$-th block.
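A minimal PyTorch sketch of this trend/fluctuation split follows; the kernel size and the padding policy that keeps the output length equal to the input length are assumed hyperparameters, not values given in the patent.

```python
import torch
import torch.nn as nn

class WaveformExtraction(nn.Module):
    """Split an input sequence into a global trend (windowed mean) and a
    local fluctuation (the residual), as the WE module does."""
    def __init__(self, kernel_size=25):
        super().__init__()
        # stride-1 mean pooling with padding so the sequence length is preserved
        self.pool = nn.AvgPool1d(kernel_size, stride=1,
                                 padding=kernel_size // 2,
                                 count_include_pad=False)

    def forward(self, x):  # x: (batch, length, channels)
        trend = self.pool(x.transpose(1, 2)).transpose(1, 2)  # global trend
        local = x - trend                                     # local fluctuation
        return trend, local

trend, local = WaveformExtraction()(torch.randn(32, 48, 7))
```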
Further, the Separate Attention module first splits the input sequence into blocks (B) of equal length, extracts features from each block through a shared attention module (AT), applies a feed-forward network (FFN) that transforms dimensions and shortens each block proportionally, and finally concatenates the blocks for output. The Separate Attention computation is

$$\tilde{B}_{i}^{l}=\mathrm{AT}\!\left(Q_{i}^{l},K_{i}^{l},V_{i}^{l}\right),\qquad Q_{i}^{l}=B_{i}^{l}W_{Q,i}^{l},\quad K_{i}^{l}=B_{i}^{l}W_{K,i}^{l},\quad V_{i}^{l}=B_{i}^{l}W_{V,i}^{l}$$

$$\mathrm{SA}\!\left(X_{\mathrm{SA}}^{l}\right)=\mathrm{FFN}\!\left(\tilde{B}_{1}^{l}\right)\,\Vert\;\mathrm{FFN}\!\left(\tilde{B}_{2}^{l}\right)\,\Vert\cdots\Vert\;\mathrm{FFN}\!\left(\tilde{B}_{n}^{l}\right)$$

where $X_{\mathrm{SA}}^{l}$ is the input sequence of the $l$-th SA layer; $B$ denotes a block of the input sequence and $B_{i}^{l}$ its $i$-th block at layer $l$; $W_{Q,i}^{l}$, $W_{K,i}^{l}$ and $W_{V,i}^{l}$ are the learnable weight matrices of Q, K and V on the $i$-th block of layer $l$; and $Q_{i}^{l}$, $K_{i}^{l}$ and $V_{i}^{l}$ are the $i$-th blocks of Q, K and V at layer $l$. Q, K and V denote the query, key and value matrices obtained from the blocks by linear transformation. The attention mechanism is defined as

$$\mathrm{AT}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_{\mathrm{model}}}}\right)V$$

where $d_{\mathrm{model}}$ is the feature dimension.
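The blockwise attention can be sketched in PyTorch as follows. For brevity the sketch shares one set of Q/K/V projections across all blocks (the shared AT module) and omits the FFN that shortens each block; d_model and block_size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparateAttention(nn.Module):
    """Split the sequence into equal-length blocks and attend within each
    block, so attention cost grows linearly with sequence length."""
    def __init__(self, d_model=512, block_size=24):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.block_size = block_size
        self.scale = d_model ** 0.5

    def forward(self, x):  # x: (batch, length, d_model)
        b, L, d = x.shape
        assert L % self.block_size == 0, "length must divide into blocks"
        nb = L // self.block_size
        blocks = x.view(b * nb, self.block_size, d)   # split into blocks B_i
        q, k, v = self.q(blocks), self.k(blocks), self.v(blocks)
        att = F.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        return (att @ v).view(b, L, d)                # concatenate blocks back

y = SeparateAttention()(torch.randn(32, 48, 512))
```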
Further, the overall function of the Separate Network is

$$\left(X_{\mathrm{global}}^{l},\,X_{\mathrm{local}}^{l}\right)=\mathrm{WE}\!\left(X^{l}\right),\qquad Z^{l}=\mathrm{SA}\!\left(X_{\mathrm{global}}^{l}\right),\qquad H^{l}=\mathrm{SA}\!\left(X_{\mathrm{local}}^{l}\right),\qquad X^{1}=X_{\mathrm{SN}}$$

where $Z^{l}$ denotes the global feature of the $l$-th layer of the Separate Network, $H^{l}$ the local feature of the $l$-th layer, and $X_{\mathrm{SN}}$ the input of the SN.
Beneficial effects of the invention:

The invention uses a Waveform Decomposition module and a Waveform Reconstruction module, both based on the discrete wavelet transform, to decompose and reconstruct the time series. The decomposition module splits the input sequence into a low-frequency component and a high-frequency component, each half the length of the input; features are then extracted by the discrete feature extraction module (Sepformer), and the predicted components are recombined by the reconstruction module to generate the final prediction sequence. The invention greatly reduces the size of the model and improves resource utilization.

In multivariate time series forecasting, prediction accuracy, prediction sequence length, and the ability to fit small local fluctuations are all important factors affecting forecasting performance. The invention decomposes the input sequence with DWT-based waveform decomposition and reconstruction modules, reducing the size of the model and improving resource utilization. Extracting the global and local features of the multivariate series hierarchically and in parallel improves prediction accuracy; the local features improve the fit to small local fluctuations of the series; and the prediction length of the model increases, greatly improving its performance on multivariate time series forecasting.
Description of Drawings

Fig. 1 is a schematic diagram of the overall structure of an embodiment of the invention.

Fig. 2 is a detailed structural diagram of an embodiment of the invention.

Fig. 3 is a structural diagram of the discrete feature extraction module (Sepformer) of an embodiment of the invention.

Fig. 4 is a structural diagram of the Separate Network of an embodiment of the invention.

Fig. 5 is a structural diagram of the Separate Attention mechanism of an embodiment of the invention.

Fig. 6 is a model diagram of the discrete waveform decomposition method (SWformer) and the mini discrete waveform decomposition method (Mini-SWformer), where the latter discards the high-frequency component to further reduce model size.

Fig. 7 compares the mean squared error (MSE) of the discrete waveform decomposition method and the mini discrete waveform decomposition method against six existing methods on five public datasets.

Fig. 8 compares, under identical conditions, the GPU usage of the SWformer of the invention and of the smaller-scale Mini-SWformer and Informer.
Detailed Description

The invention is further described below with reference to the drawings and the specific implementation steps.

A lightweight time series forecasting method based on the discrete wavelet transform comprises the following steps:

Step 1: Data preprocessing. Select a suitable public time series dataset and group and split it to match the data format required by the model. First set, according to the task requirements, the historical sequence length, the prediction sequence length, and the starting sequence length; these correspond to the three parts of each data group: the historical sequence, the prediction sequence, and the starting sequence. In length, the starting sequence is no longer than the historical sequence; in value, it is identical to the tail of the historical sequence. The historical and prediction sequences are adjacent in time, and the length of each group equals the sum of their lengths. Grouping uses a sliding-window mechanism whose window length is that sum; the window advances one position at a time, so two adjacent groups differ in only one position. After grouping, 70% of the groups are taken as the training set and 30% as the validation set.

As shown in Fig. 1, the overall structure of the invention is presented. The data processing and splitting stage sits at the entrance of the structure and is responsible for the preliminary processing of the raw data into the data structure required by the prediction model. Fig. 2 is a detailed structural diagram of an embodiment of the invention.
Step 2: From the training set obtained in step 1, randomly select 32 groups of training data at a time (hardware permitting), and feed the historical sequence and the starting sequence of each group into two Waveform Decomposition modules, which decompose each input sequence into a low-frequency component (approximate coefficients) and a high-frequency component (detail coefficients). The waveform decomposition module is based on the discrete wavelet transform (DWT):

$$W_{u}(0,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,u_{0,k}(x)$$

$$W_{v}(j,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,v_{j,k}(x)$$

subject to $x=0,1,2,\ldots,M-1$; $j=0,1,2,\ldots,J-1$; $k=0,1,2,\ldots,2^{j}-1$,

where $f(x)$ is the input sequence, $u(x)$ is the scaling function and $v(x)$ is the wavelet function; $W_{u}(0,k)$ and $W_{v}(j,k)$ are the approximate and detail coefficients, which represent the low-frequency and high-frequency components; $M$ is the sequence length; and $j$ and $k$ control the scaling of the basis functions.
Step 3: Feed the low- and high-frequency components obtained in step 2 into two discrete feature extraction modules (Sepformer) for feature extraction. Each module contains two encoders and one decoder; the corresponding component is passed to the Separate Network inside the encoders, which extracts global and local features, finally yielding two sets of global and local features, one set per component.

As shown in Fig. 3, the overall structure of the discrete feature extraction module (Sepformer) is presented: it contains two encoders and one decoder, and the core module of both the encoder and the decoder is the Separate Network (SN).
As shown in Fig. 4, the overall structure of the Separate Network is presented. The network uses a Waveform Extraction (WE) module and a Separate Attention (SA) module to extract global and local features layer by layer. The waveform extraction module decomposes its input: a sliding window traverses the entire input sequence and takes the mean within each window, yielding the global trend; subtracting this trend from the input yields the local fluctuation. The overall formulas of the waveform extraction module are

$$X_{\mathrm{global}}^{l}=\mathrm{AvgPool}\!\left(X_{1}^{l}\right)\,\Vert\;\mathrm{AvgPool}\!\left(X_{2}^{l}\right)\,\Vert\cdots\Vert\;\mathrm{AvgPool}\!\left(X_{n}^{l}\right)$$

$$X_{\mathrm{local}}^{l}=X^{l}-X_{\mathrm{global}}^{l}$$

where $X_{\mathrm{global}}^{l}$ and $X_{\mathrm{local}}^{l}$ denote the global trend and local fluctuation of the waveform, which serve as the inputs from which the Separate Attention module extracts global and local features; $X^{l}$ is the input sequence of the $l$-th WE layer; $\Vert$ is the concatenation symbol joining the blocks; and AvgPool is a mean-pooling function that sets a sliding window, slides one unit at a time, averages all elements inside the window, and assigns the result to the current unit. $X^{l}$ is split into blocks before being fed to AvgPool, with $X_{i}^{l}$ denoting the $i$-th block.
As shown in Fig. 5, the Separate Attention (SA) module used for feature extraction is presented. The module first splits the input sequence into blocks (B) of equal length, extracts features from each block through a shared attention module (AT), applies a feed-forward network (FFN) that transforms dimensions and shortens each block proportionally, and finally concatenates the blocks for output. The Separate Attention computation is

$$\tilde{B}_{i}^{l}=\mathrm{AT}\!\left(Q_{i}^{l},K_{i}^{l},V_{i}^{l}\right),\qquad Q_{i}^{l}=B_{i}^{l}W_{Q,i}^{l},\quad K_{i}^{l}=B_{i}^{l}W_{K,i}^{l},\quad V_{i}^{l}=B_{i}^{l}W_{V,i}^{l}$$

$$\mathrm{SA}\!\left(X_{\mathrm{SA}}^{l}\right)=\mathrm{FFN}\!\left(\tilde{B}_{1}^{l}\right)\,\Vert\;\mathrm{FFN}\!\left(\tilde{B}_{2}^{l}\right)\,\Vert\cdots\Vert\;\mathrm{FFN}\!\left(\tilde{B}_{n}^{l}\right)$$

where $X_{\mathrm{SA}}^{l}$ is the input sequence of the $l$-th SA layer; $B$ denotes a block of the input sequence and $B_{i}^{l}$ its $i$-th block at layer $l$; $W_{Q,i}^{l}$, $W_{K,i}^{l}$ and $W_{V,i}^{l}$ are the learnable weight matrices of Q, K and V on the $i$-th block of layer $l$; and $Q_{i}^{l}$, $K_{i}^{l}$ and $V_{i}^{l}$ are the $i$-th blocks of Q, K and V at layer $l$. Q, K and V denote the query, key and value matrices obtained from the blocks by linear transformation. The attention mechanism is defined as

$$\mathrm{AT}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_{\mathrm{model}}}}\right)V$$

where $d_{\mathrm{model}}$ is the feature dimension.
The overall function of the Separate Network is

$$\left(X_{\mathrm{global}}^{l},\,X_{\mathrm{local}}^{l}\right)=\mathrm{WE}\!\left(X^{l}\right),\qquad Z^{l}=\mathrm{SA}\!\left(X_{\mathrm{global}}^{l}\right),\qquad H^{l}=\mathrm{SA}\!\left(X_{\mathrm{local}}^{l}\right),\qquad X^{1}=X_{\mathrm{SN}}$$

where $Z^{l}$ denotes the global feature of the $l$-th layer of the Separate Network, $H^{l}$ the local feature of the $l$-th layer, and $X_{\mathrm{SN}}$ the input of the SN.
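Putting the two modules together, a layer-stacked sketch of the Separate Network follows, reusing the WaveformExtraction and SeparateAttention sketches above. Sharing one SA module between the two branches and feeding the global branch forward between layers are simplifying assumptions of this sketch, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class SeparateNetwork(nn.Module):
    """Each layer splits its input into trend and fluctuation with WE, then
    extracts the layer's global feature Z_l and local feature H_l with SA."""
    def __init__(self, d_model=512, n_layers=3):
        super().__init__()
        self.wes = nn.ModuleList(WaveformExtraction() for _ in range(n_layers))
        self.sas = nn.ModuleList(SeparateAttention(d_model) for _ in range(n_layers))

    def forward(self, x):  # x: (batch, length, d_model)
        global_feats, local_feats = [], []
        for we, sa in zip(self.wes, self.sas):
            trend, local = we(x)
            z, h = sa(trend), sa(local)   # Z_l and H_l
            global_feats.append(z)
            local_feats.append(h)
            x = z                         # assumed: global branch feeds the next layer
        return global_feats, local_feats

zs, hs = SeparateNetwork()(torch.randn(32, 48, 512))
```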
Step 4: Using the two feature sets obtained in step 3, align their dimensions in the hidden layers after the encoders, then concatenate the aligned features, finally obtaining the global and local features corresponding to the high- and low-frequency components.

As shown in Fig. 3, the global and local features output by the True Encoder and the Pred Encoder are concatenated separately: the two feature types output by the True Encoder first pass through a feed-forward network (FFN) that transforms their dimensions to match those of the Pred Encoder, after which each feature type is concatenated, giving the overall global and local features.
Step 5: Feed the two feature sets obtained in step 4 into the decoder of the corresponding discrete feature extraction module; the Separate Network inside the decoder reconstructs the global features together with the local features of each layer, generating prediction sequences corresponding to the high- and low-frequency components.

Step 6: Pass the two predicted component sequences obtained in step 5 through the Waveform Reconstruction module, which performs the inverse of the wavelet decomposition to recombine the high- and low-frequency components into the final generated prediction sequence.
Step 7: Using the prediction sequence generated in step 6, compute the error between the generated sequence and the true sequence with the mean squared error (MSE) and mean absolute error (MAE) formulas, then backpropagate with the Adam optimizer to update the network parameters. The MSE and MAE formulas are

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2},\qquad\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left\lvert y_{i}-\hat{y}_{i}\right\rvert$$

where $y$ is the predicted value, $\hat{y}$ is the true value, and $n$ is the length of the sequence.
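A sketch of this training step, together with the validation-based stopping rule of steps 8 through 10, follows; `model`, `train_loader` and `val_loader` are placeholders standing in for the full SWformer and the batched groups of 32 described above, not names defined by the patent.

```python
import torch

def train(model, train_loader, val_loader, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = torch.nn.MSELoss()
    best_val = float('inf')
    while True:
        model.train()
        for history, start, target in train_loader:    # 32 groups per batch
            loss = mse(model(history, start), target)  # MSE loss (step 7)
            opt.zero_grad()
            loss.backward()                            # backpropagation
            opt.step()                                 # Adam parameter update
        model.eval()
        with torch.no_grad():                          # steps 8-9: validation MSE
            val = sum(mse(model(h, s), t).item()
                      for h, s, t in val_loader) / len(val_loader)
        if val >= best_val:                            # step 10: MSE stops decreasing
            break
        best_val = val
    return model
```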
Step 8: Using the model with the parameters updated in step 7 and the validation set obtained in step 1, select 32 groups of validation data as input and execute steps 2 through 7, with the 32 selected validation groups taking the place of the training data in step 2. This finally yields prediction sequences generated from the validation data.

Step 9: Compute the mean squared error (MSE) between each prediction sequence generated in step 8 and its true sequence, then average the MSE over all groups, finally obtaining the average validation MSE.

Step 10: Repeat steps 2 through 9. When the MSE obtained in step 9 no longer decreases, the model's performance can no longer improve; the network parameters are then final and training ends.

Step 11: Feed the input sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence to complete the forecast.
Fig. 6 shows the two methods of the invention: the discrete waveform decomposition method (SWformer) and the mini discrete waveform decomposition method (Mini-SWformer). The high-frequency component carries little information in time series data, so appropriately reducing it cuts the model's computation to a certain extent and therefore its size. On this basis, the mini discrete waveform decomposition method removes the high-frequency component produced by the decomposition, together with its entire branch, further reducing the size of the model.
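This pruning of the high-frequency branch can be illustrated with PyWavelets, again assuming a Haar wavelet: passing None in place of the detail coefficients reconstructs the sequence from the low-frequency branch alone.

```python
import numpy as np
import pywt

x = np.random.randn(96, 7)
cA, cD = pywt.dwt(x, 'haar', axis=0)

# Mini-SWformer keeps only the low-frequency branch; reconstructing with
# cD=None treats the discarded detail coefficients as zero.
x_low_only = pywt.idwt(cA, None, 'haar', axis=0)
```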
Fig. 7 shows, under identical experimental conditions, the results of the two methods of the invention and of existing methods including Informer, LogTrans, Reformer, LSTMa and LSTnet on five datasets (ETTh1, ETTh2, ETTm1, Weather and ECL), measured by mean squared error (MSE) and mean absolute error (MAE). For each experimental setting, the result of the best-performing model is shown in bold in the table. The table in Fig. 7 shows that the discrete waveform decomposition method (SWformer) and the mini discrete waveform decomposition method (Mini-SWformer) improve considerably on the other methods. Compared with the Informer method, the MSE of the discrete feature extraction method drops by 22.53% on average, that of the discrete waveform decomposition method by 19.29%, and that of the mini discrete waveform decomposition method by 16.54%.

Fig. 8 shows, under identical experimental conditions, how the memory usage of the discrete waveform decomposition method (SWformer), the mini discrete waveform decomposition method (Mini-SWformer) and Informer compares as the prediction length increases. The advantage of the two decomposition methods in memory usage grows as the prediction sequence becomes longer: compared with Informer, the discrete waveform decomposition method uses 52.62% less memory on average, and the mini discrete waveform decomposition method 68.02% less.
Claims (8)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111536500.3A | 2021-12-15 | 2021-12-15 | Lightweight time series prediction method based on discrete wavelet transform |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114219027A | 2022-03-22 |
Family ID: 80702457

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111536500.3A | Lightweight time series prediction method based on discrete wavelet transform | 2021-12-15 | 2021-12-15 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114219027A (en) |
Patent Citations (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR19980074795A | 1997-03-27 | 1998-11-05 | 윤종용 | Image Coding Method Using Global-Based Color Correlation |
| CN110826803A | 2019-11-06 | 2020-02-21 | 广东电力交易中心有限责任公司 | Electricity price prediction method and device for electric power spot market |
| CN112862875A | 2021-01-18 | 2021-05-28 | 中国科学院自动化研究所 | Rain removing method, system and equipment for rain chart based on selective mechanism attention mechanism |
Cited By (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115114345A | 2022-04-02 | 2022-09-27 | 腾讯科技(深圳)有限公司 | Feature representation extraction method, device, equipment, storage medium and program product |
| CN115114345B | 2022-04-02 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Feature representation extraction method, device, equipment, storage medium and program product |
| CN115293244A | 2022-07-15 | 2022-11-04 | 北京航空航天大学 | Smart grid false data injection attack detection method based on signal processing and data reduction |
| CN115293244B | 2022-07-15 | 2023-08-15 | 北京航空航天大学 | Smart grid false data injection attack detection method based on signal processing and data reduction |
| CN118364286A | 2024-04-24 | 2024-07-19 | 山东省人工智能研究院 | A lightweight cloud server load prediction method based on wavelet decomposition and multi-head external attention mechanism |
Similar Documents

| Publication | Title |
|---|---|
| CN114219027A | Lightweight time series prediction method based on discrete wavelet transform |
| CN113420868A | Traveling salesman problem solving method and system based on deep reinforcement learning |
| CN114239718A | High-precision long-term time sequence prediction method based on multivariate time sequence data analysis |
| CN113347422B | Coarse-grained context entropy coding method |
| CN107292446B | Hybrid wind speed prediction method based on component relevance wavelet decomposition |
| CN110309603A | A short-term wind speed prediction method and system based on wind speed characteristics |
| CN112863180A | Traffic speed prediction method, device, electronic equipment and computer readable medium |
| CN114970774A | Intelligent transformer fault prediction method and device |
| CN117852686A | Power load prediction method based on multi-element self-encoder |
| Huai et al. | ZeroBN: Learning compact neural networks for latency-critical edge systems |
| CN117690289B | A traffic network encoding representation learning method based on masked graph attention mechanism |
| Chen et al. | An efficient sharing grouped convolution via Bayesian learning |
| CN118657253A | Multivariate time series long-term prediction method based on multi-scale time series feature enhancement |
| CN113761777A | Ultra-short-term photovoltaic power prediction method based on HP-OVMD |
| CN117033987A | Wind farm power generation efficiency prediction method based on wavelet |
| CN118657255A | A deep learning work order quantity prediction method based on wavelet multi-resolution decomposition |
| Daems et al. | Variational inference for SDEs driven by fractional noise |
| CN105956252A | Generative deep belief network-based multi-scale forecast modeling method for ultra-short-term wind speed |
| CN116108735A | Spatial-temporal high-resolution reconstruction method for fluid data with unknown boundary and initial conditions |
| CN115631624B | Improved complementary ensemble empirical mode decomposition based spatio-temporal convolution short-term traffic flow prediction method |
| CN115834914B | A tensor network-based entropy coding, entropy decoding method and image compression method |
| CN113949880B | Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method |
| Le et al. | Hierarchical autoencoder-based lossy compression for large-scale high-resolution scientific data |
| CN116630448A | Image compression method based on neural data dependent transformation of window attention |
| CN116522099A | Time series data self-supervised pre-training model, construction method, equipment and storage medium |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |