CN114219027A - Lightweight time series prediction method based on discrete wavelet transform - Google Patents
Lightweight time series prediction method based on discrete wavelet transform
Info
- Publication number
- CN114219027A (application CN202111536500.3A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- discrete
- prediction
- data
- length
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a lightweight time series prediction method based on discrete wavelet transform. A waveform decomposition module decomposes the input sequence into a low-frequency component and a high-frequency component, each half the length of the input sequence; the two components are then predicted separately by a discrete feature extraction method built on a discrete network that extracts features in a layered, parallel manner. To address the high computational complexity of the attention mechanism, the discrete network computes attention values block by block with a discrete attention mechanism, reducing the computational complexity of the model. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and its smaller model size makes it more competitive on resource-constrained devices.
Description
Technical Field
The invention belongs to the field of time series prediction, and particularly relates to a lightweight time series prediction method based on discrete wavelet transform.
Background
In recent years, time series prediction has been widely applied in fields such as equipment health prognostics, weather forecasting, and stock forecasting. Time series prediction is an important branch of time series analysis: a prediction method learns from and analyzes the historical time series to extract the characteristics that determine how the series changes, and on that basis predicts the trend of the series over a future period.
As research on the time series prediction problem deepens and excellent new methods keep emerging, the demands placed on new methods keep rising: higher prediction accuracy, longer prediction sequences, the shift from univariate to multivariate time series, and models that are as small as possible so they can be deployed widely.
In recent years, more and more time series prediction methods have focused on improving prediction accuracy and increasing the prediction length. As the demands of the problem grow, many methods struggle to learn long-range dependencies within a time series and find it hard to make further breakthroughs. This changed with the introduction of the Transformer method built on the Attention (AT) mechanism, whose breakthrough ability to extract dependencies between elements far apart brought a new field of view. The Transformer has since been applied to the time series prediction problem in a growing number of methods with good results. However, the Transformer's high computational complexity and large model size place heavy demands on memory and prevent it from being used directly for longer prediction horizons. Consequently, more and more Transformer variants aimed at reducing its computational complexity have been proposed, enabling better results in long-sequence prediction. Among these variants, the discrete feature extraction method (Sepformer) stands out.
The discrete feature extraction method (Sepformer) uses a discrete network (Separate Network) to extract global features and local features in a layered, parallel manner, which improves the accuracy of the whole model. To address the high computational complexity of the Self-Attention mechanism, it computes attention values block by block with a discrete attention (Separate Attention) mechanism, reducing the computational complexity of the model to O(C). Compared with existing methods, it improves the accuracy of multivariate time series prediction, reduces computational complexity, and increases the maximum prediction length. However, its model size remains large and its resource utilization low.
Disclosure of Invention
The technical problem the invention aims to solve is to reduce the memory footprint of the model as much as possible while preserving prediction accuracy, so that the model strikes a balance (trade-off) among the various technical requirements. The invention provides a lightweight time series prediction method based on discrete wavelet transform which, as testing shows, largely retains the high accuracy, low computational complexity, and long-sequence prediction capability of the discrete feature extraction method while further reducing model size and improving resource utilization.
The technical scheme adopted by the invention is as follows: a waveform decomposition module decomposes the input sequence into a low-frequency component and a high-frequency component, each half the length of the input sequence; the two components are then predicted separately by the discrete feature extraction method (Sepformer), which is built on a discrete network (Separate Network) that extracts features in a layered, parallel manner. To address the high computational complexity of the Self-Attention mechanism, a discrete attention (Separate Attention) mechanism computes attention values block by block, reducing the computational complexity of the model. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and its smaller model size makes it more competitive on resource-constrained devices.
A lightweight time series prediction method based on discrete wavelet transform comprises the following steps:
Step 1: preprocess the data to obtain a training data set and a verification data set.
Step 2: from the training data set obtained in step 1, randomly select 32 groups of training data each time (as device conditions allow), feed the history sequence and the start sequence of each group into two Waveform Decomposition modules, and decompose each input sequence into a low-frequency component (approximation coefficients) and a high-frequency component (detail coefficients).
Step 3: feed the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules (Sepformers) for feature extraction. Each discrete feature extraction module comprises two Encoders and one Decoder; the corresponding input component is passed into the discrete network (Separate Network) inside the encoders to extract global and local features, finally yielding two groups of global and local features, one for each component.
Step 4: align the dimensions of the two groups of features obtained in step 3 in the hidden layer after the encoders, then splice the aligned features, finally obtaining two groups of global and local features corresponding to the high- and low-frequency components.
Step 5: feed the two groups of features obtained in step 4 into the corresponding Decoders of their respective discrete feature extraction modules; the discrete network (Separate Network) inside the decoders reconstructs the global and local features of each layer and generates the prediction sequences corresponding to the high- and low-frequency components.
Step 6: pass the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a Waveform Reconstruction module, which performs the inverse of the wavelet decomposition and recombines the high- and low-frequency components into the final generated prediction sequence.
Step 7: for the generated prediction sequence obtained in step 6, compute the error against the real sequence using the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and back-propagate with an Adam optimizer to update the network parameters.
Step 8: using the model whose network parameters were updated in step 7 and the verification data set obtained in step 1, select 32 groups of verification data as input and execute steps 2 to 7, with the training data of step 2 replaced by the selected 32 groups of verification data, finally obtaining a generated prediction sequence based on the verification data.
Step 9: compute the Mean Square Error (MSE) between the prediction sequence generated from the verification data in step 8 and the corresponding real sequence; compute this MSE for all groups of data and take the average, finally obtaining the validation error of the prediction sequences generated from the verification data set.
Step 10: repeat steps 2 to 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, the model performance can no longer be improved, the network parameters are finalized, and the model completes training.
Step 11: feed the input sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence, completing the prediction.
Further, the specific method in step 1 is as follows:
and selecting a proper public time sequence data set, and grouping and segmenting to adapt to the requirement of the model on the data format. Firstly, setting the historical sequence length, the predicted sequence length and the starting sequence length in each group of data according to requirements, wherein the three lengths respectively correspond to three parts in each group of data: historical sequence, predicted sequence, and starting sequence. And grouping by adopting a sliding window mechanism, wherein the window length is the sum of the historical sequence length and the predicted sequence length, and the window moves by one bit each time, namely, only one bit of difference exists between two adjacent groups of data. After completion of the data packet, 70% of the group data was intercepted as the training data set and 30% of the group data was intercepted as the validation data set.
Further, the start sequence is no longer than the history sequence, and its values are identical to the tail of the history sequence. The history sequence and the prediction sequence are adjacent in position, and the length of each group of data is the sum of the history sequence length and the prediction sequence length.
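Purely as an illustrative sketch (the sequence lengths, the toy sine-wave data, and the helper name make_groups below are assumptions, not values fixed by the method), the sliding-window grouping and 70/30 split described above could look like this in Python:

```python
import numpy as np

def make_groups(series, hist_len=96, pred_len=48, start_len=48):
    """Slide a window of length hist_len + pred_len over the series,
    moving one step at a time, so adjacent groups differ by one position."""
    assert start_len <= hist_len
    window = hist_len + pred_len
    groups = []
    for s in range(len(series) - window + 1):
        history = series[s : s + hist_len]
        target  = series[s + hist_len : s + window]   # prediction sequence
        start   = history[-start_len:]                # start sequence = tail of the history
        groups.append((history, start, target))
    return groups

# Example: a univariate toy series; 70% of the groups for training, 30% for verification.
data = np.sin(np.linspace(0, 50, 1000)).astype(np.float32)
groups = make_groups(data)
split = int(0.7 * len(groups))
train_set, valid_set = groups[:split], groups[split:]
```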
Further, the waveform decomposition module is based on the principle of the Discrete Wavelet Transform (DWT), whose formulas are as follows:

$$W_u(0,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,u_{0,k}(x),\qquad W_v(j,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,v_{j,k}(x)$$

$$u_{j,k}(x)=2^{j/2}\,u\!\left(2^{j}x-k\right),\qquad v_{j,k}(x)=2^{j/2}\,v\!\left(2^{j}x-k\right)$$

subject to x = 0, 1, 2, ..., M-1; j = 0, 1, 2, ..., J-1; k = 0, 1, 2, ..., 2^j - 1.

u(x) is the Scaling Function and v(x) is the Wavelet Function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, which represent the low-frequency component and the high-frequency component respectively; f(x) is the input sequence; M is the sequence length; j and k are used to control the scaling and translation of the scaling and wavelet functions.
Further, the discrete network uses a Waveform Extraction module (WE) and a discrete attention mechanism module (SA) to extract global features and local features layer by layer. The waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which gives the global trend of the input sequence, and the global trend is subtracted from the input sequence to obtain its local fluctuation.
Further, the overall formula of the waveform extraction module is as follows:

$$X_{g}^{l}=\mathop{\oplus}\limits_{i=1}^{n}\mathrm{AvgPool}\big(X_{i}^{l}\big),\qquad X_{h}^{l}=X^{l}-X_{g}^{l}$$

where X_g^l and X_h^l respectively represent the global trend and the local fluctuation of the waveform, and serve as the inputs from which the discrete attention mechanism module extracts global features and local features; X^l is the input sequence of the l-th layer WE; ⊕ is the connection symbol used to connect different partitions; the AvgPool function is a mean pooling function that sets a sliding window, slides one cell at a time, averages all elements within the window, and assigns the resulting value to the current cell. The input is partitioned into blocks before being fed into AvgPool, with X_i^l representing the i-th block.
Furthermore, the discrete attention mechanism module first divides the input sequence into blocks (Block, B) of equal length, then extracts features through a shared Attention mechanism module (AT), then performs a dimension transformation through a Feed-Forward Network (FFN) that shortens the length of each block proportionally, and finally splices the blocks together as output. The calculation formula of the discrete attention mechanism is as follows:

$$Q_{i}^{l}=B_{i}^{l}W_{Q_{i}}^{l},\quad K_{i}^{l}=B_{i}^{l}W_{K_{i}}^{l},\quad V_{i}^{l}=B_{i}^{l}W_{V_{i}}^{l}$$
$$\mathrm{SA}\big(X_{SA}^{l}\big)=\mathop{\oplus}\limits_{i}\mathrm{FFN}\Big(\mathrm{Attention}\big(Q_{i}^{l},K_{i}^{l},V_{i}^{l}\big)\Big)$$

where X_SA^l is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents the blocks obtained from the input sequence; W_{Q_i}^l, W_{K_i}^l, W_{V_i}^l are the learnable weight matrices of Q, K, V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively. Q, K and V respectively represent the query matrix (query), key matrix (key) and value matrix (value) obtained after the blocks undergo a linear transformation. The attention mechanism is defined as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\Big(\frac{QK^{T}}{\sqrt{d_{model}}}\Big)V$$

where d_model represents the feature dimension.
Further, the overall functional expression of the discrete network is as follows:

$$\big(X_{g}^{l},\,X_{h}^{l}\big)=\mathrm{WE}\big(X^{l}\big),\qquad Z^{l}=\mathrm{SA}\big(X_{g}^{l}\big),\qquad H^{l}=\mathrm{SA}\big(X_{h}^{l}\big),\qquad X^{1}=X_{SN}$$

where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
The invention has the beneficial effects that:
according to the method, a Waveform Decomposition module (Waveform Decomposition) and a Waveform Reconstruction module (Waveform Reconstruction) based on discrete wavelet change are used for decomposing and reconstructing a time sequence, the Waveform Decomposition module decomposes an input sequence into a low-frequency component and a high-frequency component, the lengths of the two components are half of the length of the input sequence, then a discrete feature extraction module (Sepormer) is used for carrying out feature extraction, and the Waveform Reconstruction module is used for reconstructing the predicted components to generate a final predicted sequence. The invention greatly reduces the scale of the model and improves the resource utilization rate.
In multivariate time series prediction, prediction accuracy, prediction sequence length, and the ability to fit local fine-grained fluctuations are all important factors that influence the prediction effect. The invention decomposes the input sequence with the waveform decomposition and waveform reconstruction modules based on the discrete wavelet transform, which reduces the model size and improves resource utilization. By extracting the global and local features of the multivariate time series in a layered, parallel manner, it improves prediction accuracy, uses the local features to improve the fit to local fine-grained fluctuations, increases the prediction length of the model, and greatly improves the model's performance on multivariate time series prediction.
Drawings
Fig. 1 is a schematic view of the overall structure of the embodiment of the present invention.
Fig. 2 is a detailed structural schematic diagram of an embodiment of the present invention.
FIG. 3 is a diagram of a discrete feature extraction module (Sepformer) according to an embodiment of the present invention.
Fig. 4 is a structural diagram of a discrete Network (Separate Network) according to an embodiment of the present invention.
Fig. 5 is a block diagram of the discrete attention mechanism (Separate Attention) of an embodiment of the present invention.
FIG. 6 is a model diagram of the discrete waveform decomposition method (SWformer) and the micro discrete waveform decomposition method (Mini-SWformer), which discards the high-frequency components to further reduce the model size.
Fig. 7 is a comparison of the discrete waveform decomposition method and the miniature discrete waveform decomposition method with six existing methods in terms of Mean Square Error (MSE) under five public data sets.
FIG. 8 is a comparison of GPU usage, under the same conditions, between the SWformer and Mini-SWformer of the present invention and Informer, which has a smaller model size.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific implementation steps.
A lightweight time series prediction method based on discrete wavelet transform comprises the following steps:
Step 1: data preprocessing. Select a suitable public time series data set and group and segment it to meet the model's data-format requirements. First set, as required, the history sequence length, the prediction sequence length, and the start sequence length for each group of data; these three lengths correspond to the three parts of each group: the history sequence, the prediction sequence, and the start sequence. The start sequence is no longer than the history sequence, and its values are identical to the tail of the history sequence. The history sequence and the prediction sequence are adjacent in position, and the length of each group of data is the sum of the history sequence length and the prediction sequence length. Grouping uses a sliding-window mechanism in which the window length is the sum of the history sequence length and the prediction sequence length and the window moves one position at a time, so that adjacent groups differ by only one position. After grouping, 70% of the groups are taken as the training data set and 30% as the verification data set.
As shown in fig. 1, the overall structure of the present invention is shown. The data processing and dividing part is arranged at the entrance of the structure of the invention and is responsible for carrying out primary processing on the original data to form a data structure required by a prediction model. Fig. 2 is a detailed structural schematic diagram of an embodiment of the present invention.
Step 2: with the aid of the training data set obtained in step 1, 32 sets of training data are randomly selected each time when the device conditions allow, the history sequence and the start sequence in each set of data are respectively input into two Waveform Decomposition (Waveform Decomposition) modules, and the input sequence is decomposed into a low frequency component (approximate coefficient) and a high frequency component (detail coefficient). The waveform decomposition module is based on Discrete Wavelet Transform (DWT) principle, and the formula is as follows:
subject.to.x=0,1,2...,M-1
j=0,1,2,...,J-1
k=0,1,2,...,2j-1
u (x) is a Scaling Function, v (x) is a Wavelet Function; wu(0, k) and Wv(j, k) an approximation coefficient (approximate coefficient) and a detail coefficient (detail coefficient), respectively, which represent the low frequency component and the high frequency component; m is the sequence length; j and k are used to control the scaling of the scaling function.
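For illustration, a minimal sketch of this decomposition using the PyWavelets library; the Haar basis chosen here is an assumption, since the method does not fix a particular wavelet:

```python
import numpy as np
import pywt

x = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.1 * np.random.randn(128)

# Single-level discrete wavelet transform: cA holds the low-frequency
# (approximation) component, cD the high-frequency (detail) component.
cA, cD = pywt.dwt(x, 'haar')

print(len(x), len(cA), len(cD))   # 128 64 64 -> each component is half the input length
```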
Step 3: feed the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules (Sepformers) for feature extraction. Each discrete feature extraction module comprises two Encoders and one Decoder; the corresponding input component is passed into the discrete network (Separate Network) inside the encoders to extract global and local features, finally yielding two groups of global and local features, one for each component.
Fig. 3 shows the overall structure of the discrete feature extraction module (Sepformer), which comprises two Encoders and one Decoder. The core module of both the encoder and the decoder is the discrete network (Separate Network, SN).
Fig. 4 shows the overall structure of the discrete network (Separate Network), which uses a Waveform Extraction module (WE) and a discrete attention mechanism module (SA) to extract global features and local features layer by layer. The waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which gives the global trend of the input sequence, and the global trend is subtracted from the input sequence to obtain its local fluctuation. The overall formula of the waveform extraction module is as follows:

$$X_{g}^{l}=\mathop{\oplus}\limits_{i=1}^{n}\mathrm{AvgPool}\big(X_{i}^{l}\big),\qquad X_{h}^{l}=X^{l}-X_{g}^{l}$$

where X_g^l and X_h^l respectively represent the global trend and the local fluctuation of the waveform, and serve as the inputs from which the discrete attention mechanism module extracts global features and local features; X^l is the input sequence of the l-th layer WE; ⊕ is the connection symbol used to connect different partitions; the AvgPool function is a mean pooling function that sets a sliding window, slides one cell at a time, averages all elements within the window, and assigns the resulting value to the current cell. The input is partitioned into blocks before being fed into AvgPool, with X_i^l representing the i-th block.
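A rough PyTorch sketch of the waveform extraction idea follows; the window size, the replicate padding used to keep the trend the same length as the input, and the omission of the block partition before pooling are all simplifying assumptions, so this is an illustration rather than the exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveformExtraction(nn.Module):
    """Split an input sequence into a global trend (sliding-window mean)
    and a local fluctuation (input minus trend)."""
    def __init__(self, window: int = 5):
        super().__init__()
        self.window = window
        self.pool = nn.AvgPool1d(kernel_size=window, stride=1)

    def forward(self, x):                          # x: (batch, length, features)
        x_t = x.transpose(1, 2)                    # (batch, features, length)
        left = (self.window - 1) // 2
        right = self.window - 1 - left
        x_pad = F.pad(x_t, (left, right), mode='replicate')
        trend = self.pool(x_pad).transpose(1, 2)   # global trend, same length as x
        local = x - trend                          # local fluctuation
        return trend, local

we = WaveformExtraction(window=5)
trend, local = we(torch.randn(8, 64, 7))           # 8 sequences, length 64, 7 variables
```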
Fig. 5 shows the discrete attention mechanism module (SA), which is used for feature extraction. The discrete attention mechanism module first divides the input sequence into blocks (Block, B) of equal length, then extracts features through a shared Attention mechanism module (AT), then performs a dimension transformation through a Feed-Forward Network (FFN) that shortens the length of each block proportionally, and finally splices the blocks together as output. The calculation formula of the discrete attention mechanism is as follows:

$$Q_{i}^{l}=B_{i}^{l}W_{Q_{i}}^{l},\quad K_{i}^{l}=B_{i}^{l}W_{K_{i}}^{l},\quad V_{i}^{l}=B_{i}^{l}W_{V_{i}}^{l}$$
$$\mathrm{SA}\big(X_{SA}^{l}\big)=\mathop{\oplus}\limits_{i}\mathrm{FFN}\Big(\mathrm{Attention}\big(Q_{i}^{l},K_{i}^{l},V_{i}^{l}\big)\Big)$$

where X_SA^l is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents the blocks obtained from the input sequence; W_{Q_i}^l, W_{K_i}^l, W_{V_i}^l are the learnable weight matrices of Q, K, V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively. Q, K and V respectively represent the query matrix (query), key matrix (key) and value matrix (value) obtained after the blocks undergo a linear transformation. The attention mechanism is defined as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\Big(\frac{QK^{T}}{\sqrt{d_{model}}}\Big)V$$

where d_model represents the feature dimension.
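The block-wise attention computation can be sketched as below; the block length, feature dimension, single shared head, and a feed-forward layer that keeps (rather than shortens) the block length are illustrative assumptions, so this is a simplified reading of the mechanism rather than the exact module:

```python
import math
import torch
import torch.nn as nn

class SeparateAttention(nn.Module):
    """Split the sequence into equal-length blocks, apply a shared attention
    inside each block, then concatenate the block outputs."""
    def __init__(self, d_model: int = 32, block_len: int = 16):
        super().__init__()
        self.block_len = block_len
        self.d_model = d_model
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.ffn = nn.Linear(d_model, d_model)     # per-block feed-forward transform

    def attention(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_model)
        return torch.softmax(scores, dim=-1) @ v

    def forward(self, x):                          # x: (batch, length, d_model)
        b, n, d = x.shape
        blocks = x.reshape(b, n // self.block_len, self.block_len, d)
        out = self.attention(self.q(blocks), self.k(blocks), self.v(blocks))
        out = self.ffn(out)
        return out.reshape(b, n, d)                # splice the blocks back together

sa = SeparateAttention()
y = sa(torch.randn(4, 64, 32))                     # 64 = 4 blocks of length 16
```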
The overall functional expression of the discrete network is as follows:

$$\big(X_{g}^{l},\,X_{h}^{l}\big)=\mathrm{WE}\big(X^{l}\big),\qquad Z^{l}=\mathrm{SA}\big(X_{g}^{l}\big),\qquad H^{l}=\mathrm{SA}\big(X_{h}^{l}\big),\qquad X^{1}=X_{SN}$$

where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
Step 4: align the dimensions of the two groups of features obtained in step 3 in the hidden layer after the encoders, then splice the aligned features, finally obtaining two groups of global and local features corresponding to the high- and low-frequency components.
As shown in fig. 3, the global features and the local features output by the True Encoder and the Pred Encoder are spliced respectively: the two features output by the True Encoder are first passed through a Feed-Forward Network (FFN) for a dimension transformation so that they have the same dimension as those of the Pred Encoder, and the corresponding features are then spliced to obtain the global features and the local features.
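As a small illustration of this alignment step (the hidden sizes and the concatenation axis below are assumptions made for the example), the True Encoder features can be projected by a feed-forward layer to the Pred Encoder's dimension and then spliced:

```python
import torch
import torch.nn as nn

d_true, d_pred = 48, 32                      # assumed hidden sizes of the two encoders
align = nn.Linear(d_true, d_pred)            # FFN used for dimension alignment

true_global = torch.randn(8, 24, d_true)     # global features from the True Encoder
pred_global = torch.randn(8, 24, d_pred)     # global features from the Pred Encoder

aligned = align(true_global)                                  # now (8, 24, d_pred)
global_features = torch.cat([aligned, pred_global], dim=1)    # spliced along the sequence axis
```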
Step 5: feed the two groups of features obtained in step 4 into the corresponding Decoders of their respective discrete feature extraction modules; the discrete network (Separate Network) inside the decoders reconstructs the global and local features of each layer and generates the prediction sequences corresponding to the high- and low-frequency components.
Step 6: pass the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a Waveform Reconstruction module, which performs the inverse of the wavelet decomposition and recombines the high- and low-frequency components into the final generated prediction sequence.
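A sketch of the reconstruction step with PyWavelets, again assuming a Haar basis and toy predicted components: the inverse transform recombines the two components into a sequence of twice their length.

```python
import numpy as np
import pywt

# Suppose the two decoders produced these predicted components (toy values here).
pred_low  = np.random.randn(64)    # predicted approximation (low-frequency) coefficients
pred_high = np.random.randn(64)    # predicted detail (high-frequency) coefficients

pred_sequence = pywt.idwt(pred_low, pred_high, 'haar')
print(pred_sequence.shape)         # (128,) -> final prediction, twice the component length
```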
Step 7: for the generated prediction sequence obtained in step 6, compute the error against the real sequence using the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and back-propagate with an Adam optimizer to update the network parameters. The Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas are as follows:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\big(y_{i}-\hat{y}_{i}\big)^{2},\qquad \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\big|y_{i}-\hat{y}_{i}\big|$$

where y_i is the real value, ŷ_i is the predicted value, and n is the number of predicted points.
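A minimal PyTorch sketch of the loss computation and parameter update in step 7; the stand-in linear model and the learning rate are assumptions used only to keep the example self-contained:

```python
import torch
import torch.nn as nn

model = nn.Linear(96, 48)                       # stand-in for the full prediction model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

history = torch.randn(32, 96)                   # a batch of 32 groups, as in step 2
target  = torch.randn(32, 48)                   # the corresponding real sequences

pred = model(history)
mse = torch.mean((pred - target) ** 2)          # Mean Square Error
mae = torch.mean(torch.abs(pred - target))      # Mean Absolute Error

optimizer.zero_grad()
mse.backward()                                  # back-propagate and update the parameters
optimizer.step()
```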
Step 8: using the model whose network parameters were updated in step 7 and the verification data set obtained in step 1, select 32 groups of verification data as input and execute steps 2 to 7, with the training data of step 2 replaced by the selected 32 groups of verification data, finally obtaining a generated prediction sequence based on the verification data.
Step 9: compute the Mean Square Error (MSE) between the prediction sequence generated from the verification data in step 8 and the corresponding real sequence; compute this MSE for all groups of data and take the average, finally obtaining the validation error of the prediction sequences generated from the verification data set.
Step 10: repeat steps 2 to 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, the model performance can no longer be improved, the network parameters are finalized, and the model completes training.
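The stopping rule of step 10 amounts to early stopping on the validation MSE; schematically (train_one_epoch and validation_mse below are placeholder functions standing in for steps 2-7 and steps 8-9):

```python
import random

def train_one_epoch():
    """Placeholder for steps 2-7: one pass of training-set updates."""
    pass

def validation_mse():
    """Placeholder for steps 8-9: returns the mean validation MSE of the epoch."""
    return random.random()

best = float('inf')
while True:
    train_one_epoch()
    mse = validation_mse()
    if mse >= best:          # validation error no longer decreasing -> stop training
        break
    best = mse
```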
Step 11: feed the input sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence, completing the prediction.
Fig. 6 shows the two methods of the present invention: the discrete waveform decomposition method (SWformer) and the micro discrete waveform decomposition method (Mini-SWformer). The high-frequency components carry only a small share of the information in time series data, so appropriately discarding them can reduce the computation of the model to some extent and thereby reduce its size. On this basis, the micro discrete waveform decomposition method deletes the high-frequency components and the entire branch that processes them in the discrete waveform decomposition method, further reducing the model size.
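The principle behind the micro variant can be illustrated with PyWavelets, where passing None in place of the detail coefficients reconstructs from the low-frequency branch alone; this snippet only illustrates the idea and is not the patented model itself:

```python
import numpy as np
import pywt

x = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.1 * np.random.randn(128)
cA, cD = pywt.dwt(x, 'haar')

full     = pywt.idwt(cA, cD, 'haar')    # reconstruction using both branches
low_only = pywt.idwt(cA, None, 'haar')  # high-frequency branch discarded

print(np.abs(full - x).max())           # ~0: exact reconstruction with both branches
print(np.abs(low_only - x).mean())      # residual error from dropping the high-frequency branch
```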
FIG. 7 shows the results, measured by Mean Square Error (MSE) and Mean Absolute Error (MAE), of the two methods of the present invention and of existing methods such as Informer, LogTrans, Reformer, LSTMa and LSTnet under the same experimental conditions on five data sets: ETTh1, ETTh2, ETTm1, Weather and ECL. The result of the best-performing model under each experimental condition is shown in bold in the table. From the table of FIG. 7 it can be seen that the discrete waveform decomposition method (SWformer) and the micro discrete waveform decomposition method (Mini-SWformer) are greatly improved over the other methods. Compared with the Informer method, the MSE of the discrete feature extraction method is reduced by 22.53% on average, that of the discrete waveform decomposition method by 19.29% on average, and that of the micro discrete waveform decomposition method by 16.54% on average.
FIG. 8 shows how the memory usage of the discrete waveform decomposition method (SWformer), the micro discrete waveform decomposition method (Mini-SWformer) and Informer compares and changes as the prediction sequence length increases, under the same experimental conditions. It can be seen that the advantage of the discrete waveform decomposition method and the micro discrete waveform decomposition method in memory usage grows as the prediction sequence becomes longer. Compared with Informer, the discrete waveform decomposition method reduces memory usage by 52.62% on average, and the micro discrete waveform decomposition method by 68.02% on average.
Claims (8)
1. A lightweight time series prediction method based on discrete wavelet transform is characterized by comprising the following steps:
step 1: preprocessing the data to obtain a training data set and a verification data set;
step 2: with the training data set obtained in step 1, randomly selecting 32 groups of training data each time as device conditions allow, inputting the history sequence and the start sequence of each group of data into two waveform decomposition modules respectively, and decomposing the input sequence into a low-frequency component and a high-frequency component;
step 3: inputting the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules respectively for feature extraction; each discrete feature extraction module comprises two encoders and one decoder, and the corresponding input component is input into the discrete network in the encoders to extract global features and local features, finally obtaining two groups of global and local features corresponding to the two components;
step 4: aligning the dimensions of the two groups of features obtained in step 3 in the hidden layer after the encoders, and splicing the aligned features, finally obtaining two groups of global features and local features corresponding to the high- and low-frequency components;
step 5: inputting the two groups of features obtained in step 4 into the corresponding decoders of the respective discrete feature extraction modules, reconstructing the global features and local features of each layer through the discrete network in the decoders, and generating the prediction sequences corresponding to the high- and low-frequency components;
step 6: performing the inverse process of the wavelet decomposition on the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a waveform reconstruction module, and recombining the high- and low-frequency components to obtain the final generated prediction sequence;
step 7: according to the generated prediction sequence obtained in step 6, calculating the error between the generated prediction sequence and the real sequence through the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and performing back propagation through an Adam optimizer to update the network parameters;
step 8: with the model whose network parameters were updated in step 7 and the verification data set obtained in step 1, selecting 32 groups of verification data as input and executing steps 2 to 7, the training data of step 2 being replaced by the selected 32 groups of verification data, finally obtaining a generated prediction sequence based on the verification data;
step 9: calculating the Mean Square Error (MSE) between the prediction sequence generated from the verification data in step 8 and the corresponding real sequence, calculating this MSE for all groups of data and taking the average, finally obtaining the validation error of the prediction sequences generated from the verification data set;
step 10: repeating steps 2 to 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model completes training;
step 11: inputting the input sequence given by the prediction task into the trained model finally obtained in step 10, performing sequence prediction, and outputting the finally obtained prediction sequence to complete the prediction.
2. The discrete wavelet transform-based lightweight time series prediction method according to claim 1, wherein the specific method in step 1 is as follows:
selecting a suitable public time series data set, and grouping and segmenting it to meet the model's data-format requirements; firstly setting, as required, the history sequence length, the prediction sequence length and the start sequence length for each group of data, the three lengths respectively corresponding to the three parts of each group of data: the history sequence, the prediction sequence and the start sequence; grouping with a sliding-window mechanism in which the window length is the sum of the history sequence length and the prediction sequence length and the window moves one position at a time, so that adjacent groups of data differ by only one position; after grouping, 70% of the groups are taken as the training data set and 30% as the verification data set.
3. The discrete wavelet transform-based lightweight time series prediction method of claim 2, wherein, in length, the start sequence is no longer than the history sequence and, in value, the start sequence is identical to the tail of the history sequence; the history sequence and the prediction sequence are adjacent in position, and the length of each group of data is the sum of the history sequence length and the prediction sequence length.
4. The discrete wavelet transform-based lightweight time series prediction method according to claim 1, wherein said waveform decomposition module is based on the principle of the discrete wavelet transform, whose formulas are as follows:

$$W_u(0,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,u_{0,k}(x),\qquad W_v(j,k)=\frac{1}{\sqrt{M}}\sum_{x=0}^{M-1}f(x)\,v_{j,k}(x)$$

$$u_{j,k}(x)=2^{j/2}\,u\!\left(2^{j}x-k\right),\qquad v_{j,k}(x)=2^{j/2}\,v\!\left(2^{j}x-k\right)$$

subject to x = 0, 1, 2, ..., M-1; j = 0, 1, 2, ..., J-1; k = 0, 1, 2, ..., 2^j - 1;

u(x) is the scale function and v(x) is the wavelet function; W_u(0,k) and W_v(j,k) are the approximation coefficients and detail coefficients, representing the low-frequency component and the high-frequency component respectively; f(x) is the input sequence; M is the sequence length; j and k are used to control the scaling and translation of the scale and wavelet functions.
5. The discrete wavelet transform-based lightweight time series prediction method as claimed in claim 1, wherein the discrete network adopts a waveform extraction module and a discrete attention mechanism module to extract global features and local features layer by layer; the waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which gives the global trend of the input sequence, and the global trend is subtracted from the input sequence to obtain the local fluctuation of the input sequence.
6. The discrete wavelet transform-based lightweight time series prediction method according to claim 5, wherein the overall formula of the waveform extraction module is as follows:

$$X_{g}^{l}=\mathop{\oplus}\limits_{i=1}^{n}\mathrm{AvgPool}\big(X_{i}^{l}\big),\qquad X_{h}^{l}=X^{l}-X_{g}^{l}$$

where X_g^l and X_h^l respectively represent the global trend and the local fluctuation of the waveform, and serve as the inputs from which the discrete attention mechanism module extracts global features and local features; X^l is the input sequence of the l-th layer WE; ⊕ is the connection symbol used to connect different partitions; the AvgPool function is a mean pooling function that sets a sliding window, slides one cell at a time, averages all elements within the window, and assigns the resulting value to the current cell; the input is partitioned into blocks before being fed into AvgPool, with X_i^l representing the i-th block.
7. The discrete wavelet transform-based lightweight time series prediction method, characterized in that the discrete attention mechanism module first divides the input sequence into blocks (Block, B) of equal length, then extracts features through a shared Attention mechanism module (Attention, AT), then performs a dimension transformation through a Feed-Forward Network (FFN) that shortens the length of each block proportionally, and finally splices the blocks together as output; the calculation formula of the discrete attention mechanism is as follows:

$$Q_{i}^{l}=B_{i}^{l}W_{Q_{i}}^{l},\quad K_{i}^{l}=B_{i}^{l}W_{K_{i}}^{l},\quad V_{i}^{l}=B_{i}^{l}W_{V_{i}}^{l}$$
$$\mathrm{SA}\big(X_{SA}^{l}\big)=\mathop{\oplus}\limits_{i}\mathrm{FFN}\Big(\mathrm{Attention}\big(Q_{i}^{l},K_{i}^{l},V_{i}^{l}\big)\Big)$$

where X_SA^l is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents the blocks obtained from the input sequence; W_{Q_i}^l, W_{K_i}^l, W_{V_i}^l are the learnable weight matrices of Q, K, V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively; Q, K and V respectively represent the query matrix (query), key matrix (key) and value matrix (value) obtained after the blocks undergo a linear transformation; the attention mechanism is defined as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\Big(\frac{QK^{T}}{\sqrt{d_{model}}}\Big)V$$

where d_model represents the feature dimension.
8. The discrete wavelet transform-based lightweight time series prediction method according to claim 7, wherein the overall functional expression of the discrete network is as follows:

$$\big(X_{g}^{l},\,X_{h}^{l}\big)=\mathrm{WE}\big(X^{l}\big),\qquad Z^{l}=\mathrm{SA}\big(X_{g}^{l}\big),\qquad H^{l}=\mathrm{SA}\big(X_{h}^{l}\big),\qquad X^{1}=X_{SN}$$

where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111536500.3A CN114219027A (en) | 2021-12-15 | 2021-12-15 | Lightweight time series prediction method based on discrete wavelet transform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111536500.3A CN114219027A (en) | 2021-12-15 | 2021-12-15 | Lightweight time series prediction method based on discrete wavelet transform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114219027A true CN114219027A (en) | 2022-03-22 |
Family
ID=80702457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111536500.3A Pending CN114219027A (en) | 2021-12-15 | 2021-12-15 | Lightweight time series prediction method based on discrete wavelet transform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114219027A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR19980074795A (en) * | 1997-03-27 | 1998-11-05 | 윤종용 | Image Coding Method Using Global-Based Color Correlation |
CN110826803A (en) * | 2019-11-06 | 2020-02-21 | 广东电力交易中心有限责任公司 | Electricity price prediction method and device for electric power spot market |
CN112862875A (en) * | 2021-01-18 | 2021-05-28 | 中国科学院自动化研究所 | Rain removing method, system and equipment for rain chart based on selective mechanism attention mechanism |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115114345A (en) * | 2022-04-02 | 2022-09-27 | 腾讯科技(深圳)有限公司 | Feature representation extraction method, device, equipment, storage medium and program product |
CN115114345B (en) * | 2022-04-02 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Feature representation extraction method, device, equipment, storage medium and program product |
CN115293244A (en) * | 2022-07-15 | 2022-11-04 | 北京航空航天大学 | Smart grid false data injection attack detection method based on signal processing and data reduction |
CN115293244B (en) * | 2022-07-15 | 2023-08-15 | 北京航空航天大学 | Smart grid false data injection attack detection method based on signal processing and data reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |