CN114219027A - Lightweight time series prediction method based on discrete wavelet transform - Google Patents

Lightweight time series prediction method based on discrete wavelet transform

Info

Publication number
CN114219027A
Authority
CN
China
Prior art keywords
sequence
discrete
prediction
data
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111536500.3A
Other languages
Chinese (zh)
Inventor
樊谨
王则昊
吉玉祥
汪森
孙丹枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202111536500.3A
Publication of CN114219027A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a lightweight time series prediction method based on discrete wavelet transform. A waveform decomposition module decomposes the input sequence into a low-frequency component and a high-frequency component, each half the length of the input sequence; the two components are then predicted separately by a discrete feature extraction method based on a discrete network that extracts features in a layered, parallel manner. To address the high computational complexity of the attention mechanism, the discrete network adopts a discrete attention mechanism that computes attention values block by block, thereby reducing the computational complexity of the model. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and the smaller model size makes it more competitive on devices with limited resources.

Description

Lightweight time series prediction method based on discrete wavelet transform
Technical Field
The invention belongs to the field of time series prediction, and particularly relates to a lightweight time series prediction method based on discrete wavelet transform.
Background
In recent years, time series prediction technology has been widely used in various fields such as equipment health prediction systems, weather prediction, and stock prediction. Time series prediction is an important branch of the field of time series analysis. In general, a time series prediction method continuously learns from and analyzes historical time series in order to extract the features that determine how the series changes, and on this basis predicts the trend of the time series over a period of time in the future.
As research on the time series prediction problem deepens and excellent methods continue to emerge, the demands placed on new methods keep growing. These demands include higher prediction accuracy, longer prediction sequences, the shift from univariate to multivariate time series, and the requirement that the model scale be reduced as far as possible so that the model can be widely applied.
In recent years, more and more time series prediction methods have focused on improving prediction accuracy and increasing the prediction sequence length. As the requirements of the time series prediction problem grow, many methods become increasingly weak at learning long-distance dependencies in a time series, and further breakthroughs are difficult to achieve. This changed with the introduction of the Attention-based (AT) Transformer method: its breakthrough ability to extract dependencies between elements that are far apart brought a new field of view. The Transformer method has been applied to the time series prediction problem in more and more approaches and has made good progress. However, the Transformer has high computational complexity and a large model size, so it places heavy demands on memory and cannot be used directly for longer prediction requirements. As a result, more and more Transformer variants aimed at improving the computational complexity of the Transformer have been proposed, which make better results possible in longer time series prediction. Among these variant models, the discrete feature extraction method (Sepformer) brings a considerable improvement.
The discrete feature extraction method (Sepformer) adopts a discrete network (Separate Network) that extracts global features and local features in a layered, parallel manner, which improves the accuracy of the whole model. To address the high computational complexity of the Self-Attention mechanism, a discrete attention (Separate Attention) mechanism is adopted to calculate attention values block by block, reducing the computational complexity of the model to O(C). Compared with existing methods, this method improves the accuracy of multivariate time series prediction, reduces computational complexity, and increases the maximum prediction length. However, it still has a relatively large model scale and low resource utilization.
Disclosure of Invention
The technical problem to be solved by the invention is to reduce the memory footprint of the model as much as possible while guaranteeing prediction accuracy, so that the model achieves a balance (trade-off) among the various technical requirements. The invention provides a lightweight time series prediction method based on discrete wavelet transform which, as verified by testing, retains to the greatest extent the high accuracy, low computational complexity and long-sequence prediction capability of the discrete feature extraction method, while further reducing the model scale and improving resource utilization.
The technical scheme adopted by the invention is as follows: a waveform decomposition module decomposes the input sequence to obtain a low-frequency component and a high-frequency component, each half the length of the input sequence; the two components are then predicted separately by the discrete feature extraction method (Sepformer), which is based on a discrete network (Separate Network) that extracts features in a layered, parallel manner. To address the high computational complexity of the Self-Attention mechanism, a discrete attention (Separate Attention) mechanism is adopted to calculate attention values block by block, thereby reducing the computational complexity of the model. Finally, a waveform reconstruction module generates the final prediction sequence. The method improves resource utilization, and the smaller model size makes it more competitive on devices with limited resources.
A lightweight time series prediction method based on discrete wavelet transform comprises the following steps:
Step 1: preprocess the data to obtain a training data set and a verification data set.
Step 2: with the aid of the training data set obtained in step 1, randomly select 32 groups of training data each time as equipment conditions allow, input the historical sequence and the starting sequence of each group of data into two Waveform Decomposition modules respectively, and decompose the input sequence into a low-frequency component (approximation coefficients) and a high-frequency component (detail coefficients).
Step 3: input the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules (Sepformers) respectively for feature extraction. Each discrete feature extraction module comprises two Encoders and one Decoder; the corresponding input component is fed into the discrete network (Separate Network) in the encoders to extract global features and local features, finally yielding two groups of global and local features corresponding to the two components.
Step 4: perform dimension alignment on the two groups of features obtained in step 3 in the hidden layer after the encoders, and splice the dimension-aligned features, finally obtaining two groups of global features and local features corresponding to the high-frequency and low-frequency components.
Step 5: input the two groups of features obtained in step 4 into the corresponding Decoders of the respective discrete feature extraction modules, and reconstruct the global features and local features of each layer through the discrete network (Separate Network) in the decoders to generate the prediction sequences corresponding to the high-frequency and low-frequency components.
Step 6: apply the inverse process of wavelet decomposition to the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a Waveform Reconstruction module, recombining the high- and low-frequency components to obtain the final generated prediction sequence.
Step 7: according to the generated prediction sequence obtained in step 6, calculate the error between the generated prediction sequence and the real sequence using the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and perform back propagation with the Adam optimizer to update the network parameters.
Step 8: using the model with the network parameters updated in step 7 and the verification data set obtained in step 1, select 32 groups of verification data as input and execute steps 2 to 7, with the training data in step 2 replaced by the selected 32 groups of verification data. Prediction sequences based on the verification data are finally generated.
Step 9: calculate the Mean Square Error (MSE) between the prediction sequences generated from the verification data in step 8 and the corresponding real sequences, compute the MSE of every group of data and take the average, finally obtaining the mean square error of the model on the verification data set.
Step 10: repeat step 2 to step 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model finishes training.
Step 11: input the sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence to complete the prediction.
Further, the specific method in step 1 is as follows:
and selecting a proper public time sequence data set, and grouping and segmenting to adapt to the requirement of the model on the data format. Firstly, setting the historical sequence length, the predicted sequence length and the starting sequence length in each group of data according to requirements, wherein the three lengths respectively correspond to three parts in each group of data: historical sequence, predicted sequence, and starting sequence. And grouping by adopting a sliding window mechanism, wherein the window length is the sum of the historical sequence length and the predicted sequence length, and the window moves by one bit each time, namely, only one bit of difference exists between two adjacent groups of data. After completion of the data packet, 70% of the group data was intercepted as the training data set and 30% of the group data was intercepted as the validation data set.
Further, the starting sequence length is less than or equal to the history sequence length in length, and the starting sequence is identical to the rear part of the history sequence in value. The historical sequence and the predicted sequence are connected in position in tandem, and the length of each group of data is the sum of the length of the historical sequence and the length of the predicted sequence.
Further, the waveform decomposition module is based on the principle of Discrete Wavelet Transform (DWT), and the formula is as follows:
f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_u(0,k)\,u_{0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=0}^{J-1}\sum_{k} W_v(j,k)\,v_{j,k}(x)

W_u(0,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,u_{0,k}(x)

W_v(j,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,v_{j,k}(x)

subject to x = 0,1,2,\dots,M-1; j = 0,1,2,\dots,J-1; k = 0,1,2,\dots,2^{j}-1
where u(x) is the Scaling Function and v(x) is the Wavelet Function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, respectively, which represent the low-frequency component and the high-frequency component; M is the sequence length; j and k are used to control the scaling of the scaling function.
Further, the discrete network adopts a Waveform Extraction module (WE) and a discrete Attention mechanism module (SA) to extract global features and local features layer by layer. The waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which yields the global trend of the input sequence, and the global trend is then subtracted from the input sequence to obtain the local fluctuation of the input sequence.
Further, the overall formula of the waveform extraction module is as follows:
[The formulas of the waveform extraction module are given as images in the original publication.]
In these formulas, the global trend and the local fluctuation of the waveform are the two outputs of WE; they serve as the inputs from which the discrete attention mechanism module extracts the global features and the local features. The remaining symbol denotes the input sequence of the l-th layer WE, and the connection symbol concatenates the different blocks. The AvgPool function is a mean pooling function: it sets a sliding window, slides one unit at a time, averages all the elements within the window, and assigns the resulting value to the current unit. The input is partitioned into blocks before being fed into AvgPool, with B_i denoting the i-th block.
Furthermore, the discrete Attention mechanism module firstly divides the input sequence into blocks (Block, B) with the same length, then extracts features through a shared Attention mechanism module (AT), then performs dimension transformation through a Feed-Forward Network (FFN), shortens the length of each Block in proportion, and finally splices and outputs the blocks. The calculation formula of the discrete Attention mechanism (AT) is as follows:
[The calculation formulas of the discrete attention mechanism are given as images in the original publication.]
In these formulas, the input is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents a block obtained from the input sequence; the weight matrices are the learnable weight matrices of Q, K and V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively. Q, K and V denote the query matrix, the key matrix and the value matrix obtained after the blocks are subjected to a linear transformation. The attention mechanism is defined as:
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V
where d_{model} represents the feature dimension.
Further, the overall function expression of the discrete network is as follows:
[The overall function expression of the discrete network is given as an image in the original publication.]
where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
The invention has the beneficial effects that:
according to the method, a Waveform Decomposition module (Waveform Decomposition) and a Waveform Reconstruction module (Waveform Reconstruction) based on discrete wavelet change are used for decomposing and reconstructing a time sequence, the Waveform Decomposition module decomposes an input sequence into a low-frequency component and a high-frequency component, the lengths of the two components are half of the length of the input sequence, then a discrete feature extraction module (Sepormer) is used for carrying out feature extraction, and the Waveform Reconstruction module is used for reconstructing the predicted components to generate a final predicted sequence. The invention greatly reduces the scale of the model and improves the resource utilization rate.
In the multivariate time series prediction, the problems of prediction precision, prediction sequence length, fitting ability to local fine fluctuation and the like are all important factors influencing the prediction effect. The invention adopts the waveform decomposition and waveform reconstruction module based on discrete wavelet transform to decompose the input sequence, thereby reducing the scale of the model and improving the resource utilization rate. By adopting a mechanism of extracting global features and local features of the multivariate time sequence in a layered parallel manner, the prediction precision is improved, the fitting capability of the local fine fluctuation of the multivariate time sequence is improved by utilizing the local features, the prediction length of the model is increased, and the effect of the model on the prediction of the multivariate time sequence is greatly improved.
Drawings
Fig. 1 is a schematic view of the overall structure of the embodiment of the present invention.
Fig. 2 is a detailed structural schematic diagram of an embodiment of the present invention.
FIG. 3 is a diagram of a discrete feature extraction module (Sepformer) according to an embodiment of the present invention.
Fig. 4 is a structural diagram of a discrete Network (Separate Network) according to an embodiment of the present invention.
Fig. 5 is a block diagram of a discrete Attention mechanism (Separate Attention) of an embodiment of the present invention.
FIG. 6 is a model diagram of the discrete waveform decomposition method (SWformer) and the Mini-discrete waveform decomposition method (Mini-SWformer), which discards high-frequency components to further reduce the model size.
Fig. 7 is a comparison of the discrete waveform decomposition method and the Mini discrete waveform decomposition method with six existing methods in terms of Mean Square Error (MSE) under five public data sets.
FIG. 8 is a comparison of the GPU memory usage of the SWformer and Mini-SWformer of the present invention with that of Informer, which has a relatively small model size, under the same conditions.
Detailed Description
The invention is further described with reference to the accompanying drawings and specific implementation steps:
a lightweight time series prediction method based on discrete wavelet transform comprises the following steps:
step 1: and (4) preprocessing data. And selecting a proper public time sequence data set, and grouping and segmenting to adapt to the requirement of the model on the data format. Firstly, setting the historical sequence length, the predicted sequence length and the starting sequence length in each group of data according to requirements, wherein the three lengths respectively correspond to three parts in each group of data: historical sequence, predicted sequence, and starting sequence. In length, the starting sequence length is less than or equal to the history sequence length, and in value, the starting sequence is the same as the latter part of the history sequence. The historical sequence and the predicted sequence are connected in position in tandem, and the length of each group of data is the sum of the length of the historical sequence and the length of the predicted sequence. And grouping by adopting a sliding window mechanism, wherein the window length is the sum of the historical sequence length and the predicted sequence length, and the window moves by one bit each time, namely, only one bit of difference exists between two adjacent groups of data. After completion of the data packet, 70% of the group data was intercepted as the training data set and 30% of the group data was intercepted as the validation data set.
Fig. 1 shows the overall structure of the present invention. The data processing and division part sits at the entrance of the structure and is responsible for the initial processing of the raw data into the data structure required by the prediction model. Fig. 2 is a detailed structural schematic diagram of an embodiment of the present invention.
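As an illustrative sketch of the data preparation described in step 1 (the function name, the toy series and the concrete lengths are assumptions for illustration, not values fixed by the patent), the grouping with a stride-1 sliding window and the 70%/30% split can be written as follows:

    import numpy as np

    def make_dataset(series, hist_len, pred_len, start_len):
        """Group a time series with a stride-1 sliding window.

        Each group holds (history, start, target): the start sequence is the
        last start_len points of the history, and the target (prediction)
        sequence immediately follows the history.
        """
        assert start_len <= hist_len
        window = hist_len + pred_len                 # total window length
        groups = []
        for i in range(len(series) - window + 1):    # window moves one step at a time
            hist = series[i:i + hist_len]
            target = series[i + hist_len:i + window]
            start = hist[-start_len:]                # start sequence = tail of the history
            groups.append((hist, start, target))
        split = int(0.7 * len(groups))               # 70% training / 30% verification
        return groups[:split], groups[split:]

    # usage with a toy sine series
    data = np.sin(np.arange(0, 200, 0.1)).astype(np.float32)
    train_set, valid_set = make_dataset(data, hist_len=96, pred_len=48, start_len=48)
    print(len(train_set), len(valid_set))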
Step 2: with the aid of the training data set obtained in step 1, 32 sets of training data are randomly selected each time when the device conditions allow, the history sequence and the start sequence in each set of data are respectively input into two Waveform Decomposition (Waveform Decomposition) modules, and the input sequence is decomposed into a low frequency component (approximate coefficient) and a high frequency component (detail coefficient). The waveform decomposition module is based on Discrete Wavelet Transform (DWT) principle, and the formula is as follows:
f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_u(0,k)\,u_{0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=0}^{J-1}\sum_{k} W_v(j,k)\,v_{j,k}(x)

W_u(0,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,u_{0,k}(x)

W_v(j,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,v_{j,k}(x)

subject to x = 0,1,2,\dots,M-1; j = 0,1,2,\dots,J-1; k = 0,1,2,\dots,2^{j}-1
where u(x) is the Scaling Function and v(x) is the Wavelet Function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, respectively, which represent the low-frequency component and the high-frequency component; M is the sequence length; j and k are used to control the scaling of the scaling function.
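For illustration, a single-level decomposition of this kind can be sketched with the PyWavelets library; the Haar wavelet chosen here is an assumption, and any orthogonal wavelet that halves the sequence length would serve the same purpose:

    import numpy as np
    import pywt

    x = np.sin(np.arange(96) * 0.2)        # toy input sequence of length 96

    # single-level DWT: cA = approximation (low-frequency) coefficients,
    # cD = detail (high-frequency) coefficients, each half the input length
    cA, cD = pywt.dwt(x, 'haar')
    print(x.shape, cA.shape, cD.shape)     # (96,) (48,) (48,)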
Step 3: input the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules (Sepformers) respectively for feature extraction. Each discrete feature extraction module comprises two Encoders and one Decoder; the corresponding input component is fed into the discrete network (Separate Network) in the encoders to extract global features and local features, finally yielding two groups of global and local features corresponding to the two components.
Fig. 3 shows the overall structure of the discrete feature extraction module (Sepformer), which includes two Encoders and one Decoder. The core module of both the encoder and the decoder is the discrete network (Separate Network, SN).
Fig. 4 shows the overall structure of the discrete network (Separate Network). The discrete network adopts a Waveform Extraction module (WE) and a discrete Attention mechanism module (SA) to extract global features and local features layer by layer. The waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which yields the global trend of the input sequence, and the global trend is then subtracted from the input sequence to obtain the local fluctuation of the input sequence. The overall formulas of the waveform extraction module are as follows:
[The formulas of the waveform extraction module are given as images in the original publication.]
In these formulas, the global trend and the local fluctuation of the waveform are the two outputs of WE; they serve as the inputs from which the discrete attention mechanism module extracts the global features and the local features. The remaining symbol denotes the input sequence of the l-th layer WE, and the connection symbol concatenates the different blocks. The AvgPool function is a mean pooling function: it sets a sliding window, slides one unit at a time, averages all the elements within the window, and assigns the resulting value to the current unit. The input is partitioned into blocks before being fed into AvgPool, with B_i denoting the i-th block.
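A minimal PyTorch sketch of this trend/fluctuation split is given below; the kernel size, the stride-1 padding scheme and the module name are assumptions made for illustration rather than parameters specified by the patent:

    import torch
    import torch.nn as nn

    class WaveformExtraction(nn.Module):
        """Split a sequence into a global trend and a local fluctuation."""

        def __init__(self, kernel_size: int = 5):
            super().__init__()
            # stride 1 with symmetric padding keeps the sequence length unchanged
            self.pool = nn.AvgPool1d(kernel_size, stride=1,
                                     padding=kernel_size // 2,
                                     count_include_pad=False)

        def forward(self, x: torch.Tensor):
            # x: (batch, length, features)
            trend = self.pool(x.transpose(1, 2)).transpose(1, 2)   # global trend
            fluctuation = x - trend                                # local fluctuation
            return trend, fluctuation

    x = torch.randn(8, 48, 7)                # batch of 8 sequences with 7 variables
    trend, fluct = WaveformExtraction(5)(x)
    print(trend.shape, fluct.shape)          # torch.Size([8, 48, 7]) twice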
As shown in fig. 5, a discrete Attention mechanism module (SA) is shown, which is used for feature extraction. The discrete Attention mechanism module firstly divides an input sequence into blocks (Block, B) with the same length, then extracts features through a shared Attention mechanism module (AT), then carries out dimension transformation through a Feed-Forward Network (FFN), shortens the length of each Block according to a proportion, and finally splices and outputs the blocks. The calculation formula of the discrete Attention mechanism (AT) is as follows:
[The calculation formulas of the discrete attention mechanism are given as images in the original publication.]
In these formulas, the input is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents a block obtained from the input sequence; the weight matrices are the learnable weight matrices of Q, K and V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively. Q, K and V denote the query matrix, the key matrix and the value matrix obtained after the blocks are subjected to a linear transformation. The attention mechanism is defined as:
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V
where d_{model} represents the feature dimension.
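The block-wise idea can be illustrated with the following PyTorch sketch; the equal block length, the shared projection layers and the feed-forward layer that halves each block are assumptions for illustration and do not reproduce the exact SA module:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlockAttentionSketch(nn.Module):
        """Attention computed independently inside equal-length blocks."""

        def __init__(self, d_model: int, block_len: int, shrink: int = 2):
            super().__init__()
            self.block_len = block_len
            self.d_model = d_model
            self.q = nn.Linear(d_model, d_model)     # shared projections for every block
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            # feed-forward layer applied along the time axis to shorten each block
            self.ffn = nn.Linear(block_len, block_len // shrink)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, length, d_model); length assumed divisible by block_len
            b, L, d = x.shape
            blocks = x.view(b, L // self.block_len, self.block_len, d)
            q, k, v = self.q(blocks), self.k(blocks), self.v(blocks)
            # scores are computed within each block only, never across blocks
            scores = q @ k.transpose(-2, -1) / self.d_model ** 0.5
            out = F.softmax(scores, dim=-1) @ v
            out = self.ffn(out.transpose(-2, -1)).transpose(-2, -1)  # shorten each block
            return out.reshape(b, -1, d)             # splice the shortened blocks

    x = torch.randn(8, 48, 64)
    y = BlockAttentionSketch(d_model=64, block_len=12)(x)
    print(y.shape)                                   # torch.Size([8, 24, 64])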
The discrete network overall function expression is as follows:
[The overall function expression of the discrete network is given as an image in the original publication.]
where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
Step 4: perform dimension alignment on the two groups of features obtained in step 3 in the hidden layer after the encoders, and splice the dimension-aligned features, finally obtaining two groups of global features and local features corresponding to the high-frequency and low-frequency components.
As shown in fig. 3, the global features and local features output by the True Encoder and the Pred Encoder are spliced respectively: the two features output by the True Encoder are first passed through a Feed-Forward Network (FFN) so that their dimension matches that of the Pred Encoder, and the corresponding features are then spliced to obtain the global features and the local features.
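A minimal sketch of this alignment-and-splicing step is shown below, assuming the True Encoder features are projected to the Pred Encoder's hidden size and the two feature groups are spliced along the time axis; the sizes are placeholders:

    import torch
    import torch.nn as nn

    d_true, d_pred = 32, 64                    # assumed hidden sizes of the two encoders
    align = nn.Linear(d_true, d_pred)          # FFN used for dimension alignment

    true_feat = torch.randn(8, 24, d_true)     # feature from the True Encoder
    pred_feat = torch.randn(8, 24, d_pred)     # feature from the Pred Encoder

    aligned = align(true_feat)                         # now (8, 24, 64)
    merged = torch.cat([aligned, pred_feat], dim=1)    # splice along the time axis
    print(merged.shape)                                # torch.Size([8, 48, 64])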
Step 5: input the two groups of features obtained in step 4 into the corresponding Decoders of the respective discrete feature extraction modules, and reconstruct the global features and local features of each layer through the discrete network (Separate Network) in the decoders to generate the prediction sequences corresponding to the high-frequency and low-frequency components.
Step 6: apply the inverse process of wavelet decomposition to the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a Waveform Reconstruction module, recombining the high- and low-frequency components to obtain the final generated prediction sequence.
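Assuming the same Haar wavelet used in the decomposition sketch above, the recombination can be illustrated with pywt.idwt, which merges the two predicted components into a sequence of twice their length:

    import numpy as np
    import pywt

    pred_low = np.random.randn(24)     # predicted approximation (low-frequency) component
    pred_high = np.random.randn(24)    # predicted detail (high-frequency) component

    # inverse DWT recombines the two components into the final prediction sequence
    pred_sequence = pywt.idwt(pred_low, pred_high, 'haar')
    print(pred_sequence.shape)         # (48,)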
Step 7: according to the generated prediction sequence obtained in step 6, calculate the error between the generated prediction sequence and the real sequence using the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and perform back propagation with the Adam optimizer to update the network parameters. The Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas are as follows:
MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|

where y is the predicted value, \hat{y} is the true value, and n represents the length of the sequence.
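One training iteration of the kind described in step 7 can be sketched as follows; the linear model stands in for the full network, the batch of 32 groups is synthetic, and back-propagating the MSE while merely tracking the MAE is an assumption, since the patent states both error formulas without fixing the loss:

    import torch
    import torch.nn as nn

    model = nn.Linear(96, 48)                   # placeholder for the full prediction model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    mse_loss, mae_loss = nn.MSELoss(), nn.L1Loss()

    history = torch.randn(32, 96)               # 32 randomly selected groups of training data
    target = torch.randn(32, 48)                # corresponding real (prediction) sequences

    prediction = model(history)                 # generated prediction sequence
    mse = mse_loss(prediction, target)
    mae = mae_loss(prediction, target)

    optimizer.zero_grad()
    mse.backward()                              # back propagation
    optimizer.step()                            # Adam updates the network parameters
    print(float(mse), float(mae))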
Step 8: using the model with the network parameters updated in step 7 and the verification data set obtained in step 1, select 32 groups of verification data as input and execute steps 2 to 7, with the training data in step 2 replaced by the selected 32 groups of verification data. Prediction sequences based on the verification data are finally generated.
Step 9: calculate the Mean Square Error (MSE) between the prediction sequences generated from the verification data in step 8 and the corresponding real sequences, compute the MSE of every group of data and take the average, finally obtaining the mean square error of the model on the verification data set.
Step 10: repeat step 2 to step 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model finishes training.
Step 11: input the sequence given by the prediction task into the trained model obtained in step 10, perform sequence prediction, and output the resulting prediction sequence to complete the prediction.
Fig. 6 shows the two methods of the present invention: the discrete waveform decomposition method (SWformer) and the Mini discrete waveform decomposition method (Mini-SWformer). The high-frequency components carry relatively little of the information in time series data, so appropriately reducing the high-frequency components can reduce the amount of computation of the model to some extent and thereby reduce the model scale. On this theoretical basis, the Mini discrete waveform decomposition method deletes the decomposed high-frequency component and its entire branch from the discrete waveform decomposition method, further reducing the scale of the model.
Fig. 7 shows the results of the two methods of the present invention and existing methods such as Informer, LogTrans, Reformer, LSTMa and LSTnet under the same experimental conditions on five data sets (ETTh1, ETTh2, ETTm1, Weather and ECL), measured by Mean Square Error (MSE) and Mean Absolute Error (MAE). The results of the best-performing model under each experimental condition are shown in bold in the table. From the table in fig. 7 it can be seen that the discrete waveform decomposition method (SWformer) and the Mini discrete waveform decomposition method (Mini-SWformer) improve greatly on the other methods. Compared with the Informer method, the MSE of the discrete feature extraction method is reduced by 22.53% on average, the MSE of the discrete waveform decomposition method is reduced by 19.29% on average, and the MSE of the Mini discrete waveform decomposition method is reduced by 16.54% on average.
Fig. 8 shows how the memory usage of the discrete waveform decomposition method (SWformer), the Mini discrete waveform decomposition method (Mini-SWformer) and Informer changes as the prediction sequence length increases under the same experimental conditions. It can be seen that the advantage of the discrete waveform decomposition method and the Mini discrete waveform decomposition method in memory usage grows as the prediction sequence becomes longer. Compared with Informer, the discrete waveform decomposition method reduces memory usage by 52.62% on average, and the Mini discrete waveform decomposition method by 68.02% on average.

Claims (8)

1. A lightweight time series prediction method based on discrete wavelet transform is characterized by comprising the following steps:
step 1: preprocessing data to obtain a training data set and a verification data set;
step 2: with the help of the training data set obtained in step 1, randomly selecting 32 groups of training data each time as equipment conditions allow, respectively inputting the historical sequence and the starting sequence of each group of data into two waveform decomposition modules, and decomposing the input sequence into a low-frequency component and a high-frequency component;
step 3: respectively inputting the low-frequency component and the high-frequency component obtained in step 2 into two discrete feature extraction modules for feature extraction; each discrete feature extraction module comprises two encoders and one decoder, and the corresponding input component is input into the discrete network in the encoders to extract global features and local features, finally obtaining two groups of global and local features corresponding to the two components;
step 4: respectively performing dimension alignment on the two groups of features obtained in step 3 in the hidden layer after the encoders, and splicing the dimension-aligned features, finally obtaining two groups of global features and local features corresponding to the high-frequency and low-frequency components;
step 5: respectively inputting the two groups of features obtained in step 4 into the corresponding decoders of the respective discrete feature extraction modules, and reconstructing the global features and local features of each layer through the discrete network in the decoders to generate prediction sequences corresponding to the high-frequency and low-frequency components;
step 6: performing the inverse process of wavelet decomposition on the two groups of prediction sequences corresponding to the high- and low-frequency components obtained in step 5 through a waveform reconstruction module, and recombining the high- and low-frequency components to obtain the final generated prediction sequence;
step 7: according to the generated prediction sequence obtained in step 6, calculating the error between the generated prediction sequence and the real sequence through the Mean Square Error (MSE) and Mean Absolute Error (MAE) formulas, and performing back propagation through an Adam optimizer to update the network parameters;
step 8: selecting 32 groups of verification data as input by means of the model with the network parameters updated in step 7 and the verification data set obtained in step 1, and executing steps 2 to 7, wherein the training data in step 2 is replaced by the selected 32 groups of verification data; finally generating prediction sequences based on the verification data;
step 9: calculating the Mean Square Error (MSE) between the prediction sequences generated from the verification data in step 8 and the corresponding real sequences, computing the MSE of all the groups of data and taking the average, finally obtaining the mean square error of the model on the verification data set;
step 10: repeating step 2 to step 9; if the Mean Square Error (MSE) obtained in step 9 no longer decreases, indicating that the model performance can no longer be improved, the network parameters are finalized and the model finishes training;
step 11: inputting the input sequence given by the prediction task into the trained model obtained in step 10, performing sequence prediction, and outputting the resulting prediction sequence to complete the prediction.
2. The discrete wavelet transform-based lightweight time series prediction method according to claim 1, wherein the specific method in step 1 is as follows:
selecting a suitable public time series data set, and grouping and segmenting it to fit the data format required by the model; firstly, setting the historical sequence length, the prediction sequence length and the starting sequence length of each group of data as required, wherein the three lengths respectively correspond to the three parts of each group of data: the historical sequence, the prediction sequence and the starting sequence; grouping by adopting a sliding-window mechanism, wherein the window length is the sum of the historical sequence length and the prediction sequence length and the window moves one step at a time, i.e., two adjacent groups of data differ by only one position; after the data grouping is completed, 70% of the groups are taken as the training data set and 30% as the verification data set.
3. The discrete wavelet transform-based lightweight time series prediction method of claim 2, wherein in length, the starting sequence length is less than or equal to the historical sequence length, and in value, the starting sequence is the same as the rear part of the historical sequence; the historical sequence and the prediction sequence are adjacent in position, and the length of each group of data is the sum of the historical sequence length and the prediction sequence length.
4. The discrete wavelet transform-based lightweight time series prediction method according to claim 1, wherein said waveform decomposition module is based on the principle of discrete wavelet transform, and the formula is as follows:
f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_u(0,k)\,u_{0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=0}^{J-1}\sum_{k} W_v(j,k)\,v_{j,k}(x)

W_u(0,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,u_{0,k}(x)

W_v(j,k) = \frac{1}{\sqrt{M}}\sum_{x=0}^{M-1} f(x)\,v_{j,k}(x)

subject to x = 0,1,2,\dots,M-1; j = 0,1,2,\dots,J-1; k = 0,1,2,\dots,2^{j}-1
where u(x) is the scale function and v(x) is the wavelet function; W_u(0,k) and W_v(j,k) are the approximation coefficients and the detail coefficients, respectively, representing the low-frequency component and the high-frequency component; M is the sequence length; j and k are used to control the scaling of the scaling function.
5. The discrete wavelet transform-based lightweight time series prediction method as claimed in claim 1, wherein the discrete network adopts a waveform extraction module and a discrete attention mechanism module to extract global features and local features layer by layer; the waveform extraction module decomposes the input sequence: the whole input sequence is traversed with a sliding-window mechanism to obtain the average value within each window, which yields the global trend of the input sequence, and the global trend is then subtracted from the input sequence to obtain the local fluctuation of the input sequence.
6. The discrete wavelet transform-based lightweight time series prediction method according to claim 5, wherein the overall formula of the waveform extraction module is as follows:
[The formulas of the waveform extraction module are given as images in the original publication.]
In these formulas, the global trend and the local fluctuation of the waveform are the two outputs of WE; they serve as the inputs from which the discrete attention mechanism module extracts the global features and the local features. The remaining symbol denotes the input sequence of the l-th layer WE, and the connection symbol concatenates the different blocks. The AvgPool function is a mean pooling function: it sets a sliding window, slides one unit at a time, averages all the elements within the window, and assigns the resulting value to the current unit. The input is partitioned into blocks before being fed into AvgPool, with B_i denoting the i-th block.
7. The discrete wavelet transform-based lightweight time series prediction method is characterized in that a discrete Attention mechanism module firstly divides an input sequence into blocks (Block, B) with the same length, then extracts features through a shared Attention mechanism module (Attention, AT), then performs dimension transformation through a Feed-Forward Network (FFN), shortens the length of each Block in proportion, and finally splices and outputs the blocks; the calculation formula of the discrete Attention mechanism (AT) is as follows:
[The calculation formulas of the discrete attention mechanism are given as images in the original publication.]
In these formulas, the input is the input sequence of the l-th layer discrete attention mechanism module (SA); B represents a block obtained from the input sequence; the weight matrices are the learnable weight matrices of Q, K and V on the i-th block of the l-th layer; Q_i^l, K_i^l, V_i^l and B_i^l represent the i-th blocks of Q, K, V and B of the l-th layer, respectively; Q, K and V denote the query matrix, the key matrix and the value matrix obtained after the blocks are subjected to a linear transformation; the attention mechanism is defined as:
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V
where d_{model} represents the feature dimension.
8. The discrete wavelet transform-based lightweight time series prediction method according to claim 7, wherein the overall function expression of the discrete network is as follows:
[The overall function expression of the discrete network is given as an image in the original publication.]
where Z^l represents the global features of the l-th layer of the discrete network and H^l represents the local features of the l-th layer of the discrete network; X_SN represents the input of the SN.
CN202111536500.3A 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform Pending CN114219027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536500.3A CN114219027A (en) 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536500.3A CN114219027A (en) 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform

Publications (1)

Publication Number Publication Date
CN114219027A true CN114219027A (en) 2022-03-22

Family

ID=80702457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536500.3A Pending CN114219027A (en) 2021-12-15 2021-12-15 Lightweight time series prediction method based on discrete wavelet transform

Country Status (1)

Country Link
CN (1) CN114219027A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19980074795A (en) * 1997-03-27 1998-11-05 윤종용 Image Coding Method Using Global-Based Color Correlation
CN110826803A (en) * 2019-11-06 2020-02-21 广东电力交易中心有限责任公司 Electricity price prediction method and device for electric power spot market
CN112862875A (en) * 2021-01-18 2021-05-28 中国科学院自动化研究所 Rain removing method, system and equipment for rain chart based on selective mechanism attention mechanism

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114345A (en) * 2022-04-02 2022-09-27 腾讯科技(深圳)有限公司 Feature representation extraction method, device, equipment, storage medium and program product
CN115114345B (en) * 2022-04-02 2024-04-09 腾讯科技(深圳)有限公司 Feature representation extraction method, device, equipment, storage medium and program product
CN115293244A (en) * 2022-07-15 2022-11-04 北京航空航天大学 Smart grid false data injection attack detection method based on signal processing and data reduction
CN115293244B (en) * 2022-07-15 2023-08-15 北京航空航天大学 Smart grid false data injection attack detection method based on signal processing and data reduction

Similar Documents

Publication Publication Date Title
CN109214575B (en) Ultrashort-term wind power prediction method based on small-wavelength short-term memory network
CN112863180B (en) Traffic speed prediction method, device, electronic equipment and computer readable medium
CN114239718B (en) High-precision long-term time sequence prediction method based on multi-element time sequence data analysis
CN112364975A (en) Terminal operation state prediction method and system based on graph neural network
CN114219027A (en) Lightweight time series prediction method based on discrete wavelet transform
CN107292446B (en) Hybrid wind speed prediction method based on component relevance wavelet decomposition
CN113747163B (en) Image coding and decoding method and compression method based on context recombination modeling
CN112767959B (en) Voice enhancement method, device, equipment and medium
CN116362398A (en) Power load prediction method based on modal decomposition and reconstruction and LSTM-MLR hybrid model
CN112434891A (en) Method for predicting solar irradiance time sequence based on WCNN-ALSTM
CN109583588B (en) Short-term wind speed prediction method and system
CN117709556B (en) Photovoltaic power generation short-term prediction method, system, medium and equipment
CN111141879B (en) Deep learning air quality monitoring method, device and equipment
Kozat et al. Universal switching linear least squares prediction
CN108491958B (en) Short-time bus passenger flow chord invariant prediction method
CN117852686A (en) Power load prediction method based on multi-element self-encoder
CN102299766B (en) Joint optimization method for dimensionality reduction and quantification of communication signal for object state estimation
CN115713155A (en) Traffic sequence prediction method based on multivariate time sequence data analysis
CN113949880A (en) Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method
Sun et al. KSVD-based multiple description image coding
CN110705044A (en) Simulation verification method for simulating wind power generation system based on deep learning
CN111476408A (en) Power communication equipment state prediction method and system
CN112446516A (en) Travel prediction method and device
Bai et al. A NN-GM (1, 1) model-based analysis of network traffic forecasting
CN118379876A (en) Vehicle track generation method and device based on diffusion model and light federal learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination