CN111859264B - Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition - Google Patents

Publication number
CN111859264B
Authority
CN
China
Prior art keywords
super
wavelet
decomposition
gru
wavelet decomposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010659067.1A
Other languages
Chinese (zh)
Other versions
CN111859264A (en
Inventor
金学波
张家辉
苏婷立
白玉廷
孔建磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN202010659067.1A
Publication of CN111859264A
Application granted
Publication of CN111859264B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/148 Wavelet transforms
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Abstract

The invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, comprising the following steps: optimizing the model hyperparameters with a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in wavelet decomposition, and the hyperparameters of a GRU sub-predictor; acquiring collected data, and performing wavelet decomposition on the data using the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result; building a GRU sub-predictor, and learning and predicting the decomposition result using the optimized hyperparameters of the GRU sub-predictor to obtain a training result; and obtaining a prediction result from the training result. The invention uses a Bayesian optimization algorithm to optimize the hyperparameters and achieves high accuracy in long-term time sequence prediction tasks.

Description

Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition
Technical Field
The present disclosure relates to the field of time sequence prediction, and in particular to a time sequence prediction method and apparatus based on Bayesian optimization and wavelet decomposition.
Background
With the continuous progress of industrialization and urbanization, information storage, sensor networks and computer technologies have developed rapidly, and technologies such as the Internet play an important role in people's lives. Much of the resulting information takes the form of time series generated continuously at equal time intervals, such as the temperature recorded at a weather monitoring station or the PM2.5 concentration of the atmosphere. Such series are not merely records of historical events; they also store a great deal of useful information. For example, the temperature data of a weather monitoring station contain the year-round pattern of temperature change at that station. Studying time series to extract the information hidden in the data therefore makes it possible to grasp their patterns of change and to predict future data in advance.
Modeling historical time series data to predict the data of a future period belongs to the field of time series prediction. Research in this field has an established foundation, and its methods can be roughly divided into two types. One is the traditional probabilistic method, which is heavily constrained by prior knowledge of the given data and imposes strict modeling conditions, so its results are often unsatisfactory. The other is the machine learning method, which needs only historical data to design a parameter-learning algorithm for the task at hand; building the model is relatively easy, and machine learning methods perform better on nonlinear prediction tasks.
Time series prediction based on machine learning started from shallow neural networks. Limited by network depth, a shallow neural network cannot model complex data accurately, so it can be applied only to short-term prediction and cannot perform accurate long-term prediction. To overcome these shortcomings, network structures have gradually deepened, and deep neural networks such as the recurrent neural network (RNN) and the gated recurrent unit (GRU) have become the main research direction of time series prediction. However, most time series data are collected from real environments, so the data are highly volatile, random and complex, and it is difficult to guarantee prediction accuracy by analyzing and learning the data with a deep neural network alone.
Disclosure of Invention
In order to solve one of the above technical problems, the invention provides a time sequence prediction method and device based on Bayesian optimization and wavelet decomposition.
A first aspect of the embodiments of the invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, comprising the following steps:
optimizing the model hyperparameters with a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in wavelet decomposition, and the hyperparameters of a GRU sub-predictor;
acquiring collected data, and performing wavelet decomposition on the data using the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result;
building a GRU sub-predictor, and learning and predicting the decomposition result using the optimized hyperparameters of the GRU sub-predictor to obtain a training result;
and obtaining a prediction result from the training result.
Preferably, the process of optimizing the model hyperparameters with the Bayesian optimization method to obtain the optimal hyperparameters includes:
defining an objective function for model hyperparameter optimization, wherein the objective function obeys a Gaussian distribution;
obtaining a Bayesian optimization objective from the objective function for model hyperparameter optimization;
applying a Gaussian process to the objective function for model hyperparameter optimization to obtain its posterior probability;
and updating the parameters of the Bayesian optimization objective with a UCB acquisition function, according to the mean and the variance of the posterior probability, to obtain the optimal hyperparameters.
Preferably, the process of acquiring the collected data and performing wavelet decomposition on it using the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result includes:
decomposing the collected data into a low-frequency component and a high-frequency component according to the optimized mother wavelet function and its corresponding father wavelet function, wherein the number of decomposition layers is the optimized number of wavelet decomposition layers;
processing the low-frequency component through a low-frequency filter to obtain a low-frequency subsequence;
processing the high-frequency component through a high-frequency filter to obtain a high-frequency subsequence.
Preferably, the process of building the GRU sub-predictor and learning and predicting the decomposition result using the optimized hyperparameters of the GRU sub-predictor to obtain the training result includes:
building a GRU sub-predictor based on the Keras/TensorFlow framework;
and learning and predicting, with the GRU sub-predictor, the low-frequency subsequence and the high-frequency subsequence obtained by wavelet decomposition separately, to obtain the training result of each subsequence.
Preferably, the process of obtaining the prediction result from the training result includes:
summing the training results of the subsequences to obtain the prediction result.
A second aspect of the embodiments of the invention provides a time sequence prediction apparatus based on Bayesian optimization and wavelet decomposition, the apparatus including a processor configured with processor-executable operating instructions to perform the following operations:
optimizing the model hyperparameters with a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in wavelet decomposition, and the hyperparameters of a GRU sub-predictor;
acquiring collected data, and performing wavelet decomposition on the data using the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result;
building a GRU sub-predictor, and learning and predicting the decomposition result using the optimized hyperparameters of the GRU sub-predictor to obtain a training result;
and obtaining a prediction result from the training result.
Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:
defining an objective function for model hyperparameter optimization, wherein the objective function obeys a Gaussian distribution;
obtaining a Bayesian optimization objective from the objective function for model hyperparameter optimization;
applying a Gaussian process to the objective function for model hyperparameter optimization to obtain its posterior probability;
and updating the parameters of the Bayesian optimization objective with a UCB acquisition function, according to the mean and the variance of the posterior probability, to obtain the optimal hyperparameters.
Preferably, the apparatus further comprises a low-frequency filter and a high-frequency filter, the processor being configured with processor-executable operating instructions to perform the following operations:
decomposing the collected data into a low-frequency component and a high-frequency component according to the optimized mother wavelet function and its corresponding father wavelet function, wherein the number of decomposition layers is the optimized number of wavelet decomposition layers;
the low-frequency filter processes the low-frequency component to obtain a low-frequency subsequence;
the high-frequency filter processes the high-frequency component to obtain a high-frequency subsequence.
Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:
building a GRU sub-predictor based on the Keras/TensorFlow framework;
and learning and predicting, with the GRU sub-predictor, the low-frequency subsequence and the high-frequency subsequence obtained by wavelet decomposition separately, to obtain the training result of each subsequence.
Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:
summing the training results of the subsequences to obtain the prediction result.
The beneficial effects of the invention are as follows: in view of the strong nonlinearity and strong randomness of time series, the invention provides a hybrid deep learning model that combines a time series decomposition method with a deep neural network. Wavelet decomposition reduces the complexity of a complex sequence, a GRU network then predicts each component obtained by the decomposition, and the component predictions are finally fused into the prediction result. The invention effectively improves prediction accuracy, uses a Bayesian optimization algorithm to optimize the hyperparameters, and achieves high accuracy in long-term time sequence prediction tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a method of timing prediction based on Bayesian optimization and wavelet decomposition;
FIG. 2 is a schematic diagram of a wavelet decomposition process;
FIG. 3 is a schematic diagram of the result of wavelet decomposition, wherein the left region is a low frequency subsequence and the right region is a high frequency subsequence;
FIG. 4 is a schematic diagram of a GRU sub-predictor;
FIG. 5 is a schematic diagram of the overall structural framework of a WD-GRU hybrid model;
FIG. 6 is a schematic diagram of the model prediction curves based on the Bayesian optimization and random search methods;
FIG. 7 is a schematic diagram showing the hourly PM2.5 prediction results of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN_GRU, WD-RNN and WD-LSTM for Beijing from March 22, 2016 to April 9, 2016;
FIG. 8 is a schematic diagram showing a comparison of RMSE and MAE details of the Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN_GRU, WD-RNN, WD-LSTM;
FIG. 9 is a schematic diagram showing a comparison of details of NRMSE, SMAPE and R of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN_GRU, WD-RNN, WD-LSTM.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is given with reference to the accompanying drawings, and it is apparent that the described embodiments are only some of the embodiments of the present application and not exhaustive of all the embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
Embodiment 1
As shown in Fig. 1, the present embodiment proposes a time sequence prediction method based on Bayesian optimization and wavelet decomposition, the method including:
S101: optimizing the model hyperparameters with a Bayesian optimization method to obtain the optimal hyperparameters.
Specifically, the choice of hyperparameters directly determines the performance of a deep learning model. In this embodiment, one Bayesian optimization method, sequential model-based optimization (SMBO), is implemented with the Python-based hyperopt library.
When model parameters are determined with Bayesian optimization, an objective function and the hyperparameter space to be optimized must be defined. Since the training process of deep learning is effectively a black box, the root mean square error (RMSE) of the hybrid model is used as the objective function for model hyperparameter optimization:
$$g(w) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i(w) - \hat{y}_i\right)^2}$$
where $m$ is the number of input samples, $y_i(w)$ is the predicted value, and $\hat{y}_i$ is the actual value.
The Bayesian optimization objective can be expressed as:
$$w^* = \arg\min_{w \in W} g(w)$$
where $w^*$ is the optimal parameter set determined by Bayesian optimization, $w$ is an input set of hyperparameters, and $W$ is the parameter space of the multidimensional hyperparameters.
Bayesian optimization is divided into a Gaussian process (GP) and a hyperparameter selection process. In the Gaussian process, the objective function $g(w)$ is assumed to obey the following Gaussian distribution:
$$g(w) \sim GP(\mu(w), O(w, w'))$$
where $\mu(w)$ is the mean of $g(w)$ and $O(w, w')$ is the covariance matrix of $g(w)$. The initial $O(w, w')$ can be expressed as:
$$O(w, w') = \begin{bmatrix} o(w_1, w_1) & \cdots & o(w_1, w_t) \\ \vdots & \ddots & \vdots \\ o(w_t, w_1) & \cdots & o(w_t, w_t) \end{bmatrix}$$
In Bayesian optimization, the covariance matrix of the Gaussian process changes with the iterations. Assuming that the set of parameters input at step $t+1$ is $w_{t+1}$, the covariance matrix at this time can be expressed as:
$$O' = \begin{bmatrix} O(w, w') & o^{T} \\ o & o(w_{t+1}, w_{t+1}) \end{bmatrix}$$
where $o = [o(w_{t+1}, w_1), o(w_{t+1}, w_2), \ldots, o(w_{t+1}, w_t)]$. The posterior probability of the objective function can then be obtained:
$$P(g_{t+1} \mid \theta_{t+1}, w_{t+1}) \sim N\left(\mu_{t+1}(w), \sigma_{t+1}^{2}(w)\right)$$
where $\theta$ is the observed data, $\mu_{t+1}(w)$ is the mean of $g(w)$ at step $t+1$, and $\sigma_{t+1}^{2}(w)$ is the variance of $g(w)$ at step $t+1$.
After the posterior probability is obtained, the optimal hyperparameters are found by a hyperparameter search based on the mean and the variance of the posterior probability. In this embodiment the search is completed with a UCB acquisition function:
$$w_{t+1} = \arg\max_{w \in W} S(w \mid \theta_t) = \arg\max_{w \in W}\left(\mu_t(w) + \zeta_{t+1}^{1/2}\,\sigma_t(w)\right)$$
where $\zeta_{t+1}$ is a constant, $S(w \mid \theta_t)$ is the UCB acquisition function, and $w_{t+1}$ is the hyperparameter set selected at step $t+1$. The hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in wavelet decomposition, and the hyperparameters of the GRU sub-predictor.
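The Gaussian-process posterior and the UCB selection step can be illustrated with a minimal, standard-library-only sketch. It is not the hyperopt-based implementation of this embodiment: the squared-exponential kernel, the jitter term, the one-dimensional candidate grid, the toy objective and all function names are assumptions made for illustration. Because the objective is an error to be minimized, the sketch models the negated objective so that taking the argmax of the UCB score remains meaningful.

```python
import math

def rbf(a, b, length=1.0):
    """Assumed squared-exponential covariance o(w, w') for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(ws, ys, w, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at candidate w."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(ws)]
         for i, a in enumerate(ws)]
    o = [rbf(w, a) for a in ws]          # covariances with observed points
    mean = sum(oi * ai for oi, ai in zip(o, solve(K, ys)))
    var = rbf(w, w) - sum(oi * bi for oi, bi in zip(o, solve(K, o)))
    return mean, max(var, 0.0)

def bayes_opt(objective, candidates, n_iter=10, zeta=4.0):
    """Pick candidates by UCB on the negated objective; return the best one."""
    ws = [candidates[0], candidates[-1]]           # two initial samples
    ys = [-objective(w) for w in ws]
    for _ in range(n_iter):
        remaining = [w for w in candidates if w not in ws]
        if not remaining:
            break

        def ucb(w):
            mean, var = gp_posterior(ws, ys, w)
            return mean + math.sqrt(zeta) * math.sqrt(var)

        w_next = max(remaining, key=ucb)
        ws.append(w_next)
        ys.append(-objective(w_next))
    best = max(range(len(ws)), key=lambda i: ys[i])
    return ws[best], -ys[best]

def toy_rmse(w):
    """Hypothetical stand-in for training the model and measuring its RMSE."""
    return (w - 3.0) ** 2 + 1.0

grid = [0.5 * i for i in range(13)]
best_w, best_rmse = bayes_opt(toy_rmse, grid, n_iter=20)
```

In a real run the objective would train the hybrid model and return its validation RMSE, and the search space would be multidimensional rather than a scalar grid.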
S102: acquiring collected data, and performing wavelet decomposition on the data using the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result.
Specifically, in this embodiment, after the collected data are acquired, wavelet decomposition is performed on them to reduce the complexity of the data.
When wavelet decomposition is performed on the collected data, a mother wavelet function is selected first; it is obtained directly from the optimization in S101:
$$\psi_{k,h}(t) = \frac{1}{\sqrt{|k|}}\,\psi\!\left(\frac{t-h}{k}\right)$$
Each mother wavelet function has a corresponding father wavelet function:
$$\varphi_{k,h}(t) = \frac{1}{\sqrt{|k|}}\,\varphi\!\left(\frac{t-h}{k}\right)$$
where $k$ is the scaling coefficient, $k \in R$, $k \neq 0$; $h$ is the translation coefficient, $h \in R$; and $t$ is the time index.
A complex sequence can be decomposed into a low-frequency subsequence and high-frequency subsequences by the wavelet basis formed by the mother wavelet function and the father wavelet function:
$$M(t) = \sum_{h=1}^{m} a_{n,h}\,\varphi_{n,h}(t) + \sum_{k=1}^{n}\sum_{h=1}^{m} d_{k,h}\,\psi_{k,h}(t)$$
where $M(t)$ is the sequence being decomposed, $a_{k,h}$ is the low-frequency component with scaling coefficient $k$ and translation coefficient $h$, $d_{k,h}$ is the high-frequency component with scaling coefficient $k$ and translation coefficient $h$, $m$ is the length of the original sequence, and $n$ is the number of wavelet decomposition layers, which is the number obtained by the optimization in S101. A low-pass filter (LPF) and a high-pass filter (HPF) are then used to process $a_{k,h}$ and $d_{k,h}$ to obtain the low-frequency subsequence $A_{k,h}$ and the high-frequency subsequence $D_{k,h}$. The wavelet decomposition process is illustrated in Fig. 2, and Fig. 3 shows the result of an 8-layer wavelet decomposition of a PM2.5 sequence with the "db35" mother wavelet function.
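To make the decomposition into one low-frequency subsequence and several high-frequency subsequences concrete, the following sketch performs a multi-level decomposition with the Haar wavelet using only the standard library. The Haar wavelet is swapped in purely for brevity; the embodiment selects the mother wavelet (for example "db35") by optimization, and the function names here are hypothetical. Each subsequence is reconstructed to the full length of the input so that the subsequences sum back to the original series.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One analysis level: low-pass (approximation) and high-pass (detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """One synthesis level, the exact inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def to_full_length(coeffs, is_detail, level):
    """Reconstruct a full-length subsequence from one level's coefficients."""
    zeros = [0.0] * len(coeffs)
    if is_detail:
        x = haar_inverse_step(zeros, coeffs)
    else:
        x = haar_inverse_step(coeffs, zeros)
    for _ in range(level - 1):
        x = haar_inverse_step(x, [0.0] * len(x))
    return x

def decompose(series, n_levels):
    """Return {'D1': ..., ..., 'Dn': ..., 'An': ...} as full-length lists."""
    if len(series) % (2 ** n_levels) != 0:
        raise ValueError("series length must be divisible by 2**n_levels")
    subsequences = {}
    approx = [float(v) for v in series]
    for level in range(1, n_levels + 1):
        approx, detail = haar_step(approx)
        subsequences[f"D{level}"] = to_full_length(detail, True, level)
    subsequences[f"A{n_levels}"] = to_full_length(approx, False, n_levels)
    return subsequences

series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
subs = decompose(series, n_levels=2)
rebuilt = [sum(values) for values in zip(*subs.values())]
```

With the Haar wavelet, the low-frequency subsequence A2 is simply the mean of each block of four samples, which shows why the approximation is smoother and easier to predict than the raw series.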
S103: building a GRU sub-predictor, and learning and predicting the decomposition result using the optimized hyperparameters of the GRU sub-predictor to obtain a training result.
Specifically, the GRU is a variant of the long short-term memory (LSTM) network. It is simpler than the LSTM network while remaining effective, and its structure has only an update gate and a reset gate. In this embodiment, each component obtained by wavelet decomposition is learned and predicted by a two-layer GRU sub-predictor; Fig. 4 shows the structure of the GRU sub-predictor in this embodiment. The hyperparameters of the GRU sub-predictor specifically include: the number of neurons in the first GRU layer, the Dropout rate, the number of training iterations, the batch size, and the optimizer.
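The role of the update gate and the reset gate can be seen in a single scalar GRU step, sketched below with the standard library only. This is an illustration of the standard GRU equations, not the two-layer Keras sub-predictor of the embodiment; the parameter names and values are hypothetical, and the sign convention for the update gate (here z weights the candidate state) differs between references and libraries.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU time step for scalar input x and scalar hidden state h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    # The reset gate controls how much of the old state feeds the candidate.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the old state with the candidate state.
    return (1.0 - z) * h_prev + z * h_cand

def gru_run(sequence, p, h0=0.0):
    """Feed a sequence through the cell and return the final hidden state."""
    h = h0
    for x in sequence:
        h = gru_step(x, h, p)
    return h

# Hypothetical parameter values chosen only to exercise the code.
params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.3, "ur": 0.2, "br": 0.0,
          "wh": 0.8, "uh": 0.4, "bh": 0.0}
final_state = gru_run([0.2, 0.4, 0.1, 0.3], params)
```

In practice the gates operate on vectors with weight matrices, and a dense output layer maps the final hidden state to the 24-step forecast.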
GRU algorithm pseudocode:
(1) Normalizing data set θ
(2) Model learning training data
Learn H based on θ
return H
The model training part can be realized based on the processes of S101 to S103 described above. First, the hyperparameter optimization process is performed: the optimal hyperparameters of the training model are determined with the Bayesian optimization method, and model training then uses this set of hyperparameters. Second, during model training, the original sequence is decomposed by wavelet decomposition to obtain the corresponding low-frequency and high-frequency components, and a GRU sub-predictor then learns the pattern of each component. The pseudocode of the Bayesian optimization algorithm can thus be outlined as follows:
Input: θ is the dataset, g(w) is the RMSE of the model, W is the hyperparameter space (w ∈ W), S(w|θ_i) is the UCB acquisition function, T is the number of hyperparameter selections, and l is the number of wavelet decomposition subsequences.
Output: optimal hyperparameters w*
(1) Initialize: θ^(l) ← InitSamples(g(w), θ, l)
(2) for i = 1 to T do
(3) Model the objective function g(w) and calculate the posterior probability
(4) Update the parameters using the UCB acquisition function: w* ← argmax S(w|θ_i), w ∈ W
(5) Train the model provided by the invention with the hyperparameters w* to obtain the prediction y_i ← g(w*); calculate and update
(6)
(7)endfor
(8)
(9)return w *
S104, obtaining a prediction result according to the training result.
Specifically, in S103, after learning and predicting each portion obtained by wavelet decomposition by using a GRU sub-predictor, training results of each sub-sequence are obtained, and then the training results of each sub-sequence are summed to obtain a predicted result.
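The fusion step is an element-wise sum over the per-subsequence forecasts. A minimal sketch follows; the function name and the sample numbers are hypothetical.

```python
def fuse_predictions(sub_predictions):
    """Sum the forecasts of all subsequences step by step.

    sub_predictions maps a subsequence name (e.g. 'A2', 'D1') to the list of
    values its sub-predictor forecast for the same future time steps.
    """
    forecasts = list(sub_predictions.values())
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all sub-predictions must cover the same horizon")
    return [sum(step_values) for step_values in zip(*forecasts)]

# Hypothetical two-step forecasts from three sub-predictors.
fused = fuse_predictions({
    "A2": [8.0, 8.0],
    "D1": [-1.0, 1.0],
    "D2": [0.5, -0.5],
})
```

Summation is valid here because the wavelet subsequences add up to the original series, so the sum of their forecasts is a forecast of the original series.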
The method proposed in this embodiment is further described below by way of two specific examples.
Example 1
In the method proposed in this embodiment, a Bayesian optimization method is used to determine the optimal hyperparameters. This example demonstrates the use of the Bayesian optimization method and verifies its results.
First, the PM2.5 data set used in this experiment records the hourly average PM2.5 concentration in Beijing between 2013 and 2017, totaling 37,704 records, in μg/m³. The model prediction horizon is set to 24 steps; that is, the model uses the previous 24 hours of historical data to predict the values of the next 24 hours. Fig. 5 shows the overall structure of the model of this embodiment. The test set consists of the hourly PM2.5 concentration data for Beijing from March 22, 2016 to April 9, 2016.
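The 24-step-in, 24-step-out setup corresponds to slicing the hourly series into overlapping input and target windows. The sketch below shows one way to build such samples; the function name is hypothetical and a stride of one hour between samples is an assumption.

```python
def make_windows(series, n_in=24, n_out=24):
    """Split a series into (input, target) pairs of consecutive windows."""
    inputs, targets = [], []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets

# Toy hourly series standing in for the PM2.5 record.
hourly = list(range(60))
X, Y = make_windows(hourly)
```

Each sub-predictor would receive windows built this way from its own subsequence rather than from the raw series.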
Next, the hyperparameter optimization process is performed. According to step S101, a hyperparameter space and an objective function must be defined. Table 1 shows the hyperparameter space used in this example, which defines the ranges of the number of wavelet decomposition layers, the mother wavelet function, the number of neurons in the first layer, the Dropout rate, the batch size, the number of training iterations, and the optimizer. The RMSE of the hybrid model is selected as the objective function. In this example, Bayesian optimization provides a set of optimal hyperparameters after 100 selections. Table 2 gives the hyperparameters determined by Bayesian optimization; for comparison, a set of hyperparameters was also obtained with the conventional random search method.
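The random search baseline used for comparison can be sketched as follows. The contents of Table 1 are not reproduced in this text, so every range in the search space below is a hypothetical placeholder, and the toy objective merely stands in for training the hybrid model and returning its RMSE.

```python
import random

# Hypothetical search space; the actual ranges are those of Table 1.
SPACE = {
    "wavelet_levels": list(range(1, 11)),
    "mother_wavelet": ["db4", "db35", "sym8"],
    "units": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "batch_size": [32, 64, 128],
}

def sample(space, rng):
    """Draw one configuration uniformly at random from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(objective, space, n_trials=100, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample(space, rng)
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_objective(cfg):
    """Stand-in for 'train the hybrid model and return its RMSE'."""
    return abs(cfg["wavelet_levels"] - 8) + cfg["dropout"]

best_cfg, best_score = random_search(toy_objective, SPACE)
```

Unlike Bayesian optimization, each trial here ignores the results of earlier trials, which is the main difference the comparison in this example is meant to expose.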
The model training process is then performed, and the final results are shown in Table 3, where root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE), and the Pearson correlation coefficient (R) are used as the evaluation criteria for model performance. Compared with the traditional random search method, the model trained with the Bayesian optimization method performs better: its RMSE reaches 21.7300 μg/m³ and its R index is 0.9276. Fig. 6 shows the model prediction curves based on the Bayesian optimization and random search methods: curve A is the prediction curve based on the random search method, curve B is the prediction curve based on the Bayesian optimization method, and curve C is the true value. It can be seen that the prediction curve based on the Bayesian optimization method is closer to the true value.
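The five evaluation criteria can be computed as in the sketch below. The text does not give their exact formulas, so two choices are assumptions: NRMSE is normalized by the range of the actual values, and SMAPE uses the variant with the sum of absolute values in the denominator and is returned as a fraction rather than a percentage.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the actual values (assumed convention)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))

def smape(y_true, y_pred):
    """Symmetric MAPE as a fraction in [0, 2] (assumed convention)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        total += 0.0 if denom == 0 else 2.0 * abs(p - t) / denom
    return total / len(y_true)

def pearson_r(y_true, y_pred):
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

# Toy values: every prediction is exactly 1 too high.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "NRMSE": nrmse(actual, predicted),
    "SMAPE": smape(actual, predicted),
    "R": pearson_r(actual, predicted),
}
```

The toy case shows why several criteria are reported together: a constant offset leaves R at its maximum while RMSE and MAE still register the error.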
In this example, to further verify the feasibility of Bayesian optimization, an experiment examines how the number of wavelet decomposition layers affects the performance of the overall model. Ten models are constructed, all using the hyperparameters determined by Bayesian optimization in Table 2 and differing only in the number of wavelet decomposition layers. The experimental settings are as follows:
(1) Model 1: performing 1-layer wavelet decomposition and training 2 GRUs for A1 and D1, respectively;
(2) Model 2: performing 2-layer wavelet decomposition and training 3 GRUs for A2, D1 and D2, respectively;
(3) Model 3: performing 3-layer wavelet decomposition and training 4 GRUs for A3, D1-D3, respectively;
(4) Model 4: performing 4-layer wavelet decomposition and training 5 GRUs for A4, D1-D4, respectively;
(5) Model 5: performing 5-layer wavelet decomposition and training 6 GRUs for A5, D1-D5, respectively;
(6) Model 6: performing 6-layer wavelet decomposition and training 7 GRUs for A6, D1-D6, respectively;
(7) Model 7: performing 7-layer wavelet decomposition and training 8 GRUs for A7, D1-D7, respectively;
(8) Model 8: performing 8-layer wavelet decomposition and training 9 GRUs for A8, D1-D8 respectively;
(9) Model 9: performing 9-layer wavelet decomposition and training 10 GRUs for A9, D1-D9 respectively;
(10) Model 10: 10 layers of wavelet decomposition were performed and 11 GRUs were trained for A10, D1-D10, respectively.
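The pattern in the list above is that an n-layer decomposition yields one low-frequency subsequence An and n high-frequency subsequences D1 to Dn, so n + 1 GRU sub-predictors are trained. A small sketch with a hypothetical helper name:

```python
def component_names(n_levels):
    """Subsequences produced by an n-level decomposition: An, D1..Dn."""
    return [f"A{n_levels}"] + [f"D{k}" for k in range(1, n_levels + 1)]

# Number of GRU sub-predictors needed by each of the ten models.
gru_counts = {n: len(component_names(n)) for n in range(1, 11)}
```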
Table 4 gives the five evaluation indexes of each model; model 8 is the model that uses the Bayesian-optimized parameters in full. According to the results, the five indexes generally improve as the number of wavelet decomposition layers increases and stabilize once the number of layers reaches 6, with the RMSE falling from 48.5712 μg/m³ to 22.0185 μg/m³. Model 8 achieves the best MAE and NRMSE, and its remaining three indexes are very close to the best values among the 10 models, differing by at most 0.0132. Taken together, model 8 performs best, which verifies the effect of the Bayesian optimization method.
The experimental results show that hyperparameter optimization with the Bayesian optimization algorithm is feasible for the hybrid deep learning model, and that, compared with a traditional hyperparameter optimization method such as random search, training the model with the Bayesian optimization method brings its performance to a better level.
TABLE 1 Bayesian optimization of hyper-parameter space
TABLE 2 Bayesian optimization and random search for determining optimal superparameters
TABLE 3 model Performance based on Bayesian optimization and random search methods
TABLE 4 predictive performance analysis of different wavelet decomposition levels
TABLE 5 prediction index of six models under the same test set
Example 2
A Bayesian optimization algorithm was implemented and its feasibility verified in Example 1. This example demonstrates the accuracy advantage of the model proposed in this embodiment (WD-GRU) by comparison with other models.
First, the data used in the experiment are described: the data set and test set are the same as those in Example 1, and the prediction horizon is set to 24 steps.
In this example, the proposed model is compared with five combined models that also pair time series decomposition with a deep network. The models compared are Decomposition-ARIMA-GRU-GRU, EMD-RNN (a combination of EMD and RNN), EMDCNN_GRU (a combination of EMD, CNN and GRU), WD-RNN (a combination of wavelet decomposition and RNN), WD-LSTM (a combination of wavelet decomposition and LSTM), and the WD-GRU model (a combination of wavelet decomposition and GRU) proposed in this example.
Fig. 7 shows the prediction results of these six models. Curves 1 to 6 represent the Decomposition-ARIMA-GRU-GRU model, the EMD-RNN model, the EMDCNN_GRU model, the WD-RNN model, the WD-LSTM model, and the WD-GRU model of the present embodiment, respectively, and curve 7 represents the true curve. The trends of the six prediction curves are substantially the same as that of the true curve, and the WD-GRU model is closest to it. Table 5 gives the five evaluation indexes of the six models, with the optimal value of each index marked in red. All evaluation indexes of the WD-GRU model provided by the invention are the optimal values; its RMSE reaches 21.7300 μg/m³, which is already a very low level. Figs. 8 and 9 show further details of the five indexes. Compared with the EMDCNN_GRU model, the better-performing of the comparison models, the WD-GRU model improves the five indexes RMSE, MAE, NRMSE, SMAPE and R by 38.3%, 31.5%, 51.4%, 9.8% and 17.9%, respectively, a large improvement in accuracy.
The advantages of the model of this embodiment are then analyzed further by comparison. First, wavelet decomposition works well on complex sequences such as PM2.5. In the hybrid models, the PM2.5 sequence is decomposed before prediction, typically by wavelet decomposition, empirical mode decomposition (EMD), or seasonal-trend decomposition using Loess (STL), to reduce its complexity, and prediction is then performed by an RNN or GRU network. Among the three hybrid models WD-RNN, EMD-RNN and Decomposition-ARIMA-GRU-GRU in Table 5, the WD-RNN model leads on all indices. Although WD-RNN and EMD-RNN use the same RNN network as the sub-predictor, the wavelet-decomposition-based WD-RNN model improves RMSE by 36.0%, MAE by 36.1%, NRMSE by 34.6%, SMAPE by 23.9% and R by 25.0%. The Decomposition-ARIMA-GRU-GRU model uses the better-performing GRU network, yet its overall prediction performance is poorer than that of the higher-scoring EMD-RNN model.
Second, choosing the GRU model as the predictor of the wavelet decomposition components yields better model performance. The WD-RNN, WD-LSTM and WD-GRU models in Table 5 differ only in the choice of sub-predictor, yet the WD-GRU model, with a GRU network as sub-predictor, has a clear advantage on every index. Compared with the WD-LSTM model, the proposed model reduces the root mean square error by 4.7035 μg/m³, and the Pearson correlation coefficient rises from 0.8932 to 0.9276. Likewise, between the EMD-RNN and EMDCNN_GRU models, the model using the GRU network performs better.
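For reference, a single time step of a gated recurrent unit can be sketched as follows. This is the textbook GRU cell written in plain Python, not the patent's Keras model; the weight names (`Wz`, `Uz`, `bz` and so on), the blending convention in the last line of `gru_step`, and the toy dimensions are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def gru_step(x, h, p):
    """One GRU step. x: input vector, h: previous hidden state, p: weight dict."""
    # Update gate z and reset gate r, both in (0, 1).
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    # Candidate state computed from the input and the reset-gated previous state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # Assumed convention: new state = (1 - z) * old state + z * candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def zero_params(n_in, n_hidden):
    """Hypothetical all-zero weights, used only to make the example checkable."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p

# With zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
h_next = gru_step([1.0], [1.0, -1.0], zero_params(1, 2))
```

Compared with an LSTM cell, the GRU has two gates instead of three and no separate cell state, so it has fewer parameters per hidden unit.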
Finally, the leading performance of the model of this embodiment is due not only to wavelet decomposition and the GRU; the Bayesian optimization algorithm that determines the model's hyper-parameters, and thereby maximizes its performance, is also an important part. According to the data in Table 5, the WD-LSTM model, which does not use the Bayesian optimization algorithm to determine its hyper-parameters, has an NRMSE of 0.0935, whereas the WD-GRU model has an NRMSE of 0.0682. A gain of this size would be very difficult to obtain from the GRU network alone, and the WD-GRU model also improves markedly on the other indices.
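The size of the gap between WD-LSTM and WD-GRU can be worked out from the figures quoted in the text. The sketch below assumes that improvement is measured as the reduction relative to the baseline value; this section does not state the formula behind its percentages, so that definition and the helper name `relative_reduction` are assumptions.

```python
def relative_reduction(baseline, proposed):
    """Fractional reduction of an error metric relative to the baseline."""
    return (baseline - proposed) / baseline

# Values quoted in the text.
nrmse_wd_lstm, nrmse_wd_gru = 0.0935, 0.0682
rmse_wd_gru, rmse_drop = 21.7300, 4.7035

# NRMSE: WD-LSTM versus WD-GRU.
nrmse_gain = relative_reduction(nrmse_wd_lstm, nrmse_wd_gru)

# RMSE: the WD-LSTM value is implied by the WD-GRU value plus the stated drop.
rmse_wd_lstm = rmse_wd_gru + rmse_drop
rmse_gain = relative_reduction(rmse_wd_lstm, rmse_wd_gru)

print(f"NRMSE reduction: {nrmse_gain:.1%}")
print(f"Implied WD-LSTM RMSE: {rmse_wd_lstm:.4f} ug/m3, reduction: {rmse_gain:.1%}")
```

Under this definition the NRMSE falls by roughly 27% and the RMSE by roughly 18% when moving from WD-LSTM to WD-GRU.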
Analysis of results: first, the model training process is sound, the hyper-parameters determined by Bayesian optimization allow the model to perform to its potential, and wavelet decomposition is effective for analyzing complex sequences. Second, on the task of long-term prediction of atmospheric PM2.5 concentration over a 24-hour horizon, the model achieves better prediction accuracy than common hybrid predictors, which verifies the practical value of the WD-GRU model for general time series prediction.
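The claims of this patent describe the hyper-parameter search as a Gaussian process posterior combined with a UCB acquisition function. The following is a minimal, dependency-free sketch of that idea on a one-dimensional toy problem. The RBF kernel, the exploration weight `kappa`, the finite candidate grid and the `toy_score` objective are all assumptions; in the patent the search space covers the number of decomposition levels, the mother wavelet and the GRU hyper-parameters, and the objective would be a validation score of the trained model.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance at x_new of a GP fitted to (xs, ys)."""
    y_mean = sum(ys) / len(ys)  # centre the targets around their mean
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - y_mean for y in ys])
    v = solve(K, k_star)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, var

def bayes_opt(objective, candidates, n_iter=6, kappa=2.0):
    """Maximize objective over a finite candidate list using GP-UCB."""
    xs = [candidates[0], candidates[-1]]  # two initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break

        def ucb(c):
            mean, var = gp_posterior(xs, ys, c)
            return mean + kappa * math.sqrt(var)

        nxt = max(remaining, key=ucb)  # most promising untried candidate
        xs.append(nxt)
        ys.append(objective(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def toy_score(x):
    """Hypothetical score to maximize, standing in for a validation metric."""
    return -(x - 3.0) ** 2

grid = [0.5 * i for i in range(13)]  # candidate values 0.0, 0.5, ..., 6.0
best_x, best_y = bayes_opt(toy_score, grid)
```

The UCB term trades off the posterior mean (exploitation) against the posterior standard deviation (exploration), so only a few of the thirteen candidates need to be evaluated here.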
Embodiment 2
Corresponding to Embodiment 1, this embodiment provides a time series prediction apparatus based on Bayesian optimization and wavelet decomposition. The apparatus includes a processor configured with processor-executable operation instructions to perform the following operations:
optimizing the model hyper-parameters by a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise the number of wavelet decomposition levels, the mother wavelet function used in wavelet decomposition, and the hyper-parameters of a GRU sub-predictor;
obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition levels and the optimized mother wavelet function to obtain a decomposition result;
building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain a training result;
and obtaining a prediction result from the training result.
Specifically, for the working principle and calculation steps of the apparatus of this embodiment, reference may be made to Embodiment 1; they are not repeated here. In this embodiment, wavelet decomposition reduces the complexity of the complex sequence, the GRU network then predicts each decomposition result, and the final prediction is obtained by fusing these predictions. This effectively improves prediction accuracy; with the hyper-parameters optimized by the Bayesian optimization algorithm, the method achieves high accuracy in long-term time series prediction tasks.
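The decompose, predict and fuse flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the Haar wavelet is fixed here, whereas the patent selects the mother wavelet and the number of levels by Bayesian optimization; `persistence_forecast` is a hypothetical stand-in for a trained GRU sub-predictor; and the input series is made up.

```python
def haar_subsequences(series, levels):
    """Split a series into [low, high_L, ..., high_1], each as long as the input.

    Uses the Haar multiresolution: the level-j approximation is the mean over
    blocks of 2**j samples, and each detail is the difference between two
    successive approximations, so the subsequences sum back to the input.
    """
    n = len(series)
    if n % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx = list(series)
    details = []
    for level in range(1, levels + 1):
        block = 2 ** level
        smoother = []
        for start in range(0, n, block):
            mean = sum(series[start:start + block]) / block
            smoother.extend([mean] * block)
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return [approx] + details[::-1]

def persistence_forecast(subsequence, horizon):
    """Stand-in for a GRU sub-predictor: repeat the last observed value."""
    return [subsequence[-1]] * horizon

def wd_forecast(series, levels, horizon, sub_predictor):
    """Decompose, forecast every subsequence, then sum the forecasts."""
    parts = haar_subsequences(series, levels)
    forecasts = [sub_predictor(part, horizon) for part in parts]
    return [sum(step) for step in zip(*forecasts)]

# Hypothetical input series; the 24-step horizon follows the text.
series = [float((i * 7) % 11) for i in range(16)]
parts = haar_subsequences(series, levels=2)
forecast = wd_forecast(series, levels=2, horizon=24, sub_predictor=persistence_forecast)
```

Because the subsequences add up to the original series, summing the per-subsequence forecasts is a natural fusion step, which matches the summation described in claims 4 and 8.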
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (8)

1. A time series prediction method based on Bayesian optimization and wavelet decomposition, the method comprising:
optimizing the model hyper-parameters by a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise the number of wavelet decomposition levels, the mother wavelet function used in wavelet decomposition, and the hyper-parameters of a GRU sub-predictor;
obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition levels and the optimized mother wavelet function to obtain a decomposition result;
building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain a training result;
obtaining a prediction result from the training result;
wherein the process of obtaining collected data and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition levels and the optimized mother wavelet function to obtain a decomposition result comprises:
decomposing the collected data into a low-frequency component and a high-frequency component according to the optimized mother wavelet function and the father wavelet function corresponding to the mother wavelet function, wherein the number of decomposition levels is determined by the optimized number of wavelet decomposition levels;
processing the low-frequency component through a low-frequency filter to obtain a low-frequency subsequence;
processing the high-frequency component through a high-frequency filter to obtain a high-frequency subsequence.
2. The method according to claim 1, wherein the process of optimizing the model hyper-parameters by the Bayesian optimization method to obtain the optimal hyper-parameters comprises:
defining an objective function for model hyper-parameter optimization, wherein the objective function obeys a Gaussian distribution;
obtaining a Bayesian optimization objective function from the objective function for model hyper-parameter optimization;
applying a Gaussian process to the objective function for model hyper-parameter optimization to obtain its posterior probability;
and updating the parameters of the Bayesian optimization objective function with a UCB acquisition function, according to the mean and variance of the posterior probability, to obtain the optimal hyper-parameters.
3. The method according to claim 1, wherein the process of building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain a training result comprises:
building the GRU sub-predictor based on the Keras TensorFlow framework;
and learning and predicting, by the GRU sub-predictor, the low-frequency subsequence and the high-frequency subsequence obtained after wavelet decomposition, respectively, to obtain a training result for each subsequence.
4. The method according to claim 3, wherein the process of obtaining a prediction result from the training result comprises:
summing the training results of the subsequences to obtain the prediction result.
5. A time series prediction apparatus based on Bayesian optimization and wavelet decomposition, the apparatus comprising a processor configured with processor-executable operation instructions to perform operations comprising:
optimizing the model hyper-parameters by a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise the number of wavelet decomposition levels, the mother wavelet function used in wavelet decomposition, and the hyper-parameters of a GRU sub-predictor;
obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition levels and the optimized mother wavelet function to obtain a decomposition result;
building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain a training result;
obtaining a prediction result from the training result;
the apparatus further comprises a low-frequency filter and a high-frequency filter, and the processor is configured with processor-executable operation instructions to perform operations comprising:
decomposing the collected data into a low-frequency component and a high-frequency component according to the optimized mother wavelet function and the father wavelet function corresponding to the mother wavelet function, wherein the number of decomposition levels is determined by the optimized number of wavelet decomposition levels;
the low-frequency filter processes the low-frequency component to obtain a low-frequency subsequence;
the high-frequency filter processes the high-frequency component to obtain a high-frequency subsequence.
6. The apparatus of claim 5, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:
defining an objective function for model hyper-parameter optimization, wherein the objective function obeys a Gaussian distribution;
obtaining a Bayesian optimization objective function from the objective function for model hyper-parameter optimization;
applying a Gaussian process to the objective function for model hyper-parameter optimization to obtain its posterior probability;
and updating the parameters of the Bayesian optimization objective function with a UCB acquisition function, according to the mean and variance of the posterior probability, to obtain the optimal hyper-parameters.
7. The apparatus of claim 5, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:
building the GRU sub-predictor based on the Keras TensorFlow framework;
and learning and predicting, by the GRU sub-predictor, the low-frequency subsequence and the high-frequency subsequence obtained after wavelet decomposition, respectively, to obtain a training result for each subsequence.
8. The apparatus of claim 7, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:
summing the training results of the subsequences to obtain the prediction result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010659067.1A CN111859264B (en) 2020-07-09 2020-07-09 Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition

Publications (2)

Publication Number Publication Date
CN111859264A (en) 2020-10-30
CN111859264B (en) 2024-02-02

Family

ID=73152695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010659067.1A Active CN111859264B (en) 2020-07-09 2020-07-09 Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition

Country Status (1)

Country Link
CN (1) CN111859264B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651543A (en) * 2020-11-10 2021-04-13 沈阳工程学院 Daily electric quantity prediction method based on VMD decomposition and LSTM network
CN112288193A (en) * 2020-11-23 2021-01-29 国家海洋信息中心 Ocean station surface salinity prediction method based on GRU deep learning of attention mechanism
CN112434888A (en) * 2020-12-17 2021-03-02 中国计量大学上虞高等研究院有限公司 PM2.5 prediction method of bidirectional long and short term memory network based on deep learning
CN112749845A (en) * 2021-01-13 2021-05-04 中国工商银行股份有限公司 Model training method, resource data prediction method, device and computing equipment
CN113128132A (en) * 2021-05-18 2021-07-16 河南工业大学 Grain pile humidity and condensation prediction method based on depth time sequence
CN113360848A (en) * 2021-06-04 2021-09-07 北京工商大学 Time sequence data prediction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413494A (en) * 2019-06-19 2019-11-05 浙江工业大学 A kind of LightGBM method for diagnosing faults improving Bayes's optimization
WO2019229528A2 (en) * 2018-05-30 2019-12-05 Alexander Meyer Using machine learning to predict health conditions
CN111192453A (en) * 2019-12-30 2020-05-22 深圳市麦谷科技有限公司 Short-term traffic flow prediction method and system based on Bayesian optimization

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Cheng Yong, et al. Hybrid algorithm for short-term forecasting for PM2.5 in China. Atmospheric Environment. 2019, vol. 200, 264-279. *
Fei Wang, et al. Wavelet Decomposition and Convolutional LSTM Networks Based Improved Deep Learning Model for Solar Irradiance Forecasting. Applied Sciences. 2018, vol. 8 (no. 8), full text. *
Xin Fu, et al. Short-Term Traffic Speed Prediction Method for Urban Road Sections Based on Wavelet Transform and Gated Recurrent Unit. Mathematical Problems in Engineering. 2020, full text. *
Xue-Bo Jin, et al. Deep Hybrid Model Based on EMD with Classification by Frequency Characteristics for Long-Term Air Quality Prediction. Mathematics. 2020, vol. 8 (no. 2), full text. *
司阳; 肖秦琨. Sequence prediction based on long short-term memory and dynamic Bayesian networks. 计算机技术与发展. 2018, (no. 09), full text. *
梁志坚; 唐云. A probabilistic prediction method for short-term wind farm output power. 中国电力企业管理. 2016, (no. 21), full text. *
王欣冉; 邢永丽; 巨程晖. Application of wavelet packet and Bayesian LS-SVM in oil price forecasting. 统计与决策. 2011, (no. 06), full text. *
石婧文; 罗树添; 叶可江; 须成忠. Traffic prediction and uncertainty interval estimation for e-commerce clusters. 集成技术. 2019, (no. 03), full text. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant