CN111859264B

CN111859264B - Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition

Info

Publication number: CN111859264B
Application number: CN202010659067.1A
Authority: CN
Inventors: 金学波; 张家辉; 苏婷立; 白玉廷; 孔建磊
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2024-02-02
Anticipated expiration: 2040-07-09
Also published as: CN111859264A

Abstract

The invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, comprising the following steps: optimizing the model hyperparameters with a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of a GRU sub-predictor; acquiring collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result; building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain a training result; and obtaining a prediction result according to the training result. The invention uses a Bayesian optimization algorithm to optimize the hyperparameters and achieves high accuracy in long-term time sequence prediction tasks.

Description

Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition

Technical Field

The present disclosure relates to the field of time sequence prediction, and in particular to a time sequence prediction method and apparatus based on Bayesian optimization and wavelet decomposition.

Background

With the continuous progress of industrialization and urbanization, information storage, sensor networks and computer technologies have developed rapidly, and technologies such as the internet play an important role in people's lives. Much of the resulting information takes the form of time series, generated sequentially at a fixed time interval, such as the temperature recorded at a weather monitoring station or the PM2.5 concentration of the atmosphere. Such data is not merely a record of historical events; it also stores a great deal of useful information. For example, the temperature data of a weather monitoring station contains the station's year-round pattern of temperature change. Studying time series to extract the information hidden in the data therefore makes it possible to grasp the rules of change and to predict future data in advance.

Modeling historical time series data to predict data for a future period belongs to the field of time series prediction. Research in this field has an established foundation, and its methods can be roughly divided into two types. The first is the traditional probabilistic approach, which is heavily constrained by prior knowledge of the given data and imposes strict modeling conditions, so its results are often unsatisfactory. The second is the machine learning approach, which needs only historical data: a learning algorithm is designed for the task at hand to fit the model parameters, so modeling is relatively easy, and machine learning methods perform better on nonlinear prediction tasks.

Machine-learning-based time series prediction started from shallow neural networks. Limited by network depth, a shallow neural network cannot model complex data accurately, so it is suitable only for short-term prediction and cannot perform accurate long-term prediction. To overcome this shortcoming, network structures were gradually deepened, and deep neural networks such as the recurrent neural network (RNN) and the gated recurrent unit (GRU) have become the main research direction in time series prediction. However, most time series data are collected from real environments and are therefore strongly volatile, random and complex, so prediction accuracy is difficult to guarantee when the data are analyzed and learned by a deep neural network alone.

Disclosure of Invention

In order to solve one of the technical problems, the invention provides a time sequence prediction method and device based on Bayesian optimization and wavelet decomposition.

The first aspect of the embodiment of the invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, which comprises the following steps:

optimizing the model hyperparameters according to a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of a GRU sub-predictor;

acquiring collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result;

building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain a training result;

and obtaining a prediction result according to the training result.

Preferably, the process of optimizing the model hyperparameters according to the Bayesian optimization method to obtain the optimal hyperparameters includes:

defining an objective function for model hyperparameter optimization, wherein the objective function follows a Gaussian distribution;

obtaining the Bayesian optimization objective function from the objective function for model hyperparameter optimization;

modeling the objective function for model hyperparameter optimization as a Gaussian process to obtain the posterior probability of the objective function;

and updating the parameters of the Bayesian optimization objective function with a UCB acquisition function, according to the mean and variance of the posterior probability, to obtain the optimal hyperparameters.

Preferably, the process of acquiring the collected data and performing wavelet decomposition on it according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result includes:

decomposing the collected data into a low-frequency component and a high-frequency component according to the optimized mother wavelet function and its corresponding father wavelet function, wherein the number of decomposition layers is the optimized number of wavelet decomposition layers;

processing the low-frequency component with a low-frequency filter to obtain a low-frequency subsequence;

processing the high-frequency component with a high-frequency filter to obtain a high-frequency subsequence.

Preferably, the process of building a GRU sub-predictor and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain the training result includes:

building a GRU sub-predictor based on the Keras/TensorFlow framework;

and learning and predicting, with the GRU sub-predictor, the low-frequency subsequence and the high-frequency subsequences obtained by wavelet decomposition separately, to obtain a training result for each subsequence.

Preferably, the process of obtaining the prediction result according to the training result includes:

summing the training results of the subsequences to obtain the prediction result.

A second aspect of an embodiment of the present invention provides a timing prediction apparatus based on bayesian optimization and wavelet decomposition, the apparatus including a processor configured with operation instructions executable by the processor to perform operations of:

and obtaining a prediction result according to the training result.

Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:

Preferably, the apparatus further comprises a low frequency filter and a high frequency filter, the processor being configured with processor-executable operating instructions to perform operations of:

the low-frequency filter processes the low-frequency component to obtain a low-frequency subsequence;

the high-frequency filter processes the high-frequency component to obtain a high-frequency subsequence.

building a GRU sub-predictor based on a Keras Tensorflow framework;

The beneficial effects of the invention are as follows: in view of the strong nonlinearity and strong randomness of time series, the invention provides a hybrid deep learning model that combines a time series decomposition method with a deep neural network. Wavelet decomposition reduces the complexity of a complex sequence, GRU networks then predict each component obtained by the decomposition, and the final prediction is obtained by fusing the component predictions. The invention effectively improves prediction accuracy, uses a Bayesian optimization algorithm to optimize the hyperparameters, and achieves high accuracy in long-term time sequence prediction tasks.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flow chart of a method of timing prediction based on Bayesian optimization and wavelet decomposition;

FIG. 2 is a schematic diagram of a wavelet decomposition process;

FIG. 3 is a schematic diagram of the result of wavelet decomposition, wherein the left region is a low frequency subsequence and the right region is a high frequency subsequence;

FIG. 4 is a schematic diagram of a GRU sub-predictor;

FIG. 5 is a schematic diagram of the overall structural framework of a WD-GRU hybrid model;

FIG. 6 is a schematic diagram of the model prediction curves based on the Bayesian optimization and random search methods;

FIG. 7 is a schematic diagram showing the hourly PM2.5 prediction results of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN_GRU, WD-RNN and WD-LSTM for Beijing from March 22, 2016 to April 9, 2016;

FIG. 8 is a schematic diagram showing a comparison of RMSE and MAE details of the Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN_GRU, WD-RNN, WD-LSTM;

FIG. 9 is a schematic diagram showing a comparison of details of NRMSE, SMAPE and R of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN_GRU, WD-RNN, WD-LSTM.

Detailed Description

In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is given with reference to the accompanying drawings, and it is apparent that the described embodiments are only some of the embodiments of the present application and not exhaustive of all the embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.

Embodiment 1

As shown in fig. 1, the present embodiment proposes a time sequence prediction method based on bayesian optimization and wavelet decomposition, where the method includes:

S101: optimizing the model hyperparameters according to a Bayesian optimization method to obtain the optimal hyperparameters.

Specifically, the choice of hyperparameters directly determines the performance of a deep learning model. In this embodiment, one kind of Bayesian optimization method, sequential model-based optimization (SMBO), is implemented with the Python-based hyperopt library.

When Bayesian optimization is used to determine the model parameters, an objective function and the hyperparameter space to be searched must be defined. Since the training process of deep learning is effectively a black box, the root mean square error (RMSE) of the hybrid model is used as the objective function for model hyperparameter optimization:

$$g(w) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i(w) - \hat{y}_i\right)^2}$$

where $m$ is the number of input samples, $y_i(w)$ is the predicted value, and $\hat{y}_i$ is the actual value.

The objective of the Bayesian optimization can be expressed as:

$$w^{*} = \arg\min_{w \in W} g(w)$$

where $w^{*}$ is the optimal set of parameters determined by Bayesian optimization, $w$ is an input set of hyperparameters, and $W$ is the multidimensional hyperparameter space.
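As a minimal illustration of the objective defined above, the sketch below computes the RMSE over m samples and picks the candidate with the lowest value. The `rmse` and `best_hyperparameter` helpers and the toy forecasts are hypothetical, and the exhaustive comparison merely stands in for the Bayesian search.

```python
import math

def rmse(predicted, actual):
    """Root mean square error over m paired samples."""
    m = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / m)

def best_hyperparameter(candidates, evaluate):
    """Return the candidate w that minimizes evaluate(w), i.e. argmin g(w)."""
    return min(candidates, key=evaluate)

# Toy data: actual values and the forecasts produced under two hypothetical settings.
actual = [10.0, 12.0, 14.0]
forecasts = {"w_a": [11.0, 12.0, 13.0], "w_b": [10.0, 12.0, 17.0]}

g = lambda w: rmse(forecasts[w], actual)   # objective g(w)
w_star = best_hyperparameter(forecasts, g)
```

In the real setting each evaluation of g(w) means training the whole hybrid model, which is why an exhaustive comparison like this one is replaced by a guided search.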

Bayesian optimization consists of a Gaussian process (GP) and a hyperparameter selection process. In the Gaussian process, the objective function $g(w)$ is assumed to follow the Gaussian distribution:

$$g(w) \sim GP\left(\mu(w),\, O(w, w')\right)$$

where $\mu(w)$ is the mean of $g(w)$ and $O(w, w')$ is the covariance matrix of $g(w)$. The initial $O(w, w')$ can be expressed as:

$$O = \begin{bmatrix} o(w_1, w_1) & \cdots & o(w_1, w_t) \\ \vdots & \ddots & \vdots \\ o(w_t, w_1) & \cdots & o(w_t, w_t) \end{bmatrix}$$

In Bayesian optimization, the covariance matrix of the Gaussian process changes as the iteration proceeds. If the set of parameters input at step $t+1$ is $w_{t+1}$, the covariance matrix at this step can be expressed as:

$$O' = \begin{bmatrix} O & o^{T} \\ o & o(w_{t+1}, w_{t+1}) \end{bmatrix}$$

where $o = [o(w_{t+1}, w_1), o(w_{t+1}, w_2), \ldots, o(w_{t+1}, w_t)]$. The posterior probability of the objective function can then be obtained:

$$P\left(g_{t+1} \mid \theta_{1:t}, w_{t+1}\right) \sim N\left(\mu_{t+1}(w),\, \sigma_{t+1}^{2}(w)\right)$$

where $\theta$ is the observed data, $\mu_{t+1}(w)$ is the mean of $g(w)$ at step $t+1$, and $\sigma_{t+1}^{2}(w)$ is the variance of $g(w)$ at step $t+1$.

After the posterior probability is obtained, the optimal hyperparameters are found by a hyperparameter search based on the mean and variance of the posterior. In this embodiment the search is performed with the UCB acquisition function:

$$w_{t+1} = \arg\max_{w \in W} S\left(w \mid \theta_t\right) = \arg\max_{w \in W} \left(\mu_t(w) + \zeta_{t+1}^{1/2}\,\sigma_t(w)\right)$$

where $\zeta_{t+1}$ is a constant, $S(w \mid \theta_t)$ is the UCB acquisition function, and $w_{t+1}$ is the hyperparameter set selected at step $t+1$. The hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of the GRU sub-predictor.
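To make the acquisition step concrete, the sketch below scores each candidate by its posterior mean plus a scaled posterior standard deviation and selects the highest score. It assumes the posterior mean and variance are already available and that the mean describes a quantity to be maximized (for example the negated RMSE); the function name, the value of ζ and the numbers are illustrative only.

```python
import math

def ucb_select(candidates, mean, variance, zeta=4.0):
    """Pick the candidate maximizing mean + sqrt(zeta) * std (upper confidence bound)."""
    scores = [mu + math.sqrt(zeta) * math.sqrt(var)
              for mu, var in zip(mean, variance)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores

# Hypothetical posterior over three decomposition depths (mean of negated RMSE).
levels = [4, 6, 8]
mean = [-30.0, -24.0, -25.0]
variance = [1.0, 0.25, 4.0]

chosen, scores = ucb_select(levels, mean, variance)
```

In this toy case depth 8 is chosen although depth 6 has the higher mean, because its larger variance earns a bigger exploration bonus.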

S102: acquiring collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result.

Specifically, in this embodiment, after the data are collected, wavelet decomposition is applied to them to reduce the complexity of the data.

To perform wavelet decomposition on the collected data, a mother wavelet function is first selected; it is obtained directly from the optimization in S101:

$$\psi_{k,h}(t) = \frac{1}{\sqrt{|k|}}\,\psi\!\left(\frac{t-h}{k}\right)$$

Each mother wavelet function has a corresponding father wavelet function:

$$\phi_{k,h}(t) = \frac{1}{\sqrt{|k|}}\,\phi\!\left(\frac{t-h}{k}\right)$$

where $k$ is the scaling coefficient, $k \in R$, $k \neq 0$; $h$ is the translation coefficient, $h \in R$; and $t$ is the time index.

A complex sequence can be decomposed into a low-frequency subsequence and high-frequency subsequences by a wavelet basis consisting of a mother wavelet function and a father wavelet function:

where $M(t)$ is the sequence being decomposed, $a_{k,h}$ is the low-frequency component with scaling coefficient $k$ and translation coefficient $h$, $d_{k,h}$ is the high-frequency component with scaling coefficient $k$ and translation coefficient $h$, $m$ is the length of the original sequence, and $n$ is the number of wavelet decomposition layers, which is the optimized number of layers obtained in S101. A low-pass filter (LPF) and a high-pass filter (HPF) are then applied to $a_{k,h}$ and $d_{k,h}$ to obtain the low-frequency subsequence $A_{k,h}$ and the high-frequency subsequence $D_{k,h}$. The wavelet decomposition process is illustrated in FIG. 2, and the result of an 8-layer wavelet decomposition of a PM2.5 sequence with the "db35" mother wavelet function is illustrated in FIG. 3.
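The decomposition into one low-frequency and n high-frequency subsequences can be sketched in a few lines. This example swaps in the Haar wavelet for brevity (it is not the "db35" wavelet of FIG. 3) and assumes the sequence length is divisible by 2^n; all function names are hypothetical. Each subsequence is rebuilt to the original length, so the subsequences add up to the input.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One analysis step: low-pass (approximation) and high-pass (detail) halves."""
    a = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return a, d

def haar_merge(a, d):
    """One synthesis step: the inverse of haar_split."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / SQRT2, (ai - di) / SQRT2])
    return x

def decompose(x, n):
    """Return subsequences A_n, D_1, ..., D_n, each as long as x."""
    approx, details = list(x), []
    for _ in range(n):
        approx, d = haar_split(approx)
        details.append(d)                      # details[j] holds level j + 1

    def rebuild(keep):
        # Invert the transform with every coefficient band zeroed except `keep`.
        cur = approx if keep == "A" else [0.0] * len(approx)
        for level in range(n, 0, -1):
            band = details[level - 1]
            cur = haar_merge(cur, band if keep == level else [0.0] * len(band))
        return cur

    parts = {f"A{n}": rebuild("A")}
    for level in range(1, n + 1):
        parts[f"D{level}"] = rebuild(level)
    return parts

series = [3.0, 5.0, 4.0, 8.0, 6.0, 2.0, 7.0, 9.0]
parts = decompose(series, n=2)
total = [sum(vals) for vals in zip(*parts.values())]   # matches series up to rounding
```

Because the subsequences are additive, forecasts made separately for each of them can later be summed to forecast the original series.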

S103: building a GRU sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain a training result.

Specifically, the GRU is a variant of the long short-term memory network (LSTM). It is simpler and more effective than the LSTM network, and its structure contains only an update gate and a reset gate. In this embodiment, each component obtained by wavelet decomposition is learned and predicted by a two-layer GRU sub-predictor; FIG. 4 shows the structure of the GRU sub-predictor in this embodiment. The hyperparameters of the GRU sub-predictor specifically comprise: the number of neurons in the first GRU layer, the Dropout rate, the number of training epochs, the batch size, and the optimizer.
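The two gates mentioned above can be seen in a single GRU time step. The sketch below uses a scalar input and a scalar hidden state with hand-picked weights (all hypothetical), and follows a common formulation in which the update gate z blends the previous state with a candidate state; some references swap the roles of z and 1 - z.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """One GRU step for a scalar input x and a scalar hidden state h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    # Candidate state: the reset gate controls how much past state is used.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * h_cand                  # blend old and new

params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.4, "ur": 0.2, "br": 0.0,
          "wh": 0.9, "uh": 0.3, "bh": 0.0}

h = 0.0
for x in [0.2, 0.4, 0.1]:      # run the cell over a short input sequence
    h = gru_step(x, h, params)
```

A real sub-predictor stacks two such layers with vector-valued states and learns the weights from the normalized subsequence.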

GRU algorithm pseudocode:

(1) Normalizing data set θ

(2) Model learning training data

Learn H based on θ

return H
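Step (1) of the pseudocode does not say which normalization is used. The sketch below assumes min-max scaling to [0, 1] and keeps the scaling constants so that predictions can be mapped back to μg/m³; the helper names are hypothetical.

```python
def fit_minmax(values):
    """Return the (lo, hi) scaling constants taken from the training data."""
    return min(values), max(values)

def normalize(values, lo, hi):
    """Scale values into [0, 1] using constants from fit_minmax."""
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(values, lo, hi):
    """Map scaled values back to the original units."""
    return [v * (hi - lo) + lo for v in values]

train = [35.0, 80.0, 120.0, 60.0]          # toy PM2.5 readings
lo, hi = fit_minmax(train)
scaled = normalize(train, lo, hi)
restored = denormalize(scaled, lo, hi)
```

Fitting the constants on the training data only, and reusing them for the test data, avoids leaking test information into training.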

The model training part can be realized with the processes of S101 to S103 above. First, hyperparameter optimization is performed: the optimal hyperparameters of the model are determined by the Bayesian optimization method, and the model is then trained with this set of hyperparameters. Second, during model training, the original sequence is decomposed by wavelet decomposition into its low-frequency and high-frequency components, and a GRU sub-predictor then learns the pattern of each component. The pseudocode of the Bayesian optimization algorithm can thus be sketched as follows:

Input: $\theta$ is the dataset, $g(w)$ is the RMSE of the model, $W$ is the hyperparameter space ($w \in W$), $S(w \mid \theta_t)$ is the UCB acquisition function, $T$ is the number of hyperparameter selections, and $l$ is the number of subsequences produced by wavelet decomposition.

Output: the optimal hyperparameters $w^{*}$.

(1) Initializing, θ ^(l) ←InitSamples(g(w),θ,l)

(2)

(3) Model the objective function $g(w)$ and compute the posterior probability

(4) Update the parameters using the UCB acquisition function

(5) Train the model provided by the invention with the hyperparameters $w^{*}$ to obtain the prediction $y_i \leftarrow g(w^{*})$; calculate and update

(6)

(7)endfor

(8)

(9)return w ^*
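The loop in the pseudocode can be illustrated with a compact stand-in: a Gaussian process surrogate with an RBF kernel (an assumption, since the source does not name the kernel), UCB selection, and a toy objective replacing the expensive train-and-evaluate step. Because the RMSE is minimized, the sketch applies UCB to the negated objective. It uses NumPy rather than hyperopt, and every name and constant in it is hypothetical.

```python
import numpy as np

def rbf(a, b, length=2.0):
    """Squared-exponential covariance between two sets of 1-D points."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return np.exp(-0.5 * ((a - b) / length) ** 2)

def gp_posterior(w_obs, y_obs, w_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the points w_new."""
    K = rbf(w_obs, w_obs) + noise * np.eye(len(w_obs))
    k_star = rbf(w_new, w_obs)
    mean = k_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def bayes_opt(objective, candidates, init, T, zeta=4.0):
    """Run T rounds of GP modeling plus UCB selection over a discrete space."""
    w_obs = list(init)
    y_obs = [-objective(w) for w in w_obs]           # negate: UCB maximizes
    for _ in range(T):
        y = np.array(y_obs)
        scale = y.std() if y.std() > 0 else 1.0
        mean, var = gp_posterior(w_obs, (y - y.mean()) / scale, candidates)
        score = mean + np.sqrt(zeta) * np.sqrt(var)  # UCB acquisition
        w_next = candidates[int(np.argmax(score))]
        w_obs.append(w_next)
        y_obs.append(-objective(w_next))             # evaluate the chosen point
    best = int(np.argmax(y_obs))
    return w_obs[best], -y_obs[best]

# Toy objective standing in for "train the hybrid model and return its RMSE".
toy_rmse = lambda n_levels: 20.0 + (n_levels - 8) ** 2
w_star, g_star = bayes_opt(toy_rmse, candidates=list(range(1, 11)), init=[1, 10], T=6)
```

The real search space is multidimensional (decomposition depth, mother wavelet, GRU settings); a single integer dimension is used here only to keep the surrogate easy to follow.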

S104: obtaining a prediction result according to the training result.

Specifically, after each component obtained by wavelet decomposition has been learned and predicted by a GRU sub-predictor in S103, a training result is available for each subsequence; the training results of all subsequences are then summed to obtain the prediction result.
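The fusion step is an element-wise sum of the per-subsequence forecasts. A small sketch with made-up 3-step forecasts (the dictionary keys and values are hypothetical):

```python
def fuse(sub_predictions):
    """Sum the forecasts of all sub-predictors step by step."""
    return [sum(step) for step in zip(*sub_predictions.values())]

sub_predictions = {
    "A2": [50.0, 52.0, 51.0],   # low-frequency trend forecast
    "D1": [1.5, -2.0, 0.5],     # high-frequency detail forecasts
    "D2": [-3.0, 1.0, 2.5],
}
prediction = fuse(sub_predictions)
```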

The method proposed in this embodiment is further described below by way of two specific examples.

Example 1

In the method proposed in this embodiment, a Bayesian optimization method is used to determine the optimal hyperparameters. This example demonstrates the use of the Bayesian optimization method and verifies the results.

First, the PM2.5 data set used in this experiment records the hourly average PM2.5 concentration in Beijing between 2013 and 2017, 37,704 records in total, in μg/m³. The model prediction horizon is set to 24 steps; that is, the model uses the 24 hours of historical data from the previous day to predict the values for the next 24 hours. FIG. 5 shows the overall structure of the model of this embodiment. The test set consists of the hourly PM2.5 data for Beijing from March 22, 2016 to April 9, 2016.
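The 24-in, 24-out setup described above corresponds to slicing the hourly series into paired windows. A minimal sketch follows; the function name and the stride of one hour are assumptions, since the source does not state how training samples are cut.

```python
def make_windows(series, n_in=24, n_out=24, stride=1):
    """Pair each n_in-hour history window with the n_out hours that follow it."""
    samples = []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, stride):
        history = series[start:start + n_in]
        target = series[start + n_in:start + n_in + n_out]
        samples.append((history, target))
    return samples

hourly = list(range(72))               # three days of placeholder readings
samples = make_windows(hourly)
first_history, first_target = samples[0]
```

In the hybrid model the same windowing would be applied to every subsequence, so that each GRU sub-predictor sees its own history and target pairs.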

Next, the hyperparameter optimization process is performed. According to step S101 above, a hyperparameter space and an objective function must be defined. Table 1 shows the hyperparameter space used in this example, which defines the ranges of the number of wavelet decomposition layers, the mother wavelet function, the number of neurons in the first layer, the Dropout rate, the batch size, the number of training epochs and the optimizer; the RMSE of the hybrid model is selected as the objective function. In this example, Bayesian optimization provides a set of optimal hyperparameters after 100 selections. Table 2 gives the hyperparameters determined by Bayesian optimization; for comparison, a set of hyperparameters was also obtained with the conventional random search method.

The model training process is then performed, and the final results are shown in Table 3, where the root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE) and Pearson correlation coefficient (R) are used as evaluation criteria for model performance. Compared with the conventional random search method, the model trained with the Bayesian optimization method performs better: its RMSE reaches 21.7300 μg/m³ and its R reaches 0.9276. FIG. 6 shows the model prediction curves based on Bayesian optimization and random search: curve A is the prediction based on random search, curve B is the prediction based on Bayesian optimization, and curve C is the true value. The prediction curve based on Bayesian optimization is visibly closer to the true value.
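For reference, the five evaluation criteria can be computed as below. The source does not give their formulas, so the definitions here are common ones and should be read as assumptions: NRMSE divides the RMSE by the range of the actual values, and SMAPE is the mean of 2|p - a| / (|p| + |a|) over predictions p and actual values a.

```python
import math

def evaluate(pred, actual):
    """Return RMSE, NRMSE, MAE, SMAPE and Pearson R for two equal-length lists."""
    n = len(actual)
    errors = [p - a for p, a in zip(pred, actual)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    nrmse = rmse / (max(actual) - min(actual))            # range-normalized (assumed)
    mae = sum(abs(e) for e in errors) / n
    smape = sum(2 * abs(p - a) / (abs(p) + abs(a))
                for p, a in zip(pred, actual)) / n
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    r = cov / math.sqrt(sum((p - mp) ** 2 for p in pred) *
                        sum((a - ma) ** 2 for a in actual))
    return {"RMSE": rmse, "NRMSE": nrmse, "MAE": mae, "SMAPE": smape, "R": r}

actual = [20.0, 40.0, 60.0, 80.0]      # toy values, not taken from Table 3
pred = [22.0, 38.0, 63.0, 77.0]
metrics = evaluate(pred, actual)
```

Lower is better for the four error measures, while R is better the closer it is to 1.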

In this example, to further verify the feasibility of Bayesian optimization, an experiment was set up to examine how the number of wavelet decomposition layers affects the performance of the overall model. Ten models were constructed, all using the hyperparameters determined by Bayesian optimization in Table 2 and differing only in the number of wavelet decomposition layers. The specific experimental settings are as follows:

(1) Model 1: performing 1-layer wavelet decomposition and training 2 GRUs for A1 and D1, respectively;

(2) Model 2: performing 2-layer wavelet decomposition and training 3 GRUs for A2, D1 and D2, respectively;

(3) Model 3: performing 3-layer wavelet decomposition and training 4 GRUs for A3, D1-D3, respectively;

(4) Model 4: performing 4-layer wavelet decomposition and training 5 GRUs for A4, D1-D4, respectively;

(5) Model 5: performing 5-layer wavelet decomposition and training 6 GRUs for A5, D1-D5, respectively;

(6) Model 6: performing 6-layer wavelet decomposition and training 7 GRUs for A6, D1-D6, respectively;

(7) Model 7: performing 7-layer wavelet decomposition and training 8 GRUs for A7, D1-D7, respectively;

(8) Model 8: performing 8-layer wavelet decomposition and training 9 GRUs for A8, D1-D8 respectively;

(9) Model 9: performing 9-layer wavelet decomposition and training 10 GRUs for A9, D1-D9 respectively;

(10) Model 10: performing 10-layer wavelet decomposition and training 11 GRUs for A10, D1-D10, respectively.
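The ten configurations follow one rule: an n-layer decomposition yields one low-frequency subsequence A_n and n high-frequency subsequences D_1 to D_n, hence n + 1 GRUs. A short sketch that enumerates them (the helper name is hypothetical):

```python
def subsequence_names(n_levels):
    """Subsequences produced by an n-level decomposition: A_n plus D_1..D_n."""
    return [f"A{n_levels}"] + [f"D{k}" for k in range(1, n_levels + 1)]

# One GRU sub-predictor is trained per subsequence.
experiment = {f"Model {n}": subsequence_names(n) for n in range(1, 11)}
gru_counts = {name: len(subs) for name, subs in experiment.items()}
```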

Table 4 gives the 5 evaluation indexes of each model; model 8 is the model that uses the Bayesian-optimized parameters in full. According to the results, the five indexes generally improve as the number of wavelet decomposition layers increases, and the changes in the indexes level off once the number of decomposition layers reaches 6; the RMSE falls from 48.5712 μg/m³ to 22.0185 μg/m³. Model 8 achieves the best MAE and NRMSE, and its remaining 3 indexes are very close to the best among the 10 models, differing by at most 0.0132. Taken as a whole, model 8 therefore performs best, which verifies the effect of the Bayesian optimization method.

The experimental results show that hyperparameter optimization with the Bayesian optimization algorithm is feasible in the hybrid deep learning model, and that, compared with conventional hyperparameter optimization methods such as random search, training the model with the Bayesian optimization method brings the model to a better level of performance.

TABLE 1 Bayesian optimization of hyper-parameter space

TABLE 2 Bayesian optimization and random search for determining optimal superparameters

TABLE 3 model Performance based on Bayesian optimization and random search methods

TABLE 4 predictive performance analysis of different wavelet decomposition levels

TABLE 5 prediction index of six models under the same test set

Example 2

A Bayesian optimization algorithm was implemented and its feasibility verified in Example 1. In this example, the accuracy advantage of the model proposed in this embodiment (WD-GRU) is demonstrated by comparison with other models.

First, the data used in the experiment are described: the data set and test set are the same as those in Example 1, and the prediction horizon is set to 24 steps.

In this example, the proposed model is compared with five combined models that also pair time series decomposition with deep networks. The models compared are Decomposition-ARIMA-GRU-GRU, EMD-RNN (a combination of EMD and RNN), EMDCNN_GRU (a combination of EMD, CNN and GRU), WD-RNN (a combination of wavelet decomposition and RNN), WD-LSTM (a combination of wavelet decomposition and LSTM), and the WD-GRU model (a combination of wavelet decomposition and GRU) proposed in this embodiment.

FIG. 7 shows the prediction results of these six models. Curves 1 to 6 represent the Decomposition-ARIMA-GRU-GRU model, the EMD-RNN model, the EMDCNN_GRU model, the WD-RNN model, the WD-LSTM model and the WD-GRU model of this embodiment, respectively, and curve 7 represents the true curve. The trends of all six curves are substantially the same as that of the true curve, and the WD-GRU model is closest to it. Five evaluation indexes for the six models are given in Table 5, with the red value in the table being the best value for each index. All evaluation indexes of the WD-GRU model provided by the invention are the best values, and its RMSE reaches 21.7300 μg/m³, which is already a very low level. FIGS. 8 and 9 show further details of the 5 indexes. Compared with the EMDCNN_GRU model, which has relatively good prediction performance, the WD-GRU model improves the five indexes RMSE, MAE, NRMSE, SMAPE and R by 38.3%, 31.5%, 51.4%, 9.8% and 17.9%, respectively, a substantial improvement in accuracy.

The advantages of the model of this example are then analyzed further by comparison. First, the wavelet decomposition method works well for studying the complex PM2.5 sequence. In a hybrid model, the PM2.5 sequence is decomposed before prediction, typically by wavelet decomposition, empirical mode decomposition (EMD) or seasonal-trend decomposition using LOESS, to reduce its complexity, and prediction is then performed with an RNN or GRU network. Among the three hybrid models WD-RNN, EMD-RNN and Decomposition-ARIMA-GRU-GRU in Table 5, the WD-RNN model leads on all indexes. Although WD-RNN and EMD-RNN use the same RNN network as the sub-predictor, the wavelet-decomposition-based WD-RNN model improves RMSE by 36.0%, MAE by 36.1%, NRMSE by 34.6%, SMAPE by 23.9% and R by 25.0%. The Decomposition-ARIMA-GRU-GRU model uses the better-performing GRU network, yet its overall prediction performance is poorer than that of the higher-scoring EMD-RNN model.

Second, choosing the GRU model as the predictor for the wavelet decomposition components allows better model performance. The WD-RNN, WD-LSTM and WD-GRU models in Table 5 differ structurally only in the choice of sub-predictor, yet the WD-GRU model, with the GRU network as sub-predictor, has a clear advantage on every index. Compared with the WD-LSTM model, the proposed model reduces the root mean square error by 4.7035 μg/m³ and raises the Pearson correlation coefficient from 0.8932 to 0.9276. Likewise, between the EMD-RNN and EMDCNN_GRU models, the model that uses the GRU network performs better.

Finally, the leading performance of the model of this embodiment is owed not only to wavelet decomposition and the GRU; the Bayesian optimization algorithm that determines the model's hyperparameters, and thereby lets the model reach its full performance, is also an important part. According to the data in Table 5, the WD-LSTM model, whose hyperparameters are not determined by the Bayesian optimization algorithm, has an NRMSE of 0.0935, whereas the NRMSE of the WD-GRU model is 0.0682. An improvement of this size is very difficult to achieve with the GRU network alone, and the WD-GRU model also improves substantially on the other indexes.
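The size of these gaps can be checked from the figures quoted in the text. The sketch below computes relative reductions; the WD-LSTM RMSE is not quoted directly and is inferred here as 21.7300 + 4.7035 μg/m³, which is an assumption based on the stated difference.

```python
def relative_reduction(baseline, improved):
    """Fractional decrease of an error metric relative to the baseline."""
    return (baseline - improved) / baseline

# NRMSE values quoted for WD-LSTM and WD-GRU.
nrmse_gain = relative_reduction(0.0935, 0.0682)

# RMSE: WD-GRU is quoted; WD-LSTM is inferred from the stated 4.7035 difference.
wd_gru_rmse = 21.7300
wd_lstm_rmse = wd_gru_rmse + 4.7035
rmse_gain = relative_reduction(wd_lstm_rmse, wd_gru_rmse)

print(f"NRMSE reduction: {nrmse_gain:.1%}, RMSE reduction: {rmse_gain:.1%}")
```

Under these assumptions the NRMSE falls by roughly 27% and the RMSE by roughly 18% relative to WD-LSTM.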

Analysis of results: first, the model training process is sound; the hyperparameters determined by Bayesian optimization bring out much of the model's performance, and wavelet decomposition is effective for analyzing complex sequences. Second, on the long-term prediction task of atmospheric PM2.5 concentration with a 24-hour horizon, the model has better prediction accuracy than common hybrid predictors, which verifies the practical value of the WD-GRU model in general time series prediction.

Embodiment 2

Corresponding to Embodiment 1, this embodiment proposes a timing prediction apparatus based on Bayesian optimization and wavelet decomposition, the apparatus including a processor configured with operation instructions executable by the processor to perform the following operations:

and obtaining a prediction result according to the training result.

Specifically, for the working principle and calculation steps of the device of this embodiment, refer to the description in Embodiment 1; they are not repeated here. In this embodiment, wavelet decomposition reduces the complexity of the complex sequence, the GRU network then predicts the decomposition result, and the prediction result is finally obtained through fusion. This effectively improves prediction accuracy; the hyperparameters are optimized with a Bayesian optimization algorithm, and the method achieves high accuracy in long-term time sequence prediction tasks.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. A time sequence prediction method based on bayesian optimization and wavelet decomposition, the method comprising:

obtaining a prediction result according to the training result;

the process for obtaining the acquired data and carrying out wavelet decomposition on the acquired data according to the optimized wavelet decomposition layer number and the mother wavelet function in the wavelet decomposition to obtain a decomposition result comprises the following steps:

2. The method according to claim 1, wherein the process of optimizing the model super-parameters according to the bayesian optimization method to obtain the optimal super-parameters comprises:

3. The method according to claim 1, wherein the building a GRU sub-predictor based on the decomposition result learning and predicting the decomposition result according to the super-parameters of the GRU sub-predictor obtained after the optimization to obtain the training result comprises:

building a GRU sub-predictor based on a Keras Tensorflow framework;

4. A method according to claim 3, wherein the process of obtaining a predicted outcome from the training outcome comprises:

5. A bayesian optimization and wavelet decomposition based timing prediction apparatus, the apparatus comprising a processor configured with processor-executable operating instructions to perform operations comprising:

obtaining a prediction result according to the training result;

the apparatus further includes a low frequency filter and a high frequency filter, the processor configured with processor-executable operating instructions to perform operations comprising:

6. The apparatus of claim 5, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:

7. The apparatus of claim 5, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:

building a GRU sub-predictor based on a Keras Tensorflow framework;

8. The apparatus of claim 7, wherein the processor is configured with processor-executable operating instructions to perform operations comprising: