Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a time series data prediction method based on a hybrid deep learning model and Stacking, which comprises the following steps: acquiring time series data to be predicted and preprocessing the time series data; constructing a hybrid deep learning model, wherein the model comprises an iterated dilated convolutional neural network and a bidirectional long short-term memory neural network; optimizing the hyper-parameters of the hybrid deep learning model by adopting an adaptive inertia weight particle swarm algorithm, and constructing an optimal hybrid deep learning model according to the optimal hyper-parameters; inputting the preprocessed time series data into the trained hybrid deep learning model to obtain the dependency relationship of the time series data; correcting the dependency relationship of the time series data by adopting a Stacking hierarchical model to obtain a corrected time series data prediction result; and executing, by a client, the corresponding operation according to the time series data prediction result.
Preferably, the process of preprocessing the time series data comprises: filling missing data of the time sequence data by adopting an up-down sampling linear interpolation method; coding the filled data by adopting a label coding method; carrying out normalization processing on the coded data; and performing feature selection on the normalized data to obtain preprocessed time sequence data.
Further, the feature selection of the normalized data comprises screening the data by adopting the Pearson correlation coefficient; the expression for the Pearson correlation coefficient is:

ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y) = E[(X - μ_X)(Y - μ_Y)] / (σ_X · σ_Y)

wherein Cov(X, Y) represents the covariance of variables X and Y, X represents the vector of feature X, Y represents the vector of feature Y, σ_X denotes the standard deviation of the vector X, σ_Y denotes the standard deviation of the vector Y, E[(X - μ_X)(Y - μ_Y)] represents the covariance of the variables X and Y, μ_X represents the mean of the feature X, and μ_Y represents the mean of the feature Y.
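As an illustration, the Pearson screening step above can be sketched in Python; the correlation threshold of 0.2 and the sample feature names are assumptions for this example, not values fixed by the invention.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

def select_features(features, target, threshold=0.2):
    """Keep only feature columns whose |correlation| with the target
    exceeds the threshold (the threshold is an assumed example value)."""
    return {name: col for name, col in features.items()
            if abs(pearson(col, target)) > threshold}

target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "linear": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated, rho = 1
    "noise":  [3.1, 0.2, 2.9, 0.1, 3.0],    # nearly uncorrelated, dropped
}
kept = select_features(features, target)
```

A strongly correlated column survives the screen while the near-random one is removed, which is exactly the noise-removal purpose described in the text.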
Preferably, the building of the hybrid deep learning model comprises: the iterated dilated convolutional neural network consists of three dilated convolution layers, the dilation rates of the layers being 1, 1 and 2 respectively, each layer comprising 64 convolution kernels, each of size 2; the output of the last dilated convolution layer of each dilated convolution block is used as the input of the next dilated convolution block, forming the iterated dilated convolutional neural network; the bidirectional long short-term memory neural network comprises a bidirectional long short-term memory network and a bidirectional gated recurrent unit, and the output of the bidirectional long short-term memory network is used as the input of the bidirectional gated recurrent unit to form the bidirectional long short-term memory neural network.
Preferably, the step of searching for the optimal hyper-parameters of the hybrid deep learning model by adopting the adaptive inertia weight particle swarm algorithm comprises the following steps:
step 1: taking the convolution kernel size of the convolutional network, the number of units of the hidden layer of the recurrent network, and the batch size in the hybrid deep learning model as the optimization objects, initializing the velocity and position of each particle in the population, setting the current optimal position Pbest searched by each particle to its initial position, and taking the globally best of these positions as Gbest;
step 2: calculating the fitness value of each particle; the fitness value is calculated as the mean absolute error of the prediction result of the hybrid deep learning model;
and step 3: determining a global optimal particle position Pbest and a local optimal position Gbest according to the particle fitness value;
and 4, step 4: updating the speed and the position of the particle by adopting a speed and position updating formula according to the optimal particle position Pbest and the local optimal position Gbest;
wherein, ω is
idThe coefficient of inertia is expressed as a function of,
represents the velocity before particle renewal, C
1And C
2All represent that the acceleration factor is non-negative constant, and xi and eta represent distribution in [0, 1 ]]Random number over interval,
Representing the optimal position before the particle update,
indicating the position before the ith particle update,
representing the local optimal position before particle updating, r representing a constraint factor, generally 1, k representing an updating turn, D representing a dimension, D representing the total dimension of the particles, namely the total number of the optimized objects, sigmoid representing a sigmoid function, alpha being a linear variation coefficient, v
idThe scalar magnitude of the speed, Δ h is the function value variation of the particle from one moment to another moment;
and 5: setting a termination condition of particle search, wherein the set termination condition of the particle search comprises the maximum iteration times of the algorithm or a deviation threshold value between two adjacent generations; judging whether the particle meets the search termination condition, if so, outputting an optimal hyper-parameter, and reconstructing the deep learning model through the optimal hyper-parameter; and if the ending condition of the particle search is not met, returning to the step 2.
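The particle swarm search of steps 1-5 can be sketched as follows. This is a minimal sketch with a fixed inertia coefficient (not the adaptive sigmoid-based variant) and a toy quadratic fitness standing in for the model's mean absolute error, since training the hybrid model inside the loop is out of scope here; the population size, iteration count and coefficients are assumed example values.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Stand-in for the fitness: in the method this would train the hybrid
    # model with hyper-parameters x and return the validation MAE.
    return np.sum((x - 3.0) ** 2)

D, N, ITERS = 3, 20, 60              # dimensions, particles, max iterations
omega, c1, c2, r = 0.7, 1.5, 1.5, 1.0  # inertia, acceleration, constraint

pos = rng.uniform(-10, 10, (N, D))   # step 1: initialize positions/velocities
vel = np.zeros((N, D))
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for k in range(ITERS):               # step 5: terminate on max iterations
    xi, eta = rng.random((N, D)), rng.random((N, D))
    # step 4: velocity and position update
    vel = omega * vel + c1 * xi * (pbest - pos) + c2 * eta * (gbest - pos)
    pos = pos + r * vel
    vals = np.array([fitness(p) for p in pos])   # step 2: fitness values
    improved = vals < pbest_val                  # step 3: update Pbest/Gbest
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
```

After the loop, `gbest` holds the best hyper-parameter vector found, which in the method would be used to rebuild the hybrid deep learning model.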
Preferably, the process of training the hybrid deep learning model includes:
step 1: acquiring historical time sequence data, and constructing a two-dimensional matrix input feature and a one-dimensional matrix label according to the historical data; the two-dimensional matrix input features and the one-dimensional matrix labels are corresponded to obtain a training data set;
step 2: preprocessing data in the training data set;
and step 3: inputting the preprocessed data into an iterative expansion convolution neural network, and extracting the characteristics of each expansion convolution layer to obtain the local trend characteristics of the time sequence data; the initial characteristics extracted from each expansion convolutional layer in the iterative expansion convolutional neural network are used as the input of the next expansion convolutional layer, and the iteration is carried out for three times;
and 4, step 4: inputting the local trend characteristics into a bidirectional gate cycle long-short term memory neural network to predict the dependency relationship of time series data; inputting the local trend characteristics into a bidirectional long and short term memory network, and taking the output of the bidirectional long and short term memory network as the input of a bidirectional gate cycle unit to obtain a dependency relationship predicted value;
and 5: calculating an average absolute error according to the predicted value and the true value, and taking the average absolute error as a loss function of the hybrid deep learning model;
step 6: and optimizing a loss function by adopting an Adam optimizer, updating parameters of the hybrid deep learning model, and finishing the training of the hybrid deep learning model when the loss is minimum.
Further, the calculation formula of the mean absolute error is as follows:

MAE = (1/n) · Σ_{i=1}^{n} |ŷ_i - y_i|

wherein MAE represents the mean absolute error, n represents the number of samples, ŷ_i represents the predicted value, and y_i represents the true value.
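A minimal worked example of the mean absolute error formula:

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_hat_i - y_i|)."""
    assert len(y_true) == len(y_pred)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Absolute errors are 2, 2 and 3, so MAE = (2 + 2 + 3) / 3 = 7/3.
mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```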
Preferably, the process of correcting the dependency relationship of the time series data by using the Stacking hierarchical model includes: the Stacking hierarchical model is of a two-layer structure; the random forest RF, the gradient boosting tree GBDT and the XGBoost model serve as base learners on the first layer, and a linear regression model LR serves as the secondary learner on the second layer;
the process of the base learners processing the dependency relationship of the time series data comprises: taking the time series data dependency relationship output by the hybrid deep learning model as input and the true values of the time series data as labels, and inputting them respectively into the random forest RF, the gradient boosting tree GBDT and the XGBoost model to obtain three initial correction results;
the process of the secondary learner processing the initial correction results comprises: combining the three initial correction results as features, training the linear regression model LR with the combined data as input and the true values as labels to obtain the distribution weights of the random forest RF, the gradient boosting tree GBDT and the XGBoost model, and fusing the three initial correction results according to the distribution weights to obtain the corrected time series data prediction result.
A hybrid deep learning model and Stacking based time series data prediction system, the system comprising: the system comprises a data acquisition module, a data preprocessing module, an iterative expansion convolution neural network module, a bidirectional long-short term memory neural network module, a self-adaptive inertia weight particle swarm module, a data correction module and a data output module;
the data acquisition module is used for acquiring time sequence data and inputting the time sequence data into the data preprocessing module;
the data preprocessing module is used for preprocessing time sequence data, and the preprocessing comprises missing data filling of the time sequence data, coding of the filled data, normalization processing of the coded data and feature selection processing of the normalized data;
the iterative expansion convolution neural network module is used for carrying out prediction processing on the preprocessed data to obtain the local trend characteristics of the time sequence data;
the bidirectional long and short term memory neural network module is used for processing the extracted local trend characteristics to obtain the dependency relationship of time sequence data;
the adaptive inertia weight particle swarm module is used for optimizing the hyper-parameters of the hybrid deep learning model to obtain the hyper-parameters of the optimal hybrid deep learning model;
the data correction module is used for correcting the dependency relationship of the time sequence data to obtain a corrected time sequence data prediction result;
and the data output module is used for outputting the corrected time sequence data prediction result.
To achieve the above object, the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements any one of the above time series data prediction methods based on a hybrid deep learning model and Stacking.
In order to achieve the above object, the present invention further provides a time series data prediction apparatus based on a hybrid deep learning model and Stacking, which includes a processor and a memory; the memory is used for storing a computer program; the processor is connected with the memory and used for executing a computer program stored in the memory so as to enable the hybrid deep learning model and Stacking-based time series data prediction device to execute any one of the hybrid deep learning model and Stacking-based time series data prediction methods.
The invention has the beneficial effects that:
the method comprises the steps of extracting local trend characteristics of time sequence data through an iterative expansion convolution neural network (ID-CNN), expanding a receptive field and reducing information loss, then constructing a bidirectional gate cycle long and short term memory network (Bi-GRLSTM), stacking the bidirectional long and short term memory network and the bidirectional gate cycle unit to mine the time sequence data dependency relationship to enrich the expression form of the model, optimizing the hyper-parameters of the mixed deep learning model by using a self-adaptive inertial weight particle swarm algorithm, training the model by using the optimized hyper-parameters and predicting to obtain a final prediction result of the time sequence data. The invention provides a method for correcting a time sequence data prediction result based on Stacking, which corrects the time sequence prediction result by integrating a Random Forest (RF), a gradient lifting tree (GBDT) and an XGboost model through a Stacking layered model integrated framework, so that the model prediction precision is higher.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A time series data prediction method based on a hybrid deep learning model and Stacking comprises the following steps: firstly, local trend features among the time series data are extracted through an iterated dilated convolutional neural network, which expands the receptive field without pooling and therefore without losing a large amount of information; then the dependency relationships among the time series data are extracted by using a long short-term memory network and a gated recurrent unit, the bidirectional structure providing more information. Finally, the prediction result of the recurrent network is taken as a feature, and the Stacking hierarchical model integration framework, integrating the random forest (RF), gradient boosting tree (GBDT) and XGBoost models, is used to correct the time series prediction result and improve the prediction precision.
A time series data prediction method based on a hybrid deep learning model and Stacking comprises: acquiring the time series data to be predicted and preprocessing the data; constructing a hybrid deep learning model comprising an iterated dilated convolutional neural network and a bidirectional long short-term memory neural network; inputting the preprocessed time series data into the trained hybrid deep learning model to obtain the dependency relationship of the time series data; correcting the dependency relationship of the time series data by adopting the Stacking hierarchical model to obtain a corrected time series data prediction result; and executing, by the client, the corresponding operation according to the time series data prediction result.
In standard convolution, the temporal reach of the model is limited by the size of the convolution kernel, and if longer dependencies are to be captured, many layers must be stacked linearly. To solve this problem, researchers proposed the dilated convolution. Dilated convolution injects holes into the convolution map of standard convolution so as to increase the receptive field. The dilated convolution therefore adds one hyper-parameter to standard convolution, called the dilation rate, which refers to the number of intervals between taps of the convolution kernel. Unlike standard convolution, dilated convolution samples the input at intervals controlled by the dilation rate. Dilated convolution causes the size of the effective window to grow exponentially with the number of layers, so the convolutional network can obtain a large receptive field with a small number of layers. However, simply increasing the depth of stacked dilated convolutions will cause the model to overfit. For this purpose, the iterated dilated CNN (ID-CNN) is used to extract the local trend features. The architecture of the ID-CNN and the specific structure of its internal dilation block are shown in FIG. 1. The ID-CNN applies the same CNN block repeatedly a number of times, each iteration taking the result of the last application as input. Inside each dilation block are several dilated convolution layers with increasing dilation rate; reusing dilated convolution layers with increasing dilation rate in an iterative manner provides a wider range for local trend feature extraction.
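A small numpy sketch can make the receptive-field arithmetic concrete; the causal-padding convention and the helper names are assumptions for illustration, not part of the ID-CNN specification. With kernel size 2 and dilation rates 1, 1 and 2 as in one block, the receptive field is 1 + (2-1)·(1+1+2) = 5.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D dilated convolution: each kernel tap looks `dilation`
    steps further back in time (input is zero-padded on the left)."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(kernel[j] * xp[i + pad - j * dilation] for j in range(k))
                     for i in range(len(x))])

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated layers: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Kernel [1, 1] with dilation 2 adds the value from two steps back:
y = dilated_conv1d(np.arange(1.0, 5.0), [1.0, 1.0], 2)   # -> [1., 2., 4., 6.]

# Three layers with dilation rates 1, 1, 2 and kernel size 2, as in one block:
rf_one_block = receptive_field(2, [1, 1, 2])             # 1 + 1 + 1 + 2 = 5
```

This shows why stacking a few dilated layers widens the effective window far faster than standard convolution with the same kernel size.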
A recurrent neural network (RNN) is a neural network that takes sequence data as input, recurses in the evolution direction of the sequence, and connects all recurrent unit nodes in a chain. The recurrent neural network has memory, shares parameters and is Turing complete, so it has certain advantages in learning the nonlinear characteristics of a sequence. Recurrent neural networks have applications in natural language processing, such as speech recognition, language modeling and machine translation, and are also used for various types of time series prediction. A recurrent neural network combined with a convolutional neural network can also handle problems containing sequence input. The internal structure of the RNN is shown in FIG. 2.
The long short-term memory network (LSTM) is a special recurrent neural network (RNN), specially designed to solve the long-term dependency problem of the general recurrent neural network, mainly addressing gradient vanishing and gradient explosion during long-sequence training. In short, the LSTM performs better on longer sequences than an ordinary RNN. The LSTM controls the transmission state through gating, remembering information that needs to be memorized for a long time and forgetting unimportant information, unlike an ordinary RNN, which has only one way of overlaying memory. The internal structure of the LSTM is shown in FIG. 3.
The gated recurrent unit (GRU) is a powerful variant of the long short-term memory network (LSTM); it is simpler than the LSTM network and is therefore also a very mainstream network today. Since the GRU is a variant of the LSTM, it can likewise address the long-dependency problem in RNN networks. The LSTM introduces three gate functions, the input gate, the forget gate and the output gate, to control the input, memory and output values respectively, while the GRU model has only two gates, the update gate and the reset gate. The GRU has one gate function fewer than the LSTM and therefore fewer parameters, so the GRU as a whole trains faster than the LSTM. The internal structure of the GRU is shown in FIG. 4.
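A single GRU step can be sketched in numpy to show the two gates described above; the weight shapes and the random initialization are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h_prev, W, U, b):
    """One GRU step. W, U, b hold the update ('z'), reset ('r') and candidate
    ('h') parameters; W[g] has shape (hidden, input), U[g] (hidden, hidden)."""
    z = sigmoid(W["z"] @ x + U["z"] @ h_prev + b["z"])      # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h_prev + b["r"])      # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h_prev) + b["h"])
    return (1 - z) * h_prev + z * h_tilde                   # blend old/new state

rng = np.random.default_rng(1)
n_in, n_hid = 4, 3
W = {g: rng.normal(0, 0.1, (n_hid, n_in)) for g in "zrh"}
U = {g: rng.normal(0, 0.1, (n_hid, n_hid)) for g in "zrh"}
b = {g: np.zeros(n_hid) for g in "zrh"}

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):     # run a 5-step sequence
    h = gru_cell(x, h, W, U, b)
```

The update gate decides how much of the previous state is kept; because the state is always a convex combination of the old state and a tanh-bounded candidate, every hidden value stays in (-1, 1).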
The particle swarm optimization algorithm is an evolutionary computing technique derived from research on the predation behavior of bird flocks. Its basic idea is to search for the optimal solution through cooperation and information sharing among the individuals in a swarm. Each bird is simulated by a massless particle that has only two attributes, velocity and position, where velocity represents how fast the particle moves and position represents the direction of movement. Each particle independently searches for the optimal solution in the search space and records it as its current individual extremum; the individual extrema are shared among all particles, and the best of them is taken as the current global optimal solution of the whole swarm. Each particle then adjusts its velocity and position according to its own current individual extremum and the current global optimal solution shared by the whole swarm. Stacking is a hierarchical model integration framework. Taking two layers as an example, the first layer is composed of several base learners whose input is the original training set; the second-layer model is trained on a new training set that takes the outputs of the first-layer base learners as features, thereby obtaining a complete Stacking model. Stacking can integrate the prediction results of several models, and fusing the models can improve the prediction accuracy by a modest margin. The Stacking structure is shown in FIG. 5.
In one embodiment of the present invention, the time series data are PM2.5 time series data; the meteorological data include PM2.5 concentration, dew point, temperature, pressure, combined wind direction, cumulative wind speed and cumulative snowfall hours. For the preliminary prediction with the hybrid deep learning model, 60% of the data are selected as the training set, 20% as the validation set and 20% as the test set.
The process of preprocessing the time sequence data comprises the following steps: filling missing data of the time sequence data by adopting an up-down sampling linear interpolation method; coding the filled data by adopting a label coding method; carrying out normalization processing on the coded data; and performing feature selection on the normalized data to obtain preprocessed time sequence data.
Missing value processing: the PM2.5 values in the data set used contain a large number of missing values, which need to be filled. The missing values are filled according to the values of the data before and after them by adopting the up-down sampling linear interpolation method, with the specific formula:

y_i = y_a + i · c,  c = (y_b - y_a) / (b - a)

wherein y_i represents the filled value, c represents the equal increment per step, i represents the distance from the first non-empty value before the gap, y_b represents the first non-empty value after the empty value, y_a represents the first non-empty value before the empty value, b represents the position of the first non-empty value after the empty value, and a represents the position of the first non-empty value before the empty value.
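A minimal sketch of the up-down sampling linear interpolation fill, assuming every gap has non-empty values on both sides (boundary gaps are not handled here):

```python
def fill_missing(values):
    """Fill None gaps by linear interpolation between the nearest non-empty
    neighbours: y_i = y_a + i * c, with c = (y_b - y_a) / (b - a)."""
    out = list(values)
    n = len(out)
    idx = 0
    while idx < n:
        if out[idx] is None:
            a = idx - 1                      # position a of y_a (assumed present)
            b = idx
            while b < n and out[b] is None:
                b += 1                       # position b of y_b
            c = (out[b] - out[a]) / (b - a)  # equal increment per step
            for i in range(1, b - a):
                out[a + i] = out[a] + i * c
            idx = b
        else:
            idx += 1
    return out

filled = fill_missing([1.0, None, None, 4.0])   # -> [1.0, 2.0, 3.0, 4.0]
```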
Non-numerical data processing: in the data used, the values of the combined wind direction feature are expressed as characters and cannot be input to the model directly, so label encoding is adopted to discretize the combined wind direction and convert the characters into integer values. For example, the wind direction takes four values, "SE", "cv", "NW" and "NE", and the integers 1, 2, 3 and 4 are used in place of the four wind direction values respectively.
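The label encoding of the combined wind direction can be sketched as follows, using the example code assignment from the text ("SE" -> 1, "cv" -> 2, "NW" -> 3, "NE" -> 4):

```python
def label_encode(column, mapping=None):
    """Map each distinct category string to an integer code; the default
    mapping follows the example assignment given in the text."""
    if mapping is None:
        mapping = {"SE": 1, "cv": 2, "NW": 3, "NE": 4}
    return [mapping[v] for v in column]

codes = label_encode(["SE", "NW", "NW", "NE", "cv"])   # -> [1, 3, 3, 4, 2]
```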
Data normalization processing: the ranges of different attribute values differ greatly; if they are input to the model directly, features with large numeric ranges will have an outsized influence on the prediction result and affect the final prediction effect, so all features are normalized and scaled into the range 0 to 1. The normalization formula is:

X_norm = (X - X_min) / (X_max - X_min)

wherein X_norm represents the normalized data, X represents the raw data, X_min represents the minimum value of the raw data set, and X_max represents the maximum value of the raw data set.
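A minimal sketch of the min-max normalization formula:

```python
def min_max_normalize(column):
    """X_norm = (X - X_min) / (X_max - X_min), scaling each value into [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

norm = min_max_normalize([10.0, 15.0, 20.0])   # -> [0.0, 0.5, 1.0]
```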
Feature selection: the features that affect the PM2.5 concentration have many dimensions and a large data volume. In order to reduce the data volume and remove the noise interference of features that have little influence on the PM2.5 concentration from the prediction result, feature selection is necessary. The Pearson correlation coefficient calculation formula is:

ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y) = E[(X - μ_X)(Y - μ_Y)] / (σ_X · σ_Y)

wherein Cov(X, Y) represents the covariance of variables X and Y, X represents the vector of feature X, Y represents the vector of feature Y, σ_X denotes the standard deviation of the vector X, σ_Y denotes the standard deviation of the vector Y, E[(X - μ_X)(Y - μ_Y)] represents the covariance of the variables X and Y, μ_X represents the mean of the feature X, and μ_Y represents the mean of the feature Y.
The method for constructing the hybrid deep learning model comprises the following steps: the iterated dilated convolutional neural network consists of three dilated convolution layers, the dilation rates of the layers being 1, 1 and 2 respectively, each layer comprising 64 convolution kernels, each of size 2; the output of the last dilated convolution layer of each dilated convolution block is used as the input of the next dilated convolution block, forming the iterated dilated convolutional neural network; the bidirectional long short-term memory neural network comprises a bidirectional long short-term memory network and a bidirectional gated recurrent unit, and the output of the bidirectional long short-term memory network is used as the input of the bidirectional gated recurrent unit to form the bidirectional long short-term memory neural network.
In the process of preliminarily predicting the time series data with the hybrid deep learning model, historical PM2.5 data are input, and local trend features are first extracted through iterated dilated convolution; in order to give the dilated convolution a larger receptive field and obtain wider feature information, an iterative structure is adopted for feature extraction. The iterative structure consists of 3 dilation blocks; each block contains convolution layers with dilation rates of 1, 1 and 2, the convolution kernel size of each layer is 2, and the number of filters is 64.
After feature extraction, the local trend features extracted by the iterated dilated convolution are input into the bidirectional long short-term memory neural network, and the bidirectional gated recurrent units are stacked behind it, with the number of units set to 50 and the sigmoid function selected as the activation function. The time series dependency relationships between features are mined through the bidirectional gated-recurrent long short-term memory network; the bidirectional structure examines the data from both directions, so the model obtains richer expressive forms and captures patterns that a unidirectional structure might ignore. The structure of the bidirectional gated-recurrent long short-term memory network is shown in FIG. 6.
The process of training the hybrid deep learning model comprises the following steps:
step 1: acquiring historical time sequence data, and constructing a two-dimensional matrix input feature and a one-dimensional matrix label according to the historical data; the two-dimensional matrix input features and the one-dimensional matrix labels are corresponded to obtain a training data set;
step 2: preprocessing data in the training data set;
and step 3: inputting the preprocessed data into an iterative expansion convolution neural network, and extracting the characteristics of each expansion convolution layer to obtain the local trend characteristics of the time sequence data; the initial characteristics extracted from each expansion convolutional layer in the iterative expansion convolutional neural network are used as the input of the next expansion convolutional layer, and the iteration is carried out for three times;
and 4, step 4: inputting the local trend characteristics into a bidirectional gate cycle long and short term memory neural network to predict the dependency relationship of time sequence data; inputting the local trend characteristics into a bidirectional long and short term memory network, and taking the output of the bidirectional long and short term memory network as the input of a bidirectional gate cycle unit to obtain a dependency relationship predicted value;
and 5: calculating an average absolute error according to the predicted value and the true value, and taking the average absolute error as a loss function of the hybrid deep learning model;
step 6: and optimizing a loss function by adopting an Adam optimizer, updating parameters of the hybrid deep learning model, and finishing the training of the hybrid deep learning model when the loss is minimum.
The structure of the hybrid depth model is shown in FIG. 7. A fully connected layer is stacked on the last layer of the model, with the number of neurons corresponding to the number of future time points to be predicted. The model is evaluated by the mean absolute error (MAE) during training, the Adam optimizer is selected, the batch size is 128 and the epochs are 100, finally yielding the preliminary prediction result. The mean absolute error is calculated as:

MAE = (1/n) · Σ_{i=1}^{n} |ŷ_i - y_i|

wherein MAE represents the mean absolute error, n represents the number of samples, ŷ_i represents the predicted value, and y_i represents the true value.
As shown in fig. 8, the process of using the adaptive inertial weight particle swarm algorithm includes:
step 1: initializing the speed and the position of each particle in a population by taking the convolution kernel size of a convolution network, the number of units of a hidden layer of a circulation network and the batch processing size of the convolution network in a hybrid deep learning model as optimization objects, setting the optimal position Pbest searched by each particle at present as an initial position, and taking the optimal position searched by the particles in the overall situation as Gbest;
step 2: the fitness value for each particle was calculated according to the following formula. And constructing an LSTM model according to the corresponding parameters of each particle, training through training data, predicting through verification data, and taking the average absolute error of a prediction result as the fitness value of each particle.
Wherein MAE represents the mean absolute error, n represents the number of samples,
indicates the predicted value, y
iRepresenting the true value.
And step 3: determining a global optimal particle position Pbest and a local optimal position Gbest according to the particle fitness value;
and 4, step 4: adjusting the speed and position of the particles according to the following formula;
wherein, ω is
idThe coefficient of inertia is expressed as a function of,
represents the velocity before particle renewal, C
1And C
2Uniform meterShowing the acceleration factor as a non-negative constant, xi and eta are shown as being distributed over 0, 1]The random number over the interval is,
representing the optimal position before the particle update,
indicating the position before the ith particle update,
representing the local optimal position before particle updating, r representing a constraint factor, generally 1, k representing the updating round, D representing the dimension, D representing the total dimension of the particle, namely the total number of the optimized objects, sigmoid representing a sigmoid function, alpha representing a linear variation coefficient, v
idΔ h is the amount of change in the function value of the particle from one moment to another, being the scalar magnitude of the velocity.
And 5: setting a termination condition of particle search, wherein the set termination condition of the particle search comprises the maximum iteration times of the algorithm or a deviation threshold value between two adjacent generations; judging whether the particle meets the search termination condition, if so, outputting an optimal hyper-parameter, and reconstructing the deep learning model through the optimal hyper-parameter; and if the ending condition of the particle search is not met, returning to the step 2.
Judging whether the particle meets the search termination condition or not comprises judging whether the iteration number of the particle reaches the maximum iteration number or not, if so, determining that the particle meets the search termination condition, otherwise, determining that the particle does not meet the search termination condition; and calculating the deviation between two adjacent generations, comparing the deviation with a set deviation threshold value, if the deviation is smaller than the set deviation threshold value, meeting the search termination condition, and otherwise, not meeting the search termination condition.
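The two-part stopping rule can be expressed compactly; the helper name is illustrative:

```python
def search_terminated(iteration, max_iter, prev_best, curr_best, tol):
    """True when the iteration budget is exhausted, or when the change in
    the best fitness between two adjacent generations drops below tol."""
    if iteration >= max_iter:
        return True
    return abs(curr_best - prev_best) < tol
```

Either criterion alone suffices to stop the search, matching the "or" in the condition above.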
The process of correcting the dependency relationship of the time series data by means of the Stacking hierarchical model comprises the following steps: the prediction output of the hybrid deep model is used as a new feature, with the actual value as the label, and these are input into a Random Forest (RF), a gradient boosting decision tree (GBDT) and an XGBoost model respectively; 5-fold cross validation is adopted, the training data being divided into 5 folds, with 80% of the data used as the training set and 20% as the test set each time; the mean absolute error (MAE) is used as the model evaluation criterion; each model is trained 5 times, one prediction being obtained per fold, and finally the 5 prediction results are averaged and passed to the next layer.
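The out-of-fold scheme above can be sketched generically as follows. To stay self-contained, a trivial mean-predicting stand-in model is used where the method actually plugs in RF, GBDT and XGBoost regressors; all names here are illustrative:

```python
import numpy as np

def oof_predictions(model_factories, X, y, n_splits=5):
    """First Stacking layer: for each base model, produce out-of-fold
    predictions via k-fold CV; these become meta-features for layer 2."""
    n = len(X)
    idx = np.arange(n)
    folds = np.array_split(idx, n_splits)
    meta = np.zeros((n, len(model_factories)))
    for j, make_model in enumerate(model_factories):
        for fold in folds:
            train = np.setdiff1d(idx, fold)  # the other 4 folds (80%)
            model = make_model()
            model.fit(X[train], y[train])
            meta[fold, j] = model.predict(X[fold])  # held-out 20%
    return meta

class MeanModel:
    """Stand-in base learner (predicts the training mean); in the method
    this slot is filled by the RF, GBDT and XGBoost regressors."""
    def fit(self, X, y):
        self.mu = float(np.mean(y))
        return self
    def predict(self, X):
        return np.full(len(X), self.mu)
```

Because every sample's meta-feature is produced by a model that never saw that sample, the second layer is trained on unbiased predictions.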
In order to combine the prediction results of the primary learners in an optimal form, weights need to be assigned to them. The Random Forest (RF), gradient boosting decision tree (GBDT) and XGBoost models serve as primary learners on the first layer; to prevent overfitting, the second layer generally adopts a simpler model, so a linear regression (LR) model is used as the secondary learner on the second layer. The outputs of the primary learners are combined as features, the real PM2.5 concentration of the original data is used as the sample label to fit the linear regression model, and the Stacking model is thus constructed; the prediction of the Stacking model yields the final result obtained after the prediction of the hybrid deep learning model is corrected, as shown in fig. 9 and fig. 10.
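The second-layer linear regression amounts to learning one weight per primary learner plus an intercept by least squares. A minimal numpy-only sketch (in practice a library LR implementation would be used; names are illustrative):

```python
import numpy as np

def fit_meta_lr(meta_features, y):
    """Second Stacking layer: least-squares linear regression (with
    intercept) on the primary learners' out-of-fold predictions."""
    A = np.hstack([meta_features, np.ones((len(meta_features), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # per-learner weights followed by the intercept

def predict_meta_lr(coef, meta_features):
    A = np.hstack([meta_features, np.ones((len(meta_features), 1))])
    return A @ coef
```

The learned coefficients are exactly the "weights assigned to the prediction results of the primary learners" described above.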
A hybrid deep learning model and Stacking based time series data prediction system, the system comprising: the system comprises a data acquisition module, a data preprocessing module, an iterative expansion convolution neural network module, a bidirectional long-short term memory neural network module, a self-adaptive inertia weight particle swarm module, a data correction module and a data output module;
the data acquisition module is used for acquiring time sequence data and inputting the time sequence data into the data preprocessing module;
the data preprocessing module is used for preprocessing time sequence data, and the preprocessing comprises missing data filling of the time sequence data, coding of the filled data, normalization processing of the coded data and feature selection processing of the normalized data;
the iterative expansion convolution neural network module is used for carrying out prediction processing on the preprocessed data to obtain the local trend characteristics of the time sequence data;
the bidirectional long and short term memory neural network module is used for processing the extracted local trend characteristics to obtain the dependency relationship of time sequence data;
the self-adaptive inertia weight particle swarm module is used for optimizing the hyper-parameters of the hybrid deep learning model to obtain the optimal hyper-parameters of the hybrid deep learning model;
the data correction module is used for correcting the dependency relationship of the time sequence data to obtain a corrected time sequence data prediction result;
and the data output module is used for outputting the corrected time sequence data prediction result.
The system embodiments of the present invention are the same as the method embodiments.
In an embodiment of the present invention, the invention further includes a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the above methods for predicting time series data based on a hybrid deep learning model and Stacking.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by a computer program instructing the relevant hardware. The aforementioned computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps comprising the method embodiments described above; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
A time sequence data prediction device based on a hybrid deep learning model and Stacking comprises a processor and a memory; the memory is used for storing a computer program; the processor is connected with the memory and used for executing a computer program stored in the memory so as to enable the hybrid deep learning model and Stacking-based time series data prediction device to execute any one of the hybrid deep learning model and Stacking-based time series data prediction methods.
Specifically, the memory includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
Preferably, the Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the Processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The above-mentioned embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.