CN115660167A - Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction

Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction

Info

Publication number
CN115660167A
Authority
CN
China
Prior art keywords
deep learning
prediction
data
model
decomposition
Prior art date
Legal status
Pending
Application number
CN202211308007.0A
Other languages
Chinese (zh)
Inventor
车金星 (Che Jinxing)
胡焜 (Hu Kun)
Current Assignee
Nanchang Institute of Technology
Original Assignee
Nanchang Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanchang Institute of Technology filed Critical Nanchang Institute of Technology
Priority to CN202211308007.0A priority Critical patent/CN115660167A/en
Publication of CN115660167A publication Critical patent/CN115660167A/en

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a short-term air quality prediction method based on a sparrow search algorithm and decomposition error correction, which comprises the following steps: S1, decomposing the data with the CEEMDAN decomposition algorithm to obtain a plurality of IMFs; S2, predicting the IMFs with multiple deep learning models (at least two) and processing their prediction errors to obtain the weight of each deep learning model; and S3, combining the prediction results of the deep learning models according to the weights to obtain a comprehensive prediction result, namely the multi-model combined prediction result. The method combines air quality prediction with time-series decomposition and adopts a combined prediction approach, using the weights to merge the predictions of all models, so that a higher-precision air quality prediction result can be obtained.

Description

Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction
Technical Field
The invention relates to the technical field of air pollution prediction, in particular to a short-term air quality prediction method based on a sparrow search algorithm and decomposition error correction.
Background
Long-term exposure to polluted air poses health risks to humans. According to World Health Organization (WHO) research, the global burden of disease associated with exposure to air pollution has caused a tremendous loss of human health worldwide. It is estimated that this burden of disease is comparable to that of other major global health risks such as unhealthy diet and smoking, and air pollution is now considered the greatest environmental threat to human health. Predicting future air quality from past air quality data therefore has significant research value: it supports early warning of air pollution episodes, informs better planning and decisions about urban development, and helps protect human health.
Current air quality prediction methods focus mainly on the accuracy of future air quality predictions while neglecting the correction of prediction errors and the simplification of parameters, which results in lower prediction accuracy and longer running times.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular provides a short-term air quality prediction method based on a sparrow search algorithm and decomposition error correction.
In order to achieve the above object, the present invention provides a short-term air quality prediction method based on a sparrow search algorithm and decomposition error correction, comprising the steps of:
s1, decomposing data by using a CEEMDAN decomposition algorithm to obtain a plurality of IMFs;
s2, predicting the IMFs by using the deep learning models, and processing prediction errors of the deep learning models to obtain the weight of each deep learning model; the number of the deep learning models is at least two;
and S3, combining the prediction results of each deep learning model according to the weight to obtain a comprehensive prediction result, namely a multi-model combined prediction result.
Further, the deep learning model includes: LSTM, bi-LSTM, GRU and Bi-GRU.
Further, S2 comprises the steps of:
s2-1, predicting the IMFs by using different deep learning models to obtain a plurality of prediction results; at the same time, the predicted data and the values of the statistical evaluation index are obtained.
S2-2, processing the prediction error of each model by adopting any algorithm of a simple average method, an MAE (maximum energy inversion) reciprocal method and a Lagrange multiplier method to obtain the weight of each deep learning model;
when the simple averaging method is adopted, the weight of each model is equal;
when the MAE reciprocal method is employed, the weight of each model is computed as follows:

$$a_i = \frac{1/\mathrm{MAE}_i}{S}, \qquad S = \sum_{i=1}^{H} \frac{1}{\mathrm{MAE}_i}$$

wherein S represents the sum of the reciprocals of the MAEs of the prediction results of the H models, and H is the total number of models; $a_i$ represents the weight of the i-th model; $\mathrm{MAE}_i$ represents the mean absolute error of the i-th model;
when the Lagrange multiplier method is adopted, the weight of each model is obtained by solving the following system:

$$\frac{\partial L(a, \lambda)}{\partial a_i} = 0, \qquad \sum_{i=1}^{N} a_i = 1$$

wherein $a_i$ represents the weight of the i-th model, i = 1, 2, …, N; N is the total number of models; and $\partial L(a,\lambda)/\partial a_i$ denotes the partial derivative of the Lagrangian function L(a, λ) with respect to the weight a.
And S2-3, finally combining the prediction result of each deep learning model with the weight to generate a multi-model combined prediction result.
Further, before step S1, the data is preprocessed, including the following steps:
s0-1, filling missing values by using cubic spline interpolation;
and S0-2, processing the data after cubic spline interpolation by adopting a moving average method.
The data is preprocessed, so that abnormal and missing data in the original data can be processed, and the influence of periodicity and seasonal factors in the original data can be eliminated, so that the accuracy of prediction is improved.
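For illustration, these two preprocessing steps can be sketched in a few lines of Python (a minimal sketch, not the patent's implementation; pandas with SciPy installed is assumed, and the 24-hour window follows the G = 24 setting used in the embodiment):

```python
# Hedged sketch of S0-1 (cubic spline fill) and S0-2 (moving average).
import pandas as pd

def preprocess_aqi(series: pd.Series, window: int = 24) -> pd.Series:
    filled = series.interpolate(method="cubicspline")  # S0-1: fill missing values
    smoothed = filled.rolling(window=window).mean()    # S0-2: moving average
    return smoothed.dropna()                           # drop the warm-up window

# toy hourly AQI series with gaps
aqi = pd.Series([63.0, None, 58.0, 70.0, None, 66.0] * 8)
print(preprocess_aqi(aqi, window=4).head())
```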
Further, the deep learning model is obtained by training a sparrow search algorithm, and comprises the following steps:
SA, normalizing the data decomposed by CEEMDAN, scaling the values into the range between 0 and 1; the normalization operation not only accelerates the convergence of the neural network, but also eliminates the adverse effects of outlier samples.
SB, dividing the classified data into a training data set and a testing data set;
SC, optimizing the hyper-parameters of the deep learning model by using the sparrow search algorithm SSA, wherein the hyper-parameters comprise the number of neurons in each layer of the neural network, the number of iterations and the learning rate;
and SD, predicting the test data set with the deep learning model and evaluating it against the evaluation indexes; if the model's hyper-parameters do not cause overfitting and the prediction error is minimal, the optimal hyper-parameters, and thus the optimal deep learning model, are obtained. If the model overfits or the prediction error is not minimal, return to step SC.
The parameters of the deep learning model can be determined in a self-adaptive mode through a sparrow search algorithm, so that manual setting is not needed, and the simplification of the parameters is realized.
Further, the evaluation index includes: any combination of mean absolute error, root mean square error, correlation coefficient, and mean absolute percentage error.
Further, step S0 is included before step S1, and data is acquired, where the data is meteorological data.
Furthermore, the data is short-term meteorological data, and the acquisition time is 24-72 hours continuously.
In summary, due to the adoption of the technical scheme, the air quality prediction is combined with the time series decomposition, and the prediction results of the models are combined by using the weight by adopting a combined prediction method, so that the air quality prediction result with higher precision can be obtained.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic view of the overall structure of the present invention.
FIG. 2 is a schematic view of the network structures of LSTM, Bi-LSTM, GRU and Bi-GRU; FIG. 2 (a) shows the LSTM network structure, FIG. 2 (b) the Bi-LSTM network structure, FIG. 2 (c) the GRU network structure, and FIG. 2 (d) the Bi-GRU network structure.
FIG. 3 is a schematic view of the air quality index data of Beijing and the location of the monitoring station of Beijing according to the embodiment, and FIG. 3 (a) is the air quality index data of Beijing; FIG. 3 (b) is the location of the monitoring station in Beijing.
FIG. 4 is a schematic diagram of the data preprocessing results and of the multi-step deep learning model for processing time series; FIG. 4 (a) is a partial result diagram of cubic spline interpolation, FIG. 4 (b) is a result diagram of the moving average method, FIG. 4 (c) illustrates the multi-step deep learning model for processing time series, and FIG. 4 (d) is a partial diagram of Day 0 to Day 120 using the moving average method.
FIG. 5 shows the decomposition results of CEEMDAN and their normalization according to the embodiment; FIG. 5 (a) shows the CEEMDAN decomposition results, and FIG. 5 (b) shows the normalized CEEMDAN decomposition results.
Fig. 6 is a schematic diagram showing the result of whether the decomposition algorithm is used on the neural model with different depths, fig. 6 (a) and 6 (b) are schematic diagrams showing the prediction results of the models before and after the decomposition algorithm is used, and fig. 6 (c) is a schematic diagram showing the error of the prediction result.
Fig. 7 is a diagram showing changes in the statistical index and the prediction result of each model.
Fig. 8 is a schematic diagram showing the prediction results of different method combinations, fig. 8 (a) and 8 (b) are graphs showing the prediction results of different method combinations, and fig. 8 (c) is a statistical graph showing the prediction errors of different method combinations.
Fig. 9 is a schematic diagram showing changes in statistical indexes of models using different combination methods, fig. 9 (a) is a schematic diagram showing changes in statistical indexes of models combined with "bg-g-bl", fig. 9 (b) is a schematic diagram showing changes in statistical indexes of models combined with "imf", and fig. 9 (c) is a schematic diagram showing changes in statistical indexes of models combined with "ceemdan".
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.
The invention provides a short-term air quality prediction method based on a sparrow search algorithm and decomposition error correction, which comprises the following steps as shown in figure 1:
s100: and (6) data processing.
S100-1, preprocessing the data can not only process abnormal and missing data in the original data, but also eliminate the influence of periodic and seasonal factors in the original data, thereby improving the accuracy of prediction. The pre-processing includes using cubic spline interpolation to fill missing values; and processing the data after cubic spline interpolation by adopting a moving average method.
S100-2, decomposing the preprocessed data into a limited number of eigenmode functions IMF by using a decomposition algorithm. The decomposed components are used as data of subsequent parts respectively. The decomposition algorithm is CEEMDAN with adaptive noise.
S200: the model is trained and the data is predicted.
S200-1, normalizing the IMF. After normalization was used, the IMF was distributed between 0 and 1. The normalization operation not only accelerates the convergence speed of the neural network, but also eliminates adverse effects caused by odd sample data.
S200-2, the normalized IMF is divided into a training data set and a testing data set. Training data accounted for 80% of the data. Quintupling cross validation was used on the training set to obtain a reliable and stable model.
S200-3, training the deep learning model (namely the neural networks LSTM, bi-LSTM, GRU and Bi-GRU) by using a sparrow search algorithm SSA and a training data set. And (3) optimizing hyper-parameters of the neural network by using a sparrow search algorithm SSA, wherein the hyper-parameters comprise the number of neurons of each layer, iteration times and learning rate. These hyper-parameters are used to obtain the current optimal neural network. This neural network is then used to predict the test data set and evaluate it against an evaluation index. The hyper-parameters of the model for which no overfitting has occurred and the error of the prediction result is minimal are the optimal hyper-parameters.
S200-4, constructing each neural network by using the optimal hyper-parameters, and predicting the data set by using the neural networks. At the same time, the predicted data and the value of the statistical evaluation index are obtained.
S200-5, error correction, namely error correction, is carried out on the multiple models by adopting a simple average method, an MAE reciprocal method and a Lagrange multiplier method, and the weight of each deep learning model is obtained.
And S300, calculating a multi-model combined prediction result according to the weight. The combined prediction is compared and analyzed with the prediction that was not combined.
The four deep learning models are LSTM, bi-LSTM, GRU and Bi-GRU.
A Long Short-Term Memory network (LSTM) is an improved Recurrent Neural Network (RNN), mainly used to solve the vanishing- and exploding-gradient problems in long-sequence training, and can be used to model and predict time series. The structure of the LSTM cell is shown in FIG. 2 (a). An LSTM cell consists of a state storage unit $C_t$ and three logic gates (input gate $i_t$, output gate $o_t$, forget gate $f_t$); in addition, $x_t$ is the input of the LSTM and $h_t$ is its output. Here $\odot$ denotes element-wise multiplication of matrices and $\oplus$ denotes matrix addition.
$$f_t = \sigma\left(W_f[h_{t-1}, x_t] + b_f\right) \quad (1)$$

$$i_t = \sigma\left(W_i[h_{t-1}, x_t] + b_i\right) \quad (2)$$

$$C_t' = \tanh\left(W_c[h_{t-1}, x_t] + b_c\right) \quad (3)$$

$$C_t = f_t * C_{t-1} + i_t * C_t' \quad (4)$$

$$o_t = \sigma\left(W_o[h_{t-1}, x_t] + b_o\right) \quad (5)$$

$$h_t = o_t * \tanh(C_t) \quad (6)$$

In formulas (1) to (6), W is the weight matrix corresponding to each module: $W_f$ is the weight matrix of the forget gate, $W_i$ of the input gate, $W_o$ of the output gate, and $W_c$ of the tanh module. b represents the bias of each module: $b_f$ is the bias of the forget gate, $b_i$ of the input gate, $b_o$ of the output gate, and $b_c$ of the tanh module. $[h_{t-1}, x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$, where $h_{t-1}$ is the output of the LSTM at time t-1 and $x_t$ is the input of the LSTM at time t. σ is the sigmoid activation function and tanh is the hyperbolic tangent function:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (7)$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \quad (8)$$
In the LSTM, the state storage unit $C_t$ makes it possible to predict time series more effectively. At time t, when the LSTM receives the output $h_{t-1}$ of the previous cell and a new input $x_t$, the forget gate $f_t$ first controls whether the previous cell state information $C_{t-1}$ is retained; the input gate $i_t$ determines which contents of $h_{t-1}$ and $x_t$ are used to compute the cell state, filtering out irrelevant content; the tanh module generates a candidate state $C_t'$ from $h_{t-1}$ and $x_t$; the state information is then updated and the results are combined to obtain the current state $C_t$; finally, the output gate $o_t$ and the current state $C_t$ generate the output $h_t$.
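As a concrete illustration of equations (1)-(6), a single LSTM step can be written out directly (a minimal NumPy sketch with assumed shapes; an actual model would use a deep learning framework):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                # equation (7)

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell update; W and b are dicts keyed by gate name."""
    z = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])               # (1) forget gate
    i = sigmoid(W["i"] @ z + b["i"])               # (2) input gate
    C_cand = np.tanh(W["c"] @ z + b["c"])          # (3) candidate state
    C = f * C_prev + i * C_cand                    # (4) state update
    o = sigmoid(W["o"] @ z + b["o"])               # (5) output gate
    h = o * np.tanh(C)                             # (6) output
    return h, C

hidden, inputs = 8, 1                              # illustrative sizes
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(hidden, hidden + inputs)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, C = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, b)
```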
The Bi-directional Long Short-Term Memory model (Bi-LSTM) is a special LSTM comprising two network layers: a forward LSTM and a backward LSTM. The Bi-LSTM can consider both past and future information of the data. FIG. 2 (b) shows the structure of the Bi-LSTM network. The Bi-LSTM obtains two hidden states in opposite directions and connects them to produce the output; the forward and backward LSTM layers retrieve the past and future information of the input sequence, respectively. The Bi-LSTM can achieve better prediction results than the LSTM.
A Gated Recurrent Unit (GRU) is a Recurrent Neural Network (RNN) similar to the LSTM that uses a gating mechanism to control the input, memory and other information when making a prediction at the current time step, which addresses the problems of long-term memory and of gradients in back-propagation. The biggest difference from the LSTM is the simpler structure: the GRU has only two gates (an update gate and a reset gate) and no cell state $C_t$. Generally, the update gate controls whether past information is passed on, while the reset gate controls the degree to which past information is forgotten. FIG. 2 (c) shows the structure of the GRU network.
Equations (9) to (12) represent the state of the GRU cell at time t.
$$r_t = \sigma\left(W_r[h_{t-1}, x_t] + b_r\right) \quad (9)$$

$$z_t = \sigma\left(W_z[h_{t-1}, x_t] + b_z\right) \quad (10)$$

$$h_t' = \tanh\left(W_h[r_t * h_{t-1}, x_t] + b_h\right) \quad (11)$$

$$h_t = (1 - z_t) * h_{t-1} + z_t * h_t' \quad (12)$$

wherein $r_t$ denotes the reset gate, $z_t$ the update gate, $x_t$ the input at time t, $h_t'$ the candidate (partial) state information at time t, and $h_t$ the hidden-state information at time t. W represents a weight matrix: $W_r$ is the reset-gate weight, $W_z$ the update-gate weight, and $W_h$ the weight used in the tanh module to compute the candidate state. b represents a bias term: $b_r$ is the bias of the reset gate, $b_z$ of the update gate, and $b_h$ of the tanh module. σ(·) denotes the sigmoid activation function widely used in neural networks, and tanh(·) the tanh activation function; both map values into a fixed range and are given in formulas (7) and (8).
A Bidirectional Gated Recurrent Unit (Bi-GRU) is a special type of GRU; it has a structure similar to the Bi-LSTM and consists of forward- and backward-propagating GRU networks. FIG. 2 (d) shows the structure of the Bi-GRU network.
The decomposition algorithm is preferably Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN); the EMD and EEMD algorithms may also be adopted. The EMD algorithm, i.e. empirical mode decomposition, gradually decomposes a non-stationary signal into several intrinsic mode functions (IMFs) and a residual (Res). EMD has clear advantages in processing non-stationary and non-linear data and is suitable for analyzing non-linear and non-stationary signal sequences. However, EMD suffers from end effects and mode aliasing. EEMD was therefore proposed: it adds Gaussian white noise to the original sequence so that the distribution of signal extrema becomes more uniform, effectively suppressing mode aliasing. However, the EEMD algorithm is computationally expensive, and the amplitude and the number of iterations of the added white noise must be set empirically. Torres et al. therefore proposed the complete ensemble empirical mode decomposition with adaptive noise, CEEMDAN. CEEMDAN not only solves the mode-aliasing problem of EMD but also reduces the computational cost and improves execution efficiency by adaptively adding white noise and setting the number of iterations.
The CEEMDAN algorithm is defined as follows. Let $y(t)$ be the original signal; $v_j(t)$ a Gaussian white-noise signal satisfying the standard normal distribution; j the index of the white-noise addition, j = 1, 2, …, N; $E_i(\cdot)$ the operator that extracts the i-th IMF with EMD; $I_i(t)$ the i-th component obtained by EMD decomposition; $\widetilde{IMF}_i(t)$ the i-th component obtained by CEEMDAN decomposition; and $\varepsilon$ the amplitude of the white noise. The CEEMDAN decomposition steps are as follows:

The first step: add Gaussian white noise to the original signal $y(t)$ to obtain a new signal, then decompose the new signal with EMD to obtain the first-order component $I_1^j(t)$:

$$y^j(t) = y(t) + \varepsilon v_j(t) \quad (13)$$

$$E\left(y^j(t)\right) = I_1^j(t) + res^j \quad (14)$$

where $y^j(t)$ is the new signal obtained after the j-th white-noise addition; $I_1^j(t)$ is the first-order component obtained by EMD decomposition after the j-th white-noise addition; and $res^j$ is the residual obtained by EMD decomposition after the j-th white-noise addition.

The second step: take the ensemble average of the N components $I_1^j(t)$ to obtain the first-order component of the CEEMDAN decomposition, then remove it to obtain the residual signal $r_1(t)$:

$$\widetilde{IMF}_1(t) = \frac{1}{N}\sum_{j=1}^{N} I_1^j(t) \quad (15)$$

$$r_1(t) = y(t) - \widetilde{IMF}_1(t) \quad (16)$$

where j is the index of the white-noise addition, j = 1, 2, …, N.

The third step: add Gaussian white noise to the residual signal $r_1(t)$ to obtain a new residual signal, then decompose the new residual signal with EMD to obtain the second-order component $I_2^j(t)$:

$$r_1^j(t) = r_1(t) + \varepsilon E_1\left[v_j(t)\right] \quad (17)$$

$$E\left(r_1^j(t)\right) = I_2^j(t) + res^j \quad (18)$$

where $\varepsilon$ is the amplitude of the white noise and $E_1[v_j(t)]$ denotes the first IMF of the j-th added Gaussian white-noise signal $v_j(t)$ extracted by EMD.

The fourth step: repeat the second step to obtain the second-order component $\widetilde{IMF}_2(t)$ of CEEMDAN and the decomposed residual $r_2(t)$.

The fifth step: repeat the above steps until the residual signal $r_k(t)$ is a monotonic function that cannot be decomposed further, at which point the algorithm ends. The k-th component $\widetilde{IMF}_k(t)$ of the CEEMDAN decomposition and its residual can then be expressed as:

$$\widetilde{IMF}_k(t) = \frac{1}{N}\sum_{j=1}^{N} I_k^j(t) \quad (19)$$

$$r_k(t) = r_{k-1}(t) - \widetilde{IMF}_k(t) \quad (20)$$

where j is the index of the white-noise addition, j = 1, 2, …, N. At this point, the residual signal of CEEMDAN can be expressed as:

$$r_K(t) = y(t) - \sum_{k=1}^{K} \widetilde{IMF}_k(t) \quad (21)$$

and the result of the CEEMDAN decomposition can be expressed as:

$$y(t) = \sum_{k=1}^{K} \widetilde{IMF}_k(t) + r_K(t) \quad (22)$$
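In practice the decomposition can be reproduced with an off-the-shelf implementation; the sketch below uses the third-party PyEMD package (`pip install EMD-signal`), which is an assumption of this illustration, not a library named by the patent:

```python
import numpy as np
from PyEMD import CEEMDAN

t = np.linspace(0, 1, 512)
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)  # toy series

ceemdan = CEEMDAN(trials=100, epsilon=0.005)  # N noise realizations, noise amplitude
imfs = ceemdan(signal)                        # each row is one extracted component
print(imfs.shape)
```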
the Sparrow Search Algorithm (SSA) is a swarm intelligent optimization model inspired by swarm intelligence, foraging and anti-predation behaviors of sparrows, and is proposed by Xue J and Shen B in 2020. SSA describes two types of sparrow behavior in a population of sparrows. One type of sparrow is known as the producer (producer), whose main task is to find food, so it tends to fly to places where food is abundant. Another type of sparrow is called the twitter (scoringer) who continually selects and follows the best producers for food.
A producer sends an alarm signal if it finds a predator while foraging. Depending on whether the alarm value exceeds the safety threshold, the position of the producer is updated as follows:
$$X_i^{t+1} = \begin{cases} X_i^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot P}\right), & W < S \\[2mm] X_i^{t} + Q \cdot C, & W \ge S \end{cases} \quad (23)$$

wherein $X_i^t$ denotes the position value of the i-th sparrow (i = 1, 2, …, n, with n the number of sparrows) at the t-th iteration (t = 1, 2, …, P, with P the maximum number of iterations) over the M parameters to be optimized. α and Q are random numbers; Q follows a normal distribution, while α takes a value between 0 and 1. W and S represent the alarm value and the safety threshold, respectively. C represents a column matrix with M rows, each entry equal to 1.
The position of the scrounger is related to the producer, and is calculated as follows:

$$X_i^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{w}^{t} - X_i^{t}}{i^2}\right), & i > n/2 \\[2mm] X_{b}^{t+1} + \left|X_i^{t} - X_{b}^{t+1}\right| \cdot A' \cdot C, & \text{otherwise} \end{cases} \quad (24)$$

wherein $X_b^{t+1}$ represents the best position of the producers among the sparrows at iteration t+1; $X_w^{t}$ represents the worst position among the sparrows at the t-th iteration; |·| denotes the absolute value; and $A' = A^{T}(AA^{T})^{-1}$, where A represents a matrix with M entries whose values are randomly assigned 1 or -1.
Sparrows at the edge are located at the periphery of the whole flock; these sparrows of uncertain identity move toward the middle when they encounter danger, and their positions are updated as follows:

$$X_i^{t+1} = \begin{cases} X_{g}^{t} + \beta \cdot \left|X_i^{t} - X_{g}^{t}\right|, & F_i > F_g \\[2mm] X_i^{t} + K \cdot \dfrac{\left|X_i^{t} - X_{w}^{t}\right|}{(F_i - F_w) + \lambda}, & F_i = F_g \end{cases} \quad (25)$$

wherein $X_g^t$ represents the global best position of the producers among the sparrows at the t-th iteration; β and K are random numbers: β is normally distributed with mean 0 and standard deviation 1, while K controls the specific direction and step size of the sparrow's movement and takes values between -1 and 1; λ is a very small number that prevents the denominator from being zero; and $F_i$, $F_g$ and $F_w$ denote the fitness values of the i-th, the best, and the worst sparrow, respectively. The position of the best sparrow at the end of the algorithm's iterations represents the best value of the parameter being searched.
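A compact sketch of these position updates follows, with all constants as illustrative placeholders rather than the patent's settings:

```python
import numpy as np

def ssa_minimize(fitness, dim, n=20, iters=50, prod_frac=0.2, safety=0.8):
    """Toy sparrow search loop mirroring updates (23)-(25)."""
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(n, dim))       # positions in [0, 1]^dim
    n_prod = max(1, int(prod_frac * n))
    for _ in range(iters):
        F = np.apply_along_axis(fitness, 1, X)
        order = np.argsort(F)                      # ascending: best first
        X, F = X[order], F[order]
        worst = X[-1].copy()
        W = rng.uniform()                          # alarm value
        for i in range(n_prod):                    # producers, eq. (23)
            if W < safety:
                X[i] = X[i] * np.exp(-(i + 1) / ((rng.uniform() + 1e-9) * iters))
            else:
                X[i] = X[i] + rng.normal(size=dim)             # Q * C
        best_prod = X[0].copy()
        for i in range(n_prod, n):                 # scroungers, eq. (24)
            if i + 1 > n / 2:
                X[i] = rng.normal(size=dim) * np.exp((worst - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=dim)
                X[i] = best_prod + np.abs(X[i] - best_prod) * A / dim
        for i in rng.choice(n, size=max(1, n // 10), replace=False):
            if F[i] > F[0]:                        # edge sparrows, eq. (25)
                X[i] = X[0] + rng.normal() * np.abs(X[i] - X[0])
            else:
                X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - worst) / (F[i] - F[-1] + 1e-9)
        X = np.clip(X, 0.0, 1.0)
    return X[np.argmin(np.apply_along_axis(fitness, 1, X))]

# e.g. map [0,1]^2 onto (number of neurons, learning rate) and minimize a
# hypothetical validation loss: best = ssa_minimize(validation_loss, dim=2)
```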
S200-5 comprises: using different neural networks to predict the decomposed IMFs independently, feeding each single IMF into the network, so as to obtain multiple prediction results. The prediction errors of the neural networks are then processed to obtain the weight of each prediction model. Finally, the prediction result of each neural network is combined with its weight to generate the multi-model combined prediction result.
In formulas (26) and (27), $Y_t$ represents the multi-model combined prediction at time t; $\hat{y}_t^i$ represents the prediction of the i-th of the H models at time t; $a_i$ represents the weight of the i-th model; and ξ represents the error of the multi-model combination:

$$Y_t = F\left(\hat{y}_t^{1}, \hat{y}_t^{2}, \ldots, \hat{y}_t^{H}\right) \quad (26)$$

$$Y_t = \sum_{i=1}^{H} a_i\, \hat{y}_t^{i} + \xi \quad (27)$$

where F(·) represents combining the predictions of the multiple models in some way.
If the prediction covers the time range T' to T'+T, the mean squared error (MSE) between the multi-model combined prediction $Y_t$ and the true value $y_t$ can be expressed as:

$$MSE = \frac{1}{T}\sum_{t=T'}^{T'+T}\left(Y_t - y_t\right)^2 \quad (28)$$

where $y_t$ denotes the true value at time t and $Y_t$ the multi-model combined prediction.
herein, the following three combined prediction methods are considered:
(1) Simple average combination method

The final multi-model combined prediction is the average of the individual models' predictions:

$$Y_t = \frac{1}{H}\sum_{i=1}^{H} \hat{y}_t^{i} + \xi \quad (29)$$

where $\hat{y}_t^i$ represents the predicted value of the i-th model at time t and ξ represents the error of the multi-model combination.
(2) MAE reciprocal combination method

The weight of each model in the multi-model combined prediction can be determined by the reciprocal of the mean absolute error (MAE). The weights are calculated as follows:

$$a_i = \frac{1/\mathrm{MAE}_i}{S} \quad (30)$$

$$S = \sum_{i=1}^{H} \frac{1}{\mathrm{MAE}_i} \quad (31)$$

wherein S represents the sum of the reciprocals of the mean absolute errors (MAE) of the H models' predictions; $a_i$ represents the weight of the i-th model; and $\mathrm{MAE}_i$ the mean absolute error of the i-th model. Formula (27) gives the final multi-model combined prediction.
(3) Combination method based on the Lagrange multiplier method

The Lagrange multiplier method finds the extrema of a multivariable function under a set of constraints. In this method, the weight $a_i$ of the i-th model is treated as a variable to be solved, subject to the constraint:

$$\sum_{i=1}^{N} a_i = 1 \quad (32)$$

Writing each model's prediction as the true value plus its error,

$$\hat{y}_t^{i} = y_t + err_i \quad (33)$$

the mean squared error of the combined prediction can be expressed as:

$$E\left[(Y_t - y_t)^2\right] = E\left[\left(\sum_{i=1}^{N} a_i\, err_i\right)^2\right] \quad (34)$$

where the quantity to be minimized is the mean squared error (MSE) between the multi-model combined prediction and the true value, as in formula (28); $\hat{y}_t^i$ is the predicted value of the i-th model; $y_t$ denotes the true value; and $err_i$ denotes the error of the i-th model.
After introducing the Lagrange multiplier, the constrained optimization problem is transformed into an unconstrained one:

$$L(a, \lambda) = E\left[\left(a_1 err_1 + a_2 err_2 + \cdots + a_N err_N\right)^2\right] + \lambda\left(a_1 + a_2 + \cdots + a_N - 1\right) \quad (35)$$

wherein L(a, λ) denotes the Lagrangian function; a is the vector of weights to be solved and λ is the Lagrange multiplier; E[·] denotes expectation. Introducing the Karush-Kuhn-Tucker (KKT) conditions:

$$\frac{\partial L(a, \lambda)}{\partial a_i} = 0, \quad i = 1, 2, \ldots, N, \qquad \sum_{i=1}^{N} a_i = 1 \quad (36)$$

the combination weights can be solved from formula (36).
The specific embodiment is as follows:
First, data acquisition: Air Quality Index (AQI) data covering 8,695 hours from January 1, 2021 to December 31, 2021 in the Beijing area are adopted (FIG. 3 (a) shows the AQI data of the Beijing area, and Table 1 lists their statistical characteristics). The data are measured by 24 air monitoring stations in the Beijing area (FIG. 3 (b) shows the positions of monitoring stations such as Dingling, Dongsi, Tiantan, Haidian, Huairou Town and Changping Town) and are published on the national real-time urban air quality publishing platform (https://air.cnemc.cn:18007) by the China National Environmental Monitoring Centre.
Beijing covers a total area of 16,410.54 square kilometers and is located in the northern part of the North China Plain, backed by the Yanshan Mountains. Beijing and its surrounding areas host many heavy-industry enterprises with high energy consumption and heavy pollution, such as the power, metallurgy and chemical industries. Among Beijing's local pollution sources, the major ones in the past were motor vehicles and coal. In fact, with recent environmental improvements and industrial restructuring, motor vehicle exhaust has become the main pollutant source, while the share of industrial sources such as coal has gradually declined.
Table 1 shows that there are 65 missing values in the AQI data for the beijing area, with a maximum of 500 in the data, reaching the limit in the definition of AQI. In order to reduce the influence of missing values and abnormal values in the data on subsequent prediction, the data needs to be preprocessed, so that the prediction precision is improved.
The Air Quality Index (AQI) is a quantitative description of air quality; it reflects the degree of cleanliness or pollution of the air and the health effects of air pollution, and is calculated as follows:

$$AQI = \max\left\{IAQI_1, IAQI_2, \ldots, IAQI_n\right\} \quad (37)$$

$$IAQI_P = \frac{IAQI_H - IAQI_L}{BP_H - BP_L}\left(C_P - BP_L\right) + IAQI_L \quad (38)$$

wherein n is the number of pollutant items and $IAQI_P$ is the individual air quality index (IAQI) of pollutant item P, i.e. the air quality index of that single pollutant. In formula (38), $C_P$ represents the mass concentration of pollutant item P; $BP_H$ and $BP_L$ represent the upper and lower limits of the mass-concentration interval containing $C_P$, respectively; and $IAQI_H$ and $IAQI_L$ represent the air quality sub-indexes corresponding to $BP_H$ and $BP_L$.
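A worked sketch of formulas (37)-(38); the abbreviated breakpoint table is an illustrative assumption (a PM2.5 24-hour slice of the Chinese HJ 633-2012 scale), not part of the patent:

```python
PM25_BREAKPOINTS = [  # (BP_L, BP_H, IAQI_L, IAQI_H)
    (0, 35, 0, 50), (35, 75, 50, 100), (75, 115, 100, 150), (115, 150, 150, 200),
]

def iaqi(c: float, table) -> float:
    for bp_lo, bp_hi, ia_lo, ia_hi in table:
        if bp_lo <= c <= bp_hi:                    # formula (38)
            return (ia_hi - ia_lo) / (bp_hi - bp_lo) * (c - bp_lo) + ia_lo
    raise ValueError("concentration outside table")

def aqi(iaqis):                                    # formula (37)
    return max(iaqis)

print(aqi([iaqi(60.0, PM25_BREAKPOINTS)]))         # -> 81.25
```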
The data are then preprocessed: cubic spline interpolation is used to fill in the missing values. FIG. 4 (a) shows a partial result of the cubic spline interpolation (starting from 2021-01-01 00:00).
The data after cubic spline interpolation are then processed with the moving average method. This method removes irregular and other variations in the time series, reveals its long-term trend, and improves prediction accuracy. For a time series $y = (y_1, y_2, \ldots, y_T)$, the moving average is calculated as:

$$M_t = \frac{1}{G}\sum_{g=0}^{G-1} y_{t-g} \quad (39)$$

wherein G represents the number of moving-average terms, with G < T; $y_1$ is the AQI value of Day 0, and the last value corresponds to Day 8736.
Fig. 4 (b) and 4 (d) show the results using the moving average method when G = 24. Table 1 shows the statistical characteristics of the results of the raw data after the above operations.
TABLE 1 statistical description of the data
Data Count Mean Std Min 25% 50% 75% Max Missing
Original 8695 63.20 53.91 7 31 51 70 500 65
Preprocessed 8737 63.06 46.59 9.82 34.29 49.17 71.17 366.08 0
In addition, the data are partitioned before the model is trained: the training set accounts for 80% and the test set for 20%. In FIG. 4 (c), $x_t$ denotes the AQI value at time t and is the input value; $y_{t+1}$ denotes the AQI value at time t+1 (the next time step) and is the output value; T denotes the period, and T = 24 means that future AQI values are predicted from the previous 24 consecutive hourly AQI values. Day 0 is the first day of the training data set and Day 8712 the last day of the test data set.
Evaluating the models' predictions: to determine which model predicts better, the predictions of the proposed models are evaluated with several evaluation indexes: the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), the correlation coefficient (R²) and the Mean Absolute Percentage Error (MAPE), defined as follows:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right| \quad (40)$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \quad (41)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N}\left(\bar{y} - y_i\right)^2} \quad (42)$$

$$MAPE = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \quad (43)$$

wherein $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N)$ are the predicted values and $y = (y_1, y_2, \ldots, y_N)$ are the true values.
The Mean Absolute Error (MAE) represents the average of the absolute errors between predicted and observed values; its range is [0, +∞), and a smaller MAE indicates better model prediction.
The Root Mean Square Error (RMSE) is a typical indicator of regression models, expressing the difference between the actual data and the model's predicted data; its range is [0, +∞), and a smaller RMSE indicates better model prediction.
R² refers to the correlation coefficient and can be used to measure the goodness of fit in regression problems; the closer R² is to 1, the better the model's prediction.
The Mean Absolute Percentage Error (MAPE) is a relative indicator: MAPE = 0% means the proposed model is perfect, while MAPE > 100% indicates poor prediction.
Meanwhile, a better prediction effect also indicates better model parameters.
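The four indexes of formulas (40)-(43) can be computed in a few lines (a NumPy sketch; sign conventions follow the definitions above):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_pred - y_true
    return {
        "MAE": float(np.mean(np.abs(err))),                                        # (40)
        "RMSE": float(np.sqrt(np.mean(err ** 2))),                                 # (41)
        "R2": float(1 - np.sum(err ** 2) / np.sum((y_true.mean() - y_true) ** 2)), # (42)
        "MAPE": float(100 * np.mean(np.abs(err / y_true))),                        # (43)
    }
```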
Finally, the experimental results are analyzed and compared. The experiment consists of two parts. In the first part, the AQI values are first predicted with the LSTM, Bi-LSTM, GRU and Bi-GRU models; then the data decomposed by the CEEMDAN algorithm are used as training data, and the same models are applied in turn; finally, their predictions are compared. In the second part, the combination weights are obtained by the simple average method, the MAE reciprocal method and the Lagrange multiplier method, respectively; the prediction results of the four deep learning models are then combined according to these weights; finally, the three combined predictions are compared with one another and with the first part.
(1) Decomposition results of CEEMDAN
Fig. 5 (a) is a decomposition result of CEEMDAN, and fig. 5 (b) is a normalized result of the CEEMDAN decomposition components. Hereinafter, normalized CEEMDAN decomposition components (IMFs) are used as data.
The four deep learning models are trained with the training data set and the SSA optimization algorithm to obtain the optimal deep learning network models, following an experimental procedure similar to that used without the CEEMDAN decomposition algorithm. The prediction results combining the CEEMDAN decomposition algorithm, the SSA optimization algorithm and the deep learning models are shown in Table 2 and FIG. 6.
TABLE 2 prediction results of GRU, bi-GRU, LSTM and Bi-LSTM on different evaluation indexes
[The numerical results of Table 2 are rendered as images in the original publication and are not reproducible here.]
FIG. 6 (a) shows the prediction results of the four best deep learning models with and without the CEEMDAN decomposition algorithm, and FIG. 6 (b) shows a scatter plot of the results. The sign of the error in FIG. 6 (c) indicates the size of the predicted data relative to the original data: when the error is positive, the prediction is larger than the original data; otherwise, it is smaller.
As can be seen from FIG. 6, the prediction results of SSA-GRU and SSA-Bi-GRU are distributed on both sides of the original data. Meanwhile, the prediction error of SSA-Bi-GRU is smaller than that of SSA-GRU. The predicted results of SSA-LSTM and SSA-Bi-LSTM are mostly smaller than the original data. The CEEMDAN decomposition algorithm also greatly reduces the predicted values. It makes most of the predicted values smaller than the original data. However, the results of SSA-Bi-GRU-CEEMDAN are all centered around the original data. It can be concluded that SSA-Bi-GRU-CEEMDAN has better predictive results. Fig. 6 (c) demonstrates this inference. Most of the error of SSA-Bi-GRU-CEEMDAN is distributed around 0.
FIG. 7 shows the changes in the statistical evaluation indexes of the above models' predictions. Combining Table 2 and FIG. 7, it can be concluded that after the data are decomposed with the CEEMDAN algorithm, the models' predictions improve: the RMSE decreases by 11.245% on average, R2 increases by 0.866% on average, and the MAE decreases by 8.165% on average. In particular, the SSA-Bi-GRU model achieves the best prediction after the CEEMDAN decomposition is applied: its RMSE and MAE are the smallest and its R2 the largest among all models, with RMSE and MAE of 4.810319 and 3.657710, reductions of 14.356206% and 12.123751%, respectively. Its R2 increases from 0.978011 to 0.983518, and its MAPE stays around 0.06%.
(2) Error correction and weight acquisition
In this experiment, the combination weights of the multiple models are obtained by the simple average method, the MAE reciprocal method and the Lagrange multiplier method, respectively.
FIG. 8 shows the prediction results of the multi-model combinations. Each subfigure shows, from left to right, the results of the "bg-g-bl-l" method, the "imf" method, and the "ceemdan" method. "bg-g-bl-l" in the legend denotes combining the four deep learning models without CEEMDAN decomposition, as in formula (47). "imf" denotes first combining, across the four networks, the predictions of each IMF component and then summing the combined components into the final prediction, as in formulas (48) and (49). "ceemdan" denotes first summing the IMF component predictions of each network into a per-model prediction and then combining the models into the final prediction, as in formula (50).
$$W_{bg\_g\_bl\_l} = [w_{bg}, w_g, w_{bl}, w_l] \quad (44)$$

$$W_{imf} = [w_{i,bg}, w_{i,g}, w_{i,bl}, w_{i,l}], \quad i = 1, 2, \ldots, n \quad (45)$$

$$W_{ceemdan} = [w_{bg}, w_g, w_{bl}, w_l] \quad (46)$$

$$Y = \sum_{j} w_j\, Y_j \quad (47)$$

$$Y_i = \sum_{j} w_{i,j}\, y_{i,j} \quad (48)$$

$$Y = \sum_{i=1}^{n} Y_i \quad (49)$$

$$Y = \sum_{j} w_j \sum_{i=1}^{n} y_{i,j} \quad (50)$$

wherein Y represents the multi-model combined prediction; $Y_i$ represents the combined prediction of the i-th (i = 1, 2, …, n) IMF component; $y_{i,j}$ represents the prediction of the i-th IMF component by model j (j = bi_gru, gru, bi_lstm, lstm); $Y_j$ represents the prediction of the j-th model; and $W_{bg\_g\_bl\_l}$, $W_{imf}$ and $W_{ceemdan}$ represent the weights. The combined prediction models obtained by these methods are as follows:
a. simple averaging method
The weights of the combined models denoted "bg-g-bl-l", "imf" and "ceemdan" are given in formulas (51), (52) and (53), respectively; under simple averaging, every model weight equals 1/4:

$$W_{bg\_g\_bl\_l} = \left[\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right] \quad (51)$$

$$W_{imf} = \left[\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right], \quad i = 1, 2, \ldots, n \quad (52)$$

$$W_{ceemdan} = \left[\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right] \quad (53)$$
b. MAE reciprocal method
The weights of the combined models denoted "bg-g-bl-l", "imf" and "ceemdan" are given in formulas (54), (55) and (56), respectively.

$$W_{bg\_g\_bl\_l} = [0.3266,\ 0.2752,\ 0.2153,\ 0.1829] \quad (54)$$

[$W_{imf}$, formula (55), is a per-IMF weight matrix rendered as an image in the original and is not reproduced here.]

$$W_{ceemdan} = [0.3077,\ 0.2662,\ 0.2464,\ 0.1797] \quad (56)$$
c. Lagrange multiplier method
The weights of the combined models denoted "bg-g-bl-l", "imf" and "ceemdan" are given in formulas (57), (58) and (59), respectively.

$$W_{bg\_g\_bl\_l} = [4.2063,\ -2.7239,\ -0.9561,\ 0.4737] \quad (57)$$

[$W_{imf}$, formula (58), is a per-IMF weight matrix rendered as an image in the original and is not reproduced here.]

$$W_{ceemdan} = [0.9193,\ 0.0673,\ -0.0035,\ 0.0169] \quad (59)$$
FIG. 8 shows that the combined prediction using the Lagrange multiplier method is superior to those using the MAE reciprocal method and the simple average method. FIG. 8 (b) clearly shows that the multi-model combined predictions using the Lagrange multiplier method (far right) are closely distributed on both sides of the original data. FIG. 8 (c) shows that the prediction errors after multi-model combination are concentrated around 0, and the proportion of errors with absolute value larger than 10 is very low, much smaller than in FIG. 6 (c). This indicates that the multi-model combined predictions are much better than those of the single models. The values in Table 3 show that, for the same combination scheme, the best prediction is obtained when the weights are derived with the Lagrange multiplier method. The same conclusion can be drawn from FIG. 9.
TABLE 3 evaluation index of combination model prediction
[The numerical results of Table 3 are rendered as images in the original publication and are not reproducible here.]
In Table 3, take the statistical indexes of the prediction results of the model combination scheme "imf" (see the explanation of the combination schemes above) as an example. With weights obtained by the simple average method, the combined model's RMSE, R2, MAE and MAPE are 6.256345, 0.972717, 5.499121 and 0.102130, respectively. With the MAE reciprocal method they are 5.880983, 0.975893, 5.294846 and 0.104462; with the Lagrange multiplier method they are 4.784601, 0.984043, 3.674884 and 0.079240. The RMSE changes by -5.999701% and -18.642836%; R2 by 1.183331% and 0.835132%; MAE by -3.714685% and -30.595073%; and MAPE by 2.283364% and -24.144665%. Likewise, after the weights are obtained with the Lagrange multiplier method, the predictions of the other model combinations also improve markedly.
In Table 2, the best result from decomposing the data with the CEEMDAN algorithm and then predicting with a deep learning model is SSA-Bi-GRU-CEEMDAN, whose RMSE and R2 are 4.810319 and 0.983518, respectively. The best result from predicting directly with a deep learning model is SSA-Bi-GRU, whose RMSE and R2 are 5.616658 and 0.978011, respectively. Combining the results of Table 3, with the CEEMDAN decomposition algorithm and Lagrange-multiplier weighting, the best predicted RMSE is 4.764355, a reduction of 0.955529%, and R2 is 0.984178, an increase of 0.067106%. Combining the directly used deep learning models with the same method gives a predicted RMSE of 3.858319, a reduction of 31.305787%, and an R2 of 0.989624, an increase of 1.187410%.
In summary: (1) GRU, Bi-GRU, LSTM and Bi-LSTM are used to predict both the undecomposed data and the CEEMDAN-decomposed data. The results show that the CEEMDAN decomposition algorithm improves the prediction effect; specifically, the average RMSE decreases by 11.245%, the average R2 increases by 0.866%, and the average MAE decreases by 8.165%.
(2) A multi-model combination method based on the Lagrange multiplier method is designed. The method obtains the weight of each deep learning model, and the weights are used to combine the multiple models. The multi-model combination results are better than the single-model results.
(3) The Lagrange multiplier method is compared with the simple average combination model and the MAE reciprocal combination model. The experimental results show that the results obtained with the Lagrange multiplier method are better than those of the other two methods.
The short-term air quality prediction model adopts the sparrow search algorithm and a simplified multi-model combination with decomposition error correction, and predicts air quality with the resulting combination model. On one hand, decomposing the data with the CEEMDAN algorithm and then predicting with deep learning models yields better results. On the other hand, the combined model clearly improves on the prediction of any single model, and its predictions are also better than those of other traditional combination methods.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (7)

1. A short-term air quality prediction method based on a sparrow search algorithm and decomposition error correction is characterized by comprising the following steps:
s1, decomposing data by using a CEEMDAN decomposition algorithm to obtain a plurality of IMFs;
s2, predicting the IMFs by using the deep learning models, and processing prediction errors of the deep learning models to obtain the weight of each deep learning model; the number of the deep learning models is at least two;
and S3, combining the prediction results of each deep learning model according to the weight to obtain a comprehensive prediction result, namely a multi-model combined prediction result.
2. The method of claim 1, wherein the deep learning model comprises: LSTM, bi-LSTM, GRU and Bi-GRU.
3. The short-term air quality prediction method based on the sparrow search algorithm and decomposition error correction as claimed in claim 1, wherein S2 comprises the steps of:
s2-1, predicting the IMFs by using different deep learning models to obtain a plurality of prediction results;
s2-2, processing the prediction error of each model by adopting any one of a simple average method, an MAE (mean absolute error) reciprocal method and a Lagrange multiplier method to obtain the weight of each deep learning model;
and S2-3, finally, combining the prediction result of each deep learning model with the weight to generate a multi-model combined prediction result.
4. The method for predicting the short-term air quality based on the sparrow search algorithm and the decomposition error correction as claimed in claim 1, wherein the data is preprocessed before the step S1, and the method comprises the following steps:
s0-1, filling missing values by using cubic spline interpolation;
and S0-2, processing the data after cubic spline interpolation by adopting a moving average method.
5. The short-term air quality prediction method based on the sparrow search algorithm and the decomposition error correction as claimed in claim 1, wherein the deep learning model is obtained by training the sparrow search algorithm, comprising the following steps:
SA, normalizing the data decomposed by CEEMDAN, scaling the values into the range between 0 and 1;
SB, dividing the classified data into a training data set and a testing data set;
the method comprises the steps that SC, the sparrow search algorithm SSA is used for optimizing hyper-parameters of a deep learning model, and the hyper-parameters comprise the neuron number, the iteration times and the learning rate of each layer of a neural network;
and SD, predicting the test data set by adopting a deep learning model, evaluating according to the evaluation index, and obtaining the optimal hyper-parameter if the hyper-parameter of the model is not over-fitted and the error of the prediction result is minimum, thereby obtaining the optimal deep learning model.
6. The method of claim 5, wherein the evaluation index comprises: any combination of mean absolute error, root mean square error, correlation coefficient, and mean absolute percentage error.
7. The short-term air quality prediction method based on the sparrow search algorithm and the decomposition error correction as claimed in claim 1, wherein step S1 is preceded by S0, and data is acquired, wherein the data is meteorological data.
CN202211308007.0A 2022-10-25 2022-10-25 Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction Pending CN115660167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211308007.0A CN115660167A (en) 2022-10-25 2022-10-25 Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211308007.0A CN115660167A (en) 2022-10-25 2022-10-25 Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction

Publications (1)

Publication Number Publication Date
CN115660167A true CN115660167A (en) 2023-01-31

Family

ID=84990717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211308007.0A Pending CN115660167A (en) 2022-10-25 2022-10-25 Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction

Country Status (1)

Country Link
CN (1) CN115660167A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116976393A (en) * 2023-07-25 2023-10-31 武汉纺织大学 Neural network forecasting method for navigation satellite clock error data


Similar Documents

Publication Publication Date Title
Zhang et al. At-lstm: An attention-based lstm model for financial time series prediction
CN111191841B (en) Power load prediction method and device, computer equipment and storage medium
Lv et al. Deep learning combined wind speed forecasting with hybrid time series decomposition and multi-objective parameter optimization
CN110705743B (en) New energy consumption electric quantity prediction method based on long-term and short-term memory neural network
CN116757534B (en) Intelligent refrigerator reliability analysis method based on neural training network
CN112990556A (en) User power consumption prediction method based on Prophet-LSTM model
CN113723007B (en) Equipment residual life prediction method based on DRSN and sparrow search optimization
CN114547974A (en) Dynamic soft measurement modeling method based on input variable selection and LSTM neural network
CN111985719B (en) Power load prediction method based on improved long-term and short-term memory network
CN111144552A (en) Multi-index grain quality prediction method and device
CN115018158A (en) SCR (Selective catalytic reduction) outlet NOx emission prediction method based on BWOA-BiGRU-LAM (lean-reactive inert gas)
CN115660167A (en) Short-term air quality prediction method based on sparrow search algorithm and decomposition error correction
CN116542382A (en) Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm
CN114662791A (en) Long time sequence pm2.5 prediction method and system based on space-time attention
CN113360848A (en) Time sequence data prediction method and device
CN115456245A (en) Prediction method for dissolved oxygen in tidal river network area
CN115510748A (en) Landslide displacement prediction method based on variational modal decomposition and CNN-GRU
CN114596726B (en) Parking berth prediction method based on interpretable space-time attention mechanism
CN111539558B (en) Power load prediction method adopting optimization extreme learning machine
CN113537539B (en) Multi-time-step heat and gas consumption prediction model based on attention mechanism
CN117232817A (en) Intelligent big data monitoring method of electric valve and Internet of things system
CN117221352A (en) Internet of things data acquisition and intelligent big data processing method and cloud platform system
CN117390836A (en) BEMD-based GRU interval system inertia evaluation and prediction method
CN116632834A (en) Short-term power load prediction method based on SSA-BiGRU-Attention
CN114970745B (en) Intelligent security and environment big data system of Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination