CN113223392A

CN113223392A - Hybrid integration model for PM2.5 hour concentration prediction

Info

Publication number: CN113223392A
Application number: CN202110541712.4A
Authority: CN
Inventors: 张莉; 蔡希文; 胡平; 徐莉; 张宇轩; 苏庆
Original assignee: Xinyang Agriculture and Forestry University
Current assignee: Xinyang Agriculture and Forestry University
Priority date: 2021-05-18
Filing date: 2021-05-18
Publication date: 2021-08-06

Abstract

The invention discloses a hybrid integration model for PM2.5 hour concentration prediction, relating to the technical field of PM2.5 concentration prediction. The model comprises the following steps: acquiring input and output sequences in advance; decomposing $x(t)$, $t = 1, \ldots, T$, with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual; applying the FCM algorithm to the intrinsic mode functions, where each stationary IMF is unfolded into a trajectory matrix and the trajectory matrices are clustered into a set of training data subsets; and determining the structure and hyper-parameters of an LSTM network and training it on the training data subsets. The invention thereby realizes an LSTM hybrid prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering groups components with similar characteristics together, and the PM2.5 prediction model is built with an LSTM network whose hyper-parameters are optimized by particle swarm optimization (PSO). The resulting model offers high prediction accuracy, low computational cost, and strong adaptability.

Description

Hybrid integration model for PM2.5 hour concentration prediction

Technical Field

The invention relates to the technical field of PM2.5 concentration prediction, and in particular to a hybrid integration model for PM2.5 hour concentration prediction.

Background

With the rapid development of the national economy and urbanization in recent years, air pollution and haze events have occurred frequently, and air quality prediction has increasingly become a focus of attention for governments and the public. The pollutant concentrations typically forecast include PM2.5, PM10, O₃, NO₂, SO₂, and CO.

PM2.5, also called fine particulate matter, refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 micrometers or less. These particles can stay suspended in the air for a long time, and the higher their concentration, the more serious the air pollution. Although PM2.5 makes up only a small fraction of the earth's atmosphere, it has a significant effect on air quality, visibility, and other factors. Compared with coarser atmospheric particulate matter, PM2.5 has a small particle size, a large specific surface area, and high activity; it readily carries toxic and harmful substances (such as heavy metals and microorganisms), remains in the atmosphere for a long time, and is transported over long distances. It therefore has a greater impact on human health and on the quality of the atmospheric environment.

In the prior art, a prediction model is built after the PM2.5 data are decomposed, but the model redundancy caused by similarity among the decomposed components is not considered, so the prediction effect is poor. A CEEMDAN-FCM-LSTM scheme addresses this as follows. The CEEMDAN algorithm decomposes the complex PM2.5 data into finite, stationary IMF components. Each stationary IMF is unfolded into a trajectory matrix, and the trajectory matrices are partitioned into a set of clustered samples. An LSTM model is then trained on the clustered samples. For prediction, the distance between each cluster center and the test sample is calculated, the LSTM model corresponding to the minimum distance is selected, and the test data of each sub-layer are predicted. The predictions of all sub-layers are then combined to obtain the final result. The CEEMDAN-FCM-LSTM hybrid model can therefore be readily applied to PM2.5 prediction.
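The test-stage routing described above (pick the model whose cluster center is nearest to the test sample, predict each sub-layer, then combine) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes Euclidean distance, a single shared set of cluster centers and per-cluster models for all sub-layers, and that the sub-layer forecasts are combined by summation (consistent with IMFs plus residual adding back up to the original series). The names `nearest_center` and `forecast` and the stand-in lambda "models" are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Window = Sequence[float]
# Hypothetical: a trained per-cluster model is any callable mapping a window to a forecast.
Model = Callable[[Window], float]


def nearest_center(window: Window, centers: List[Window]) -> int:
    """Return the index of the cluster center closest to `window` (Euclidean distance)."""
    distances = [math.dist(window, center) for center in centers]
    return distances.index(min(distances))


def forecast(sublayer_windows: List[Window], centers: List[Window],
             models: List[Model]) -> float:
    """Route each sub-layer window to its nearest-center model and sum the forecasts."""
    total = 0.0
    for window in sublayer_windows:
        k = nearest_center(window, centers)
        total += models[k](window)
    return total


# Toy usage: two clusters and two stand-in "models" (last value, window mean).
centers = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
models = [lambda w: w[-1], lambda w: sum(w) / len(w)]
print(forecast([[1.0, 2.0, 3.0], [9.0, 10.0, 11.0]], centers, models))  # 13.0
```

In a real pipeline each entry of `models` would be an LSTM trained on one FCM training subset, and each window would be the most recent slice of one IMF or of the residual.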

Chinese patent CN112132336A discloses a quarterly prediction method for PM2.5 concentration in the technical field of air quality prediction. It comprises the following steps: S100: collecting and screening data for the region, including meteorological data, pollution data, and baseline emission inventory data; S200: constructing a meteorology-air quality model of the region from the screened data; S300: obtaining an inverted quarterly emission inventory of the region from the screened data and the meteorology-air quality model; S400: collecting global weather forecast field data and constructing a prediction model from it; S500: obtaining the quarterly predicted PM2.5 concentration for the region by simulation with the prediction model and the inverted quarterly emission inventory. That method overcomes the inability of the prior art to predict PM2.5 concentration over long time scales, and can therefore provide more guidance for management and control and support refined pollution governance. However, it suffers from low prediction accuracy, poor adaptability, and other limitations.

An effective solution to the problems in the related art has not been proposed yet.

Disclosure of Invention

To address the problems in the related art, the invention provides a hybrid integration model for PM2.5 hour concentration prediction that overcomes the technical problems of the prior art.

The technical scheme of the invention is realized as follows:

A hybrid integration model for PM2.5 hour concentration prediction comprises the following steps:

acquiring input and output sequences in advance, expressed as $\{x(1), x(2), \ldots, x(T)\}$ and $\{y(1), y(2), \ldots, y(T)\}$;

decomposing $x(t)$, $t = 1, \ldots, T$, with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual;

applying the FCM algorithm to the intrinsic mode functions: each stationary IMF is unfolded into a trajectory matrix, and the trajectory matrices are clustered into a set of training data subsets; and

determining the structure and hyper-parameters of the LSTM network and training it on the training data subsets.
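The unfolding-and-clustering step above can be illustrated with a plain NumPy sketch of a trajectory (lag-embedding) matrix and standard fuzzy c-means. The patent does not give the window length, the number of clusters, the fuzzifier m, or whether the IMFs are clustered jointly or one at a time; joint clustering of all trajectory rows, m = 2, and hard assignment by maximum membership are assumptions here, and the function names `trajectory_matrix`, `fcm`, and `cluster_subsets` are hypothetical.

```python
import numpy as np


def trajectory_matrix(imf: np.ndarray, window: int) -> np.ndarray:
    """Unfold a 1-D IMF into overlapping rows of length `window` (one lag vector per row)."""
    n_rows = len(imf) - window + 1
    return np.stack([imf[i:i + window] for i in range(n_rows)])


def fcm(data: np.ndarray, c: int, m: float = 2.0, max_iter: int = 100,
        tol: float = 1e-6, seed: int = 0):
    """Standard fuzzy c-means; returns (centers [c, d], memberships [N, c])."""
    rng = np.random.default_rng(seed)
    u = rng.random((data.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)               # each row of memberships sums to 1
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ data) / um.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                  # avoid division by zero at a center
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        done = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if done:
            break
    um = u ** m
    centers = (um.T @ data) / um.sum(axis=0)[:, None]       # centers for the final memberships
    return centers, u


def cluster_subsets(imfs, window: int, c: int):
    """Stack the trajectory rows of all IMFs, run FCM, and split rows by strongest membership."""
    rows = np.vstack([trajectory_matrix(np.asarray(imf, float), window) for imf in imfs])
    centers, u = fcm(rows, c)
    labels = u.argmax(axis=1)
    return centers, [rows[labels == k] for k in range(c)]
```

Each returned subset would then serve as the training data for one LSTM network, and the returned centers are the ones compared against test samples at prediction time.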

Further, acquiring the input and output sequences includes the following step:

normalizing the data.
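The patent states only that the data are normalized and that input-output sequences are formed; it does not say which normalization is used or how y relates to x. The sketch below assumes min-max scaling to [0, 1], fitted on the training portion only so the test months do not leak into the scaling, and one-step-ahead lag windows in which each output is the value immediately after its input window. The function names and the `lag` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def minmax_fit(train: Sequence[float]) -> Tuple[float, float]:
    """Learn the scaling range from the training portion only (avoids test leakage)."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def minmax_transform(xs: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values into [0, 1] using the fitted range."""
    return [(x - lo) / (hi - lo) for x in xs]


def minmax_inverse(zs: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map normalized forecasts back to concentration units."""
    return [z * (hi - lo) + lo for z in zs]


def make_pairs(series: Sequence[float], lag: int) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, next value) pairs for one-step-ahead forecasting."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be in [1, len(series) - 1]")
    xs = [list(series[i:i + lag]) for i in range(len(series) - lag)]
    ys = [series[i + lag] for i in range(len(series) - lag)]
    return xs, ys
```

For example, `make_pairs([1, 2, 3, 4, 5], 2)` yields inputs `[[1, 2], [2, 3], [3, 4]]` and outputs `[3, 4, 5]`. Forecasts made in the normalized space are mapped back with `minmax_inverse` before errors are computed.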

Further, the model also comprises the following step:

after the training stage is finished, using the trained networks to predict the output of subsequent test samples.

The beneficial effects of the invention are as follows:

The invention discloses a hybrid integration model for PM2.5 hour concentration prediction. Input and output sequences are acquired in advance; $x(t)$, $t = 1, \ldots, T$, is decomposed with the CEEMDAN algorithm into n finite, stationary IMF components and a residual; the FCM algorithm is applied to the intrinsic mode functions by unfolding each stationary IMF into a trajectory matrix and clustering the trajectory matrices into a set of training data subsets; and the structure and hyper-parameters of an LSTM network are determined and the network is trained on the training data subsets. This realizes an LSTM hybrid prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering groups components with similar characteristics together, and the PM2.5 prediction model is built with a PSO-optimized LSTM network, giving high prediction accuracy, low computational cost, and strong adaptability.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow diagram of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the invention;

FIG. 2 is a schematic diagram of the CEEMDAN-FCM-LSTM modeling framework of the hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;

FIG. 3(a) shows the CEEMDAN decomposition results for weather bureau A obtained with the hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;

FIG. 3(b) shows the CEEMDAN decomposition results for water plant B obtained with the hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;

FIG. 3(c) shows the CEEMDAN decomposition results for brewing company C obtained with the hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;

FIG. 4(a) shows the forecast results of the hybrid integration model for PM2.5 hour concentration prediction on the test set of weather bureau A, according to an embodiment of the present invention;

FIG. 4(b) shows the forecast results of the hybrid integration model for PM2.5 hour concentration prediction on the test set of water plant B, according to an embodiment of the present invention;

FIG. 4(c) shows the forecast results of the hybrid integration model for PM2.5 hour concentration prediction on the test set of brewing company C, according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art can derive from the embodiments given herein fall within the scope of the present invention.

According to an embodiment of the invention, a hybrid integration model for PM2.5 hour concentration prediction is provided.

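As shown in FIG. 1 and FIG. 2, the hybrid integration model for PM2.5 hour concentration prediction according to the embodiment of the present invention includes the following steps:

acquiring input and output sequences in advance, expressed as $\{x(1), x(2), \ldots, x(T)\}$ and $\{y(1), y(2), \ldots, y(T)\}$;

decomposing $x(t)$, $t = 1, \ldots, T$, with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual;

applying the FCM algorithm to the intrinsic mode functions: each stationary IMF is unfolded into a trajectory matrix, and the trajectory matrices are clustered into a set of training data subsets; and

determining the structure and hyper-parameters of the LSTM network and training it on the training data subsets.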

Acquiring the input and output sequences further comprises the following step:

normalizing the data.

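The model further comprises the following step:

after the training stage is finished, using the trained networks to predict the output of subsequent test samples.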

By means of the above technical scheme, input and output sequences are acquired in advance; $x(t)$, $t = 1, \ldots, T$, is decomposed with the CEEMDAN algorithm into n finite, stationary IMF components and a residual; the FCM algorithm is applied to the intrinsic mode functions, with each stationary IMF unfolded into a trajectory matrix and the trajectory matrices clustered into a set of training data subsets; and the structure and hyper-parameters of the LSTM network are determined and the network is trained on the training data subsets. This achieves an LSTM hybrid prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering gathers components with similar characteristics together, and the PM2.5 prediction model is built with a PSO-optimized LSTM network, giving high prediction accuracy, low computational cost, and strong adaptability.

In addition, three monitoring points located in areas with different levels of economic development and different natural environments, namely weather bureau A, water plant B, and brewing company C, are studied so as to cover a rich variety of environmental conditions. The CEEMDAN decomposition results of the three data sets are shown in FIG. 3(a) to FIG. 3(c).

In addition, the three data sets are modeled to verify the modeling performance and generalization of the proposed model. Hourly PM2.5 data are collected from January 1 to December 31, 2020. Because 2020 is a leap year with 366 days, 8784 hourly samples (366 × 24) are collected at each monitoring point. They are divided into two parts: the 7320 samples from January to October are used for training, and the remaining 1464 samples from November to December are used for testing. The statistical indices of the three groups of PM2.5 data are shown in Table 1.
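The sample counts follow directly from the calendar, and a short check with the standard library confirms them. This only verifies the arithmetic of the split (naive datetimes, so no daylight-saving shifts); `hours_between` is a hypothetical helper name.

```python
from datetime import datetime


def hours_between(start: datetime, end: datetime) -> int:
    """Number of whole hours in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


n_total = hours_between(datetime(2020, 1, 1), datetime(2021, 1, 1))   # full leap year
n_train = hours_between(datetime(2020, 1, 1), datetime(2020, 11, 1))  # January to October
n_test = hours_between(datetime(2020, 11, 1), datetime(2021, 1, 1))   # November to December
print(n_total, n_train, n_test)  # 8784 7320 1464
```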

TABLE 1 Statistical indices of the three groups of PM2.5 data

In addition, three neural networks, BP, RBF, and LSTM, are adopted as the prediction models. The weights of the BP neural network are determined by a genetic algorithm, and the structure of the RBF network is determined by trial and error. The learning rate, the number of hidden-layer neurons, and the batch size of the LSTM network are obtained with the particle swarm optimization (PSO) algorithm. Three monitoring points in different parts of a region are selected, and the different neural network models, combined with CEEMDAN, FCM, and their improved variants, are used to predict the hourly PM2.5 concentration at the three sites. The results of the models on the test sets of the different monitoring points are shown in Tables 2, 3, and 4.
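The PSO search over the LSTM's learning rate, hidden-layer size, and batch size can be sketched as a basic global-best particle swarm with an inertia weight. This is an illustrative sketch, not the patent's configuration: the inertia and acceleration coefficients (`w`, `c1`, `c2`), swarm size, iteration count, search bounds, and the `decode` mapping are all assumptions. In practice the objective would train an LSTM with the decoded hyper-parameters and return its validation RMSE; here a cheap surrogate function stands in so the search itself can be run.

```python
import numpy as np


def pso(objective, lower, upper, n_particles=20, n_iter=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # each particle's best position
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()             # swarm's best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()


def decode(p):
    """Hypothetical mapping of a particle to (learning rate, hidden units, batch size)."""
    lr = 10 ** p[0]              # particle stores log10 of the learning rate
    hidden = int(round(p[1]))    # number of hidden-layer neurons
    batch = int(round(p[2]))     # batch size
    return lr, hidden, batch
```

A search box such as `lower=[-4, 8, 16]`, `upper=[-1, 128, 256]` would cover learning rates from 1e-4 to 1e-1; searching the learning rate on a log scale is a common design choice, since its useful values span several orders of magnitude.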

TABLE 2 Evaluation results of the models on the test set of the Flat Bridge weather bureau (monitoring point A)

TABLE 3 Evaluation results of the models on the test set of the South Bay waterworks (monitoring point B)

TABLE 4 Evaluation results of the models on the test set of the brewing company (monitoring point C)

In addition, combining the BP, RBF, and LSTM neural network models with the CEEMDAN decomposer effectively improves prediction performance, as the statistical indices in Tables 2, 3, and 4 show. For example, at weather bureau A, the root mean square error (RMSE) of the BP, RBF, and LSTM models combined with CEEMDAN decomposition is reduced by 26.29%, 22.46%, and 59%, respectively, compared with the corresponding models without CEEMDAN decomposition. The CEEMDAN method decomposes a nonlinear, non-stationary PM2.5 data sequence at multiple scales. The resulting IMF subsequences and residual term are less complex than the original PM2.5 data sequence, so each subsequence or residual term can be modeled more accurately.
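For reference, the standard CEEMDAN formulation (Torres et al.) is sketched below; the patent does not spell out these equations, so the notation is an assumption: $E_k(\cdot)$ returns the k-th mode produced by EMD, $w^{i}(t)$ is the i-th of $I$ white-noise realizations, $\varepsilon_k$ is the noise amplitude at stage k, and $r_k(t)$ is the k-th residual.

```latex
\begin{aligned}
\widetilde{\mathrm{IMF}}_1(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(x(t) + \varepsilon_0\, w^{i}(t)\right),
  &\quad r_1(t) &= x(t) - \widetilde{\mathrm{IMF}}_1(t),\\
\widetilde{\mathrm{IMF}}_{k+1}(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_k(t) + \varepsilon_k\, E_k\!\left(w^{i}(t)\right)\right),
  &\quad r_{k+1}(t) &= r_k(t) - \widetilde{\mathrm{IMF}}_{k+1}(t),\\
x(t) &= \sum_{k=1}^{n} \widetilde{\mathrm{IMF}}_k(t) + R(t), \qquad R(t) = r_n(t), &\quad t &= 1, \ldots, T.
\end{aligned}
```

The last line is the completeness property: the n IMFs and the residual add back up to the original series exactly, which is what allows the sub-layer forecasts to be recombined into a single PM2.5 forecast.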

Although the CEEMDAN method yields IMF subsequences and a residual term, some of them are strongly correlated, so it is not necessary to model each subsequence separately. The evaluation results also show that using the FCM method to cluster and integrate these subsequences contributes substantially to prediction accuracy: after FCM is added, the RMSE of the CEEMDAN-BP, CEEMDAN-RBF, and CEEMDAN-LSTM models is reduced by 4.49%, 5.15%, and 3.19%, respectively, compared with the same models without FCM.
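The comparisons above are stated as percentage reductions in RMSE. The patent does not give the formula; the sketch below assumes the usual relative reduction, (RMSE of the baseline minus RMSE of the improved model) divided by the baseline RMSE, times 100. The function names are hypothetical, and the values in the usage line are illustrative, not results from the tables.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted concentrations."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def reduction_pct(rmse_base: float, rmse_new: float) -> float:
    """Relative RMSE reduction of the new model versus the baseline, in percent."""
    return (rmse_base - rmse_new) / rmse_base * 100.0


print(reduction_pct(20.0, 15.0))  # 25.0
```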

In addition, the CEEMDAN-FCM-LSTM method of the present invention achieves the best prediction performance on every monitoring-point test set. It not only combines CEEMDAN decomposition with the FCM clustering strategy but also benefits from the gate structure and memory function of the LSTM network. Meanwhile, the PSO algorithm helps the LSTM network obtain the best hyper-parameters, removing the tedious task of manual trial-and-error selection.
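The gate structure and memory function referred to above are those of the standard LSTM cell. The NumPy sketch below shows one forward time step and a loop over a window; it is illustrative only (forward pass, no training), and the stacking order of the gate weights and the names `lstm_step` and `lstm_forward` are hypothetical. In practice a deep-learning framework would provide and train the network.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with the standard input/forget/output gates.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the (hypothetical) order [input, forget, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g         # memory cell update
    h = o * np.tanh(c)             # new hidden state
    return h, c


def lstm_forward(xs, W, U, b, hidden):
    """Run the cell over a window of scalar inputs and return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        h, c = lstm_step(np.atleast_1d(x), h, c, W, U, b)
    return h
```

The forget gate is what lets the cell carry information across many hourly steps, which is the usual argument for preferring LSTM over BP and RBF networks on sequences such as PM2.5 sub-layers.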

For a more intuitive comparison, FIG. 4(a), 4(b), and 4(c) show the detailed prediction results at the three monitoring points. The CEEMDAN-FCM-LSTM method performs better overall than the other models, and the forecast curves at all three sites lead to the same conclusion. Taking the forecast for weather bureau A as an example, the forecast matches the original curve well, and the forecast curve of this model is the closest to the measured curve. The CEEMDAN-FCM-LSTM method is therefore well suited to describing the behavior of PM2.5 data and improves prediction performance.

In summary, according to the above technical solution of the present invention, input and output sequences are acquired in advance; $x(t)$, $t = 1, \ldots, T$, is decomposed with the CEEMDAN algorithm into n finite, stationary IMF components and a residual; the FCM algorithm is applied to the intrinsic mode functions, with each stationary IMF unfolded into a trajectory matrix and the trajectory matrices clustered into a set of training data subsets; and the structure and hyper-parameters of an LSTM network are determined and the network is trained on the training data subsets, implementing an LSTM hybrid prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering gathers components with similar characteristics together, and the PM2.5 prediction model is built with a PSO-optimized LSTM network, giving high prediction accuracy, low computational cost, and strong adaptability.

The above description illustrates only the preferred embodiments of the present invention and is not to be construed as limiting the invention. Any modifications, equivalents, improvements, and the like made within the spirit and principle of the present invention are intended to be included within its scope of protection.

Claims

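1. A hybrid integration model for PM2.5 hour concentration prediction, comprising the following steps:

acquiring input and output sequences in advance, expressed as $\{x(1), x(2), \ldots, x(T)\}$ and $\{y(1), y(2), \ldots, y(T)\}$;

decomposing $x(t)$, $t = 1, \ldots, T$, with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual;

applying the FCM algorithm to the intrinsic mode functions, wherein each stationary IMF is unfolded into a trajectory matrix and the trajectory matrices are clustered into a set of training data subsets; and

determining the structure and hyper-parameters of the LSTM network and training it on the training data subsets.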

2. The hybrid integration model for PM2.5 hour concentration prediction according to claim 1, wherein acquiring the input and output sequences further comprises the following step:

normalizing the data.

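3. The hybrid integration model for PM2.5 hour concentration prediction according to claim 2, further comprising the following step:

after the training stage is finished, using the trained networks to predict the output of subsequent test samples.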