CN113223392A - Hybrid integration model for PM2.5 hour concentration prediction - Google Patents

Info

Publication number
CN113223392A
Authority
CN
China
Prior art keywords
prediction
ceemdan
model
data
concentration prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110541712.4A
Other languages
Chinese (zh)
Inventor
张莉
蔡希文
胡平
徐莉
张宇轩
苏庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinyang Agriculture and Forestry University
Original Assignee
Xinyang Agriculture and Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-05-18
Filing date
2021-05-18
Publication date
2021-08-06
Application filed by Xinyang Agriculture and Forestry University
Priority to CN202110541712.4A
Publication of CN113223392A

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B25/00: Models for purposes not provided for in G09B23/00, e.g. full-sized devices for demonstration purposes

Abstract

The invention discloses a hybrid integration model for PM2.5 hour concentration prediction and relates to the technical field of PM2.5 concentration prediction. The model comprises the following steps: obtaining an input-output sequence in advance; decomposing x(t) (t = 1, ..., T) with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual; unfolding each stationary IMF into a trajectory matrix and applying the FCM algorithm to cluster it into a group of training data subsets; and determining the structure and hyper-parameters of an LSTM network and training it on the training data subsets. The invention thereby realizes an LSTM hybrid-model prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering groups components with similar characteristics, and the PM2.5 prediction model is built with an LSTM network tuned by particle swarm optimization. According to the application, the prediction accuracy is high, the computational cost is low and the adaptability is strong.

Description

Hybrid integration model for PM2.5 hour concentration prediction
Technical Field
The invention relates to the technical field of PM2.5 concentration prediction, in particular to a hybrid integration model for PM2.5 hour concentration prediction.
Background
With the rapid development of the national economy and urbanization in recent years, air pollution and haze events occur frequently, and air quality prediction has increasingly become a focus of attention for governments and the public. The forecast pollutant concentration data include PM2.5, PM10, O3, NO2, SO2 and CO.
PM2.5, also called fine particulate matter, refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less. It can remain suspended in air for a long time, and the higher its concentration in the air, the more serious the air pollution. Although PM2.5 is only a minor component of the earth's atmosphere, it has a significant effect on air quality, visibility and other factors. Compared with coarser atmospheric particulate matter, PM2.5 has a small particle size, a large surface area and strong activity, readily carries toxic and harmful substances (such as heavy metals and microorganisms), stays in the atmosphere for a long time and travels long distances, and therefore has a greater influence on human health and atmospheric environmental quality.
In the prior art, prediction models are built after the PM2.5 data are decomposed, but the model redundancy caused by similarity among the decomposition components is not considered, and the prediction effect is poor. The present scheme instead uses the CEEMDAN algorithm to decompose complex PM2.5 data into finite, stationary IMF components. Each stationary IMF is unfolded into a trajectory matrix, and the trajectory matrix is divided into a set of cluster samples. An LSTM model is then built by training on each set of cluster samples. For prediction, the distance between each cluster center and the test sample is calculated, the LSTM model corresponding to the minimum distance is selected as the optimal model, and the test data of each sub-layer are predicted. The predictions of the sub-layers are then combined to obtain the final result. The CEEMDAN-FCM-LSTM hybrid model can be readily applied to PM2.5 prediction.
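The model-selection step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the Euclidean distance, the additive recombination of sub-layer predictions, and the names `nearest_cluster`, `predict_sublayer` and `combine` are all assumptions, and the "models" are plain callables standing in for trained LSTM networks.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_cluster(sample, centers):
    """Return the index of the cluster center closest to the sample."""
    distances = [euclidean(sample, c) for c in centers]
    return min(range(len(centers)), key=distances.__getitem__)


def predict_sublayer(sample, centers, models):
    """Predict one sub-layer with the model trained on the nearest cluster."""
    return models[nearest_cluster(sample, centers)](sample)


def combine(sublayer_predictions):
    """Combine sub-layer predictions by summation (assumed recombination rule)."""
    return sum(sublayer_predictions)


# Hypothetical stand-ins for two trained LSTM models of one sub-layer.
centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
models = [lambda w: w[-1], lambda w: sum(w) / len(w)]

sample = [0.9, 1.1, 0.8]
chosen = nearest_cluster(sample, centers)
prediction = predict_sublayer(sample, centers, models)
print(chosen, prediction)
```

In this toy case the sample lies much closer to the second center, so the second stand-in model is the one that produces the prediction.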
A search of Chinese patents found CN 112132336A, which discloses a quarterly prediction method for PM2.5 concentration and belongs to the technical field of air quality prediction. It comprises the following steps: S100: collecting and screening data for the region, the data comprising meteorological data, pollution data and benchmark emission inventory data; S200: constructing a meteorological-air quality model of the region from the screened data; S300: obtaining an inverted quarterly emission inventory of the region from the screened data and the meteorological-air quality model; S400: collecting global weather forecast field data and constructing a prediction model from them; S500: obtaining the quarterly predicted PM2.5 concentration of the region from the inverted quarterly emission inventory by simulation with the prediction model. That method overcomes the inability of the prior art to predict PM2.5 concentration on a long time scale, and can therefore provide more lead time for management and control in refined treatment. However, it suffers from low prediction accuracy, poor adaptability and certain limitations.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a hybrid integration model for PM2.5 hour concentration prediction, so as to overcome the technical problems in the prior related art.
The technical scheme of the invention is realized as follows:
A hybrid integration model for PM2.5 hour concentration prediction, comprising the following steps:
obtaining an input-output sequence in advance, expressed as {x(1), x(2), ..., x(T)} and {y(1), y(2), ..., y(T)};
decomposing x(t) (t = 1, ..., T) with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual;
unfolding each stationary IMF into a trajectory matrix and applying the FCM algorithm to cluster it into a set of training data subsets;
determining the structure and hyper-parameters of the LSTM network and training it on the training data subsets.
Further, obtaining the input-output sequence also includes the following step:
normalizing the data.
Further, the model also includes the following step:
after the training stage is finished, using the trained model to predict the output of subsequent test samples.
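The normalization and trajectory-matrix steps can be sketched as follows. The application does not specify the normalization formula or the window length, so min-max scaling, the window of 3 and the function names used here are assumptions made purely for illustration.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def trajectory_matrix(series, window):
    """Unfold a series into overlapping rows of length `window`."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [series[i:i + window] for i in range(len(series) - window + 1)]


def supervised_pairs(series, window):
    """Pair each row of `window` lagged values with the value that follows it."""
    rows = trajectory_matrix(series, window + 1)
    return [(row[:-1], row[-1]) for row in rows]


# Illustrative values only; in the described method the rows would come
# from one IMF component rather than from the raw series.
raw = [10.0, 20.0, 30.0, 40.0, 50.0]
normalized = min_max_normalize(raw)
matrix = trajectory_matrix(normalized, 3)
pairs = supervised_pairs(normalized, 3)
print(matrix)
print(pairs)
```

Each row of `matrix` is one candidate sample for clustering, and `pairs` shows how the same rows could serve as input-target pairs for a one-step-ahead predictor.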
The invention has the beneficial effects that:
The invention discloses a hybrid integration model for PM2.5 hour concentration prediction. An input-output sequence is obtained in advance; x(t) (t = 1, ..., T) is decomposed with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual; each stationary IMF is unfolded into a trajectory matrix, which the FCM algorithm clusters into a group of training data subsets; and the structure and hyper-parameters of the LSTM network are determined and the network is trained on the training data subsets. This realizes an LSTM hybrid-model prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering groups components with similar characteristics, and the PM2.5 prediction model is built with an LSTM network tuned by particle swarm optimization, so that the prediction accuracy is high, the computational cost is low and the adaptability is strong.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a CEEMDAN-FCM-LSTM modeling framework for a hybrid integrated model for PM2.5 hour concentration prediction, according to an embodiment of the present invention;
FIG. 3(a) is a diagram of the CEEMDAN decomposition results for weather bureau A of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;
FIG. 3(b) is a diagram of the CEEMDAN decomposition results for water plant B of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;
FIG. 3(c) is a diagram of the CEEMDAN decomposition results for brewing company C of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;
FIG. 4(a) is a diagram of the forecast results on the weather bureau A test set of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;
FIG. 4(b) is a diagram of the forecast results on the water plant B test set of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention;
FIG. 4(c) is a diagram of the forecast results on the brewing company C test set of a hybrid integration model for PM2.5 hour concentration prediction according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
According to an embodiment of the invention, a hybrid integration model for PM2.5 hour concentration prediction is provided.
As shown in FIGS. 1 and 2, the hybrid integration model for PM2.5 hour concentration prediction according to the embodiment of the present invention includes the following steps:
obtaining an input-output sequence in advance, expressed as {x(1), x(2), ..., x(T)} and {y(1), y(2), ..., y(T)};
decomposing x(t) (t = 1, ..., T) with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual;
unfolding each stationary IMF into a trajectory matrix and applying the FCM algorithm to cluster it into a set of training data subsets;
determining the structure and hyper-parameters of the LSTM network and training it on the training data subsets.
Obtaining the input-output sequence further includes the following step:
normalizing the data.
The model further includes the following step:
after the training stage is finished, using the trained model to predict the output of subsequent test samples.
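The FCM clustering step can be sketched with the standard fuzzy c-means update rules. This is a minimal pure-Python illustration under stated assumptions: the fuzzifier m = 2, the random initialization, the stopping tolerance and the toy two-dimensional points are all chosen for the example; in the described method the rows of each IMF's trajectory matrix would take the place of `points`.

```python
import math
import random


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means: return (centers, memberships) for a list of points."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(n_clusters):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[j] * points[j][d] for j in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from the relative distances to all centers.
        new_u = []
        for p in points:
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append(
                [1.0 / sum((dist[i] / dist[k]) ** power
                           for k in range(n_clusters))
                 for i in range(n_clusters)]
            )
        shift = max(abs(new_u[j][i] - u[j][i])
                    for j in range(n) for i in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy data: two well-separated groups of points.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, memberships = fcm(points, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Assigning each row to its highest-membership cluster, as `labels` does, is one simple way to form the training data subsets. Whether the application uses hard assignment or the fuzzy memberships directly is not stated.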
By means of this technical scheme, an input-output sequence is obtained in advance; x(t) (t = 1, ..., T) is decomposed with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual; each stationary IMF is unfolded into a trajectory matrix, which the FCM algorithm clusters into a group of training data subsets; and the structure and hyper-parameters of the LSTM network are determined and the network is trained on the training data subsets. This realizes an LSTM hybrid-model prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering groups components with similar characteristics, and the PM2.5 prediction model is built with an LSTM network tuned by particle swarm optimization, so that the prediction accuracy is high, the computational cost is low and the adaptability is strong.
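For orientation, the decomposition step can be written compactly. CEEMDAN is a complete decomposition, so the components sum back to the original sequence. The second line is only an assumed recombination rule, since the application says that the sub-layer predictions are combined but does not give a formula. The symbols \(r(t)\), \(\hat{y}_k\) and \(\hat{y}_r\) are notation introduced here.

```latex
% Completeness of the CEEMDAN decomposition of x(t), t = 1, ..., T
x(t) = \sum_{k=1}^{n} \mathrm{IMF}_k(t) + r(t)

% Assumed additive recombination of the per-component predictions
\hat{y}(t) = \sum_{k=1}^{n} \hat{y}_k(t) + \hat{y}_r(t)
```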
In addition, to cover a rich variety of environmental conditions, three monitoring points with different levels of economic development and different natural environments were studied: weather bureau A, water plant B and brewing company C. The CEEMDAN decomposition results of the three data sets are shown in FIGS. 3(a) to 3(c).
In addition, the three groups of data are modeled to verify the modeling performance and generalization of the proposed model. Hourly PM2.5 data were collected from January 1 to December 31, 2020. Because 2020 is a leap year with 366 days, 8784 samples were collected at each monitoring point. They were divided into two parts: the 7320 samples from January to October for training and the 1464 samples from November to December for testing. The statistical indices of the three groups of PM2.5 data are shown in Table 1.
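The sample counts follow directly from the calendar, as the short check below shows. It assumes one sample per hour with no missing records. The variable names are illustrative.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)

start = datetime(2020, 1, 1)    # first hourly sample of the year
split = datetime(2020, 11, 1)   # first hour of the test period
end = datetime(2021, 1, 1)      # exclusive upper bound

total_samples = (end - start) // ONE_HOUR
train_samples = (split - start) // ONE_HOUR
test_samples = (end - split) // ONE_HOUR

print(total_samples, train_samples, test_samples)
```

With 366 days in 2020, this gives 8784 hourly samples in total, 7320 for January to October (305 days) and 1464 for November and December (61 days).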
Table 1. Statistical indices of the three groups of PM2.5 data
[Table 1 is provided as an image in the original publication: Figure BDA0003072060610000051]
In addition, three different neural networks, BP, RBF and LSTM, are used as prediction network models. The weights of the BP neural network are determined by a genetic optimization method, and the model structure of the RBF network is determined by trial and error. The learning rate, the number of hidden-layer neurons and the batch size of the LSTM network are obtained with the particle swarm optimization (PSO) algorithm. Three monitoring points in different parts of one region are selected. Different neural network models combined with the CEEMDAN and FCM methods, and improved variants of them, are used to predict the hourly PM2.5 concentration at the three monitoring points. The results of the models on the test sets of the different monitoring points are shown in Tables 2, 3 and 4.
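The PSO search over the three LSTM hyper-parameters can be sketched as below. This is a generic global-best PSO under stated assumptions: the inertia and acceleration coefficients, the search bounds and the swarm size are illustrative, and `toy_validation_error` is a hypothetical stand-in for training an LSTM and measuring its validation error, which the application does not detail.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical surrogate with its minimum at lr=0.01, 64 units, batch 32."""
    lr, hidden, batch = params
    return (((lr - 0.01) / 0.1) ** 2
            + ((hidden - 64) / 128) ** 2
            + ((batch - 32) / 128) ** 2)


# Assumed search ranges: learning rate, hidden neurons, batch size.
bounds = [(0.0001, 0.1), (8, 128), (8, 128)]
best, best_val = pso(toy_validation_error, bounds)
print(best, best_val)
```

In practice the number of hidden neurons and the batch size would be rounded to integers before a network is built, and each objective evaluation would involve a full training run, so the swarm size and iteration count trade search quality against computing time.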
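Table 2. Evaluation results of the models on the test set of the Flat Bridge weather bureau (monitoring point A)
[Table 2 is provided as an image in the original publication: Figure BDA0003072060610000052]
Table 3. Evaluation results of the models on the test set of the South Bay waterworks (monitoring point B)
[Table 3 is provided as an image in the original publication: Figure BDA0003072060610000061]
Table 4. Evaluation results of the models on the test set of the brewing company (monitoring point C)
[Table 4 is provided as an image in the original publication: Figure BDA0003072060610000062]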
In addition, combining the BP, RBF and LSTM neural network models with CEEMDAN decomposition effectively improves the prediction performance, as the statistical indices in Tables 2, 3 and 4 show. For example, at weather bureau A, the root mean square error (RMSE) of the BP, RBF and LSTM models combined with CEEMDAN decomposition was 26.29%, 22.46% and 59% lower, respectively, than that of the same models without CEEMDAN decomposition. The CEEMDAN method can decompose a non-linear, non-stationary PM2.5 data sequence on multiple scales. The resulting IMF subsequences and residual term reduce the complexity of the original PM2.5 data sequence, making the modeling of each subsequence or residual term more accurate.
Although the CEEMDAN method yields the IMF subsequences and the residual term, some of them are strongly correlated, so it is not necessary to model each subsequence separately. The evaluation results also show that clustering and integrating these subsequences with the FCM method contributes substantially to the accuracy of model prediction. The RMSE of the CEEMDAN-BP, CEEMDAN-RBF and CEEMDAN-LSTM models combined with FCM was 4.49%, 5.15% and 3.19% lower, respectively, than that of the same models without FCM.
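The percentage reductions quoted above are relative changes in RMSE. The sketch below shows how such figures are computed. The numbers in it are made up for illustration and are not taken from the tables of the application.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Illustrative values only (not data from the application).
actual = [10.0, 20.0, 30.0, 40.0]
plain_model = [12.0, 18.0, 33.0, 37.0]
hybrid_model = [11.0, 19.0, 31.0, 39.0]

rmse_plain = rmse(actual, plain_model)
rmse_hybrid = rmse(actual, hybrid_model)
reduction = relative_reduction(rmse_plain, rmse_hybrid)
print(rmse_plain, rmse_hybrid, reduction)
```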
In addition, the CEEMDAN-FCM-LSTM method of the present invention achieves the best prediction performance on the test set of every monitoring point. The method not only combines the CEEMDAN decomposition method with the FCM clustering strategy but also benefits from the special gate structure and memory function of the LSTM network. Meanwhile, the PSO algorithm helps the LSTM network obtain the best hyper-parameters and removes the tedious task of selecting them manually by trial and error.
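The gate structure and memory function mentioned above refer to the standard LSTM cell, whose usual formulation is reproduced here for reference. The application does not state which LSTM variant is used, so the equations and the symbols \(W\), \(U\), \(b\) (input weights, recurrent weights, biases) are the textbook form rather than the applicant's specification.

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          % forget gate
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          % input gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          % output gate
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   % candidate memory
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    % cell state (memory)
h_t = o_t \odot \tanh(c_t)                         % hidden state (output)
```

The cell state \(c_t\) carries information across time steps, and the three gates decide what is forgotten, written and exposed at each step.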
For a more intuitive comparison, FIGS. 4(a), 4(b) and 4(c) show the detailed prediction results at the three monitoring points. The CEEMDAN-FCM-LSTM method performs better overall than the other models, which is consistent with the evaluation results in the tables. Taking the forecast for weather bureau A as an example, the predicted values match the original curve well, and the prediction curve of the model is the closest to the real curve. The CEEMDAN-FCM-LSTM method is therefore suitable for describing the behavior of PM2.5 data and can improve the prediction performance.
In summary, according to the above technical scheme of the present invention, an input-output sequence is obtained in advance; x(t) (t = 1, ..., T) is decomposed with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual; each stationary IMF is unfolded into a trajectory matrix, which the FCM algorithm clusters into a set of training data subsets; and the structure and hyper-parameters of the LSTM network are determined and the network is trained on the training data subsets. This realizes an LSTM hybrid-model prediction strategy based on CEEMDAN data decomposition and FCM clustering. CEEMDAN decomposition reduces the complexity of the original PM2.5 data sequence, FCM clustering groups components with similar characteristics, and the PM2.5 prediction model is built with an LSTM network tuned by particle swarm optimization, so that the prediction accuracy is high, the computational cost is low and the adaptability is strong.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. A hybrid integration model for PM2.5 hour concentration prediction, comprising the following steps:
obtaining an input-output sequence in advance, expressed as {x(1), x(2), ..., x(T)} and {y(1), y(2), ..., y(T)};
decomposing x(t) (t = 1, ..., T) with the CEEMDAN algorithm to obtain n finite, stationary IMF components and a residual;
unfolding each stationary IMF into a trajectory matrix and applying the FCM algorithm to cluster it into a set of training data subsets;
determining the structure and hyper-parameters of the LSTM network and training it on the training data subsets.
2. The hybrid integration model for PM2.5 hour concentration prediction according to claim 1, wherein obtaining the input-output sequence further comprises the following step:
normalizing the data.
3. The hybrid integration model for PM2.5 hour concentration prediction according to claim 2, further comprising the following step:
after the training stage is finished, using the trained model to predict the output of subsequent test samples.
CN202110541712.4A 2021-05-18 2021-05-18 Hybrid integration model for PM2.5 hour concentration prediction Pending CN113223392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110541712.4A CN113223392A (en) 2021-05-18 2021-05-18 Hybrid integration model for PM2.5 hour concentration prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110541712.4A CN113223392A (en) 2021-05-18 2021-05-18 Hybrid integration model for PM2.5 hour concentration prediction

Publications (1)

Publication Number Publication Date
CN113223392A true CN113223392A (en) 2021-08-06

Family

ID=77092757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110541712.4A Pending CN113223392A (en) 2021-05-18 2021-05-18 Hybrid integration model for PM2.5 hour concentration prediction

Country Status (1)

Country Link
CN (1) CN113223392A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210806