CN116796805A - PM2.5 concentration prediction method based on Gaussian process regression and deep learning - Google Patents


Info

Publication number
CN116796805A
Authority
CN
China
Legal status
Pending
Application number
CN202310523089.9A
Other languages
Chinese (zh)
Inventor
黄明智
何家安
李小勇
吴凤儿
易晓辉
陈振国
Current Assignee
South China Normal University
Original Assignee
South China Normal University
Application filed by South China Normal University
Priority to CN202310523089.9A
Publication of CN116796805A
Legal status: Pending (current)


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a PM2.5 concentration prediction method based on Gaussian process regression and deep learning, comprising the following steps. step 1: acquire PM2.5 concentration historical data, in which the input variable is the observed PM2.5 concentration and the output variable is the predicted PM2.5 concentration; step 2: divide the PM2.5 concentration historical data into a training data set and a test data set according to a preset ratio, and preprocess both data sets; step 3: construct a PM2.5 concentration point prediction model that fuses a convolutional neural network and a long short-term memory network, train it on the preprocessed training data set multiple times to obtain ideal parameters, and configure the model with those parameters; step 4: input the preprocessed test data set into the configured PM2.5 concentration point prediction model to obtain point prediction results of the output variable; step 5: construct a PM2.5 concentration prediction mixed model combining Gaussian process regression and deep learning, and input the point prediction results into it to obtain the probability distribution function and prediction interval corresponding to each point prediction result.

Description

PM2.5 concentration prediction method based on Gaussian process regression and deep learning
Technical Field
The invention relates to the technical field of software application, in particular to a PM2.5 concentration prediction method based on Gaussian process regression and deep learning.
Background
PM2.5, also known as fine particulate matter or fine particles, refers to airborne particulate matter with an aerodynamic equivalent diameter of 2.5 micrometers or less, which can enter the lungs. It can remain suspended in air for a long time and is one of the key indexes of air quality monitoring and urban air pollution. As public environmental awareness grows, regional composite air pollution characterized by pollutants such as PM2.5 is attracting attention from governments and from all sectors of society. Establishing a reasonable and accurate PM2.5 concentration prediction model therefore allows effective preventive measures to be formulated and government social activities to be planned, so that the harm caused by serious air pollution can be avoided.
As typical spatiotemporal sequence data, PM2.5 concentration data exhibits spatial aggregation, temporal periodicity, spatiotemporal correlation, and uncertain abrupt changes. Consequently, conventional numerical methods predict PM2.5 concentration poorly.
In recent years, artificial intelligence has developed rapidly and has been deeply integrated into many application scenarios, and its strong learning and reasoning capabilities have attracted the attention of many researchers. Artificial neural networks, one of the most widely applied artificial intelligence techniques, are an important tool for modeling nonlinear phenomena and can make up for the shortcomings of traditional recursive methods, which makes them suitable for predicting PM2.5 concentration. However, as air quality continues to improve, single-point PM2.5 concentration predictions at a single time no longer provide the more comprehensive information that intelligent decision-making systems require. Compared with single-point prediction, interval prediction captures the uncertainty of the expected result, providing a new framework for quantifying the uncertainty of time series prediction and improving model robustness. Gaussian process regression, a non-parametric probabilistic technique for nonlinear regression problems, expresses prediction uncertainty through the estimated variance. Furthermore, the covariance function of Gaussian process regression is built from a kernel that can capture many function characteristics, which suits it to a range of practical prediction applications and makes hybrid models built on Gaussian process regression highly adaptive and flexible when modeling systems with complex properties.
Therefore, how to combine deep learning with Gaussian process regression into a new hybrid model that excels at both point prediction and interval prediction is a theoretical and practical engineering problem that needs to be solved.
Disclosure of Invention
The PM2.5 concentration prediction method based on Gaussian process regression and deep learning provided by the embodiment of the invention comprises the following steps:
step 1: acquiring PM2.5 concentration historical data, wherein the PM2.5 concentration historical data comprises an input variable, which is the observed PM2.5 concentration, and an output variable, which is the predicted PM2.5 concentration;
step 2: dividing the PM2.5 concentration historical data into a training data set and a test data set according to a preset proportion, and preprocessing the training data set and the test data set;
step 3: constructing a PM2.5 concentration point prediction model fusing a convolutional neural network and a long short-term memory network, training the PM2.5 concentration point prediction model on the preprocessed training data set multiple times to obtain ideal parameters, and configuring the PM2.5 concentration point prediction model based on the ideal parameters;
step 4: inputting the preprocessed test data set into the configured PM2.5 concentration point prediction model to obtain a point prediction result of the output variable;
step 5: constructing a PM2.5 concentration prediction mixed model of Gaussian process regression and deep learning, and inputting the point prediction result into the PM2.5 concentration prediction mixed model to obtain a probability distribution function and a prediction interval corresponding to the point prediction result.
Preferably, in the step 2, preprocessing the training data set and the test data set includes:
screening out and removing abnormal values in the training data set and the test data set;
normalizing the screened data with the following formula:
y* = (y - min) / (max - min)
wherein y is any original data value in the screened data, min is the smallest original data value in the screened data, max is the largest original data value in the screened data, and y* is the normalized value corresponding to y.
Preferably, in step 3, training the PM2.5 concentration point prediction model multiple times on the preprocessed training data set to obtain ideal parameters includes:
setting the neural network parameters of the PM2.5 concentration point prediction model;
training the PM2.5 concentration point prediction model on the training data set multiple times, and obtaining the accuracy of each training run and the corresponding optimized parameters;
taking the optimized parameters corresponding to the maximum accuracy as the ideal parameters;
the setting of the neural network parameters of the PM2.5 concentration point prediction model comprises:
setting the initialization function of the CNN layers of the PM2.5 concentration point prediction model to Kaiming;
setting the initialization function of the LSTM, CLSTM and GRU layers of the PM2.5 concentration point prediction model to orthogonal;
setting the optimizer of the PM2.5 concentration point prediction model to Adam;
setting the learning rate of the PM2.5 concentration point prediction model to 1e-3;
setting the loss function of the PM2.5 concentration point prediction model to MSE;
setting the batch size of the PM2.5 concentration point prediction model to 20;
setting the dropout rate of the PM2.5 concentration point prediction model to 0.2.
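The two initializer names above refer to standard schemes. Kaiming (He) initialization scales random weights by the fan-in so that ReLU activations keep their variance. Orthogonal initialization gives a recurrent weight matrix orthonormal columns or rows. The following is a minimal NumPy sketch of both, with the listed hyperparameters gathered into a dictionary. The `CONFIG` dictionary and its key names are hypothetical, and the normal (rather than uniform) Kaiming variant is an assumption, since the text only says "Kaiming".

```python
import numpy as np

# Hyperparameters listed above, gathered into one hypothetical config dict.
CONFIG = {
    "cnn_init": "kaiming",
    "recurrent_init": "orthogonal",
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "loss": "mse",
    "batch_size": 20,
    "dropout": 0.2,
}


def kaiming_normal(fan_in, shape, rng):
    """He/Kaiming normal init for ReLU layers: weights ~ N(0, 2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)


def orthogonal(shape, rng, gain=1.0):
    """Orthogonal init: QR-decompose a Gaussian matrix and keep Q."""
    rows, cols = shape
    a = rng.normal(size=(max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Resolve the sign ambiguity of QR so Q is uniformly distributed.
    q *= np.sign(np.diag(r))
    if rows < cols:
        q = q.T  # orthonormal rows instead of orthonormal columns
    return gain * q[:rows, :cols]
```

For an LSTM with hidden size 32, the stacked recurrent matrix has shape `(4 * 32, 32)`. In that case `orthogonal((128, 32), rng)` returns a matrix whose columns are orthonormal.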
Preferably, step 5 further includes:
calculating the indexes of the obtained point prediction and interval prediction, and comparing the PM2.5 concentration prediction mixed model with a CNN-GPR model, an LSTM-GPR model and a GPR model to determine the better model;
the indexes of the point prediction are calculated according to the following formulas:
wherein y_i is the i-th observed value, ȳ is the mean of the observed values, Y_i is the i-th PM2.5 value predicted by the PM2.5 concentration prediction mixed model, the CNN-GPR model, the LSTM-GPR model or the GPR model, N is the number of predicted samples, E and σ denote the expectation and standard deviation operators (σ² being the variance), MAE is the mean absolute error, RMSE is the root mean square error, and R is the correlation coefficient;
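The formulas themselves are not reproduced in this text. Assuming the standard definitions of these indexes, and writing Ȳ for the mean of the predicted values (a symbol introduced here), they can be written as follows. The coefficient of determination R², used alongside them in the comparison tables, is included for completeness.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{N}\sum_{i=1}^{N}\left|y_i - Y_i\right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - Y_i\right)^2} \\
R   &= \frac{E\!\left[(y - \bar{y})(Y - \bar{Y})\right]}{\sigma_y\,\sigma_Y} \\
R^2 &= 1 - \frac{\sum_{i=1}^{N}\left(y_i - Y_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}
\end{aligned}
```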
the indexes of the interval prediction are calculated according to the following formulas:
wherein U_i is the upper limit of the prediction interval of the i-th point predicted value, L_i is the lower limit of that prediction interval, and α is the confidence level.
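These formulas are likewise not reproduced here. The description of the figures gives the interval indexes as follows:

- CP is the probability that an observed value falls within the prediction interval.
- MWP is the mean interval width as a fraction of the observed value.
- MC = MWP/CP.

Taking those definitions, and assuming Gaussian intervals like the ±1.96σ example given for step 5, the indexes can be written as below. The symbols σ_i (predictive standard deviation of the i-th point), z_α, Φ (standard normal CDF) and c_i are introduced here as assumptions. The plain mean width MAW is also shown.

```latex
\begin{aligned}
[L_i,\,U_i] &= \left[\,Y_i - z_{\alpha}\sigma_i,\; Y_i + z_{\alpha}\sigma_i\,\right],
  \qquad z_{\alpha} = \Phi^{-1}\!\left(\tfrac{1+\alpha}{2}\right)\ (=1.96 \text{ for } \alpha = 0.95) \\
\mathrm{CP}  &= \frac{1}{N}\sum_{i=1}^{N} c_i,
  \qquad c_i = \begin{cases} 1, & L_i \le y_i \le U_i \\ 0, & \text{otherwise} \end{cases} \\
\mathrm{MAW} &= \frac{1}{N}\sum_{i=1}^{N}\left(U_i - L_i\right),
  \qquad \mathrm{MWP} = \frac{1}{N}\sum_{i=1}^{N}\frac{U_i - L_i}{y_i} \\
\mathrm{MC}  &= \mathrm{MWP} / \mathrm{CP}
\end{aligned}
```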
Preferably, in step 1, acquiring the PM2.5 concentration historical data includes:
acquiring a preset PM2.5 monitoring station distribution map corresponding to a target area;
preprocessing the PM2.5 monitoring station distribution map to obtain a trusted site distribution map;
determining monitoring missing points based on the trusted site distribution map;
planning a mobile monitoring route based on the monitoring missing points and a preset area map corresponding to the target area;
controlling a mobile monitoring trolley to travel along the mobile monitoring route and monitor PM2.5 at the monitoring missing points, and acquiring first historical monitoring data;
acquiring second historical monitoring data from the PM2.5 monitoring of each trusted site in the trusted site distribution map;
integrating the first historical monitoring data and the second historical monitoring data to obtain the PM2.5 concentration historical data.
Preferably, preprocessing the PM2.5 monitoring station distribution map includes:
traversing each PM2.5 monitoring station in the PM2.5 monitoring station distribution map in turn;
for each traversed station, acquiring the site information of that PM2.5 monitoring station based on a preset information acquisition template;
extracting features from the site information based on a preset first feature extraction template to obtain a plurality of first features;
matching the first features against second features in a preset indication feature library, and, if a match is found, acquiring the preset second feature extraction template and the feature requirement corresponding to the matched second feature;
extracting features from the site information based on the second feature extraction template to obtain a plurality of third features;
judging whether the third features meet the feature requirement, and if not, removing the traversed PM2.5 monitoring station from the PM2.5 monitoring station distribution map;
after all PM2.5 monitoring stations that need to be removed have been removed, taking the resulting PM2.5 monitoring station distribution map as the trusted site distribution map.
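The filtering loop above can be pictured as a small rule engine. It works as follows:

1. A first template flags suspicious properties of a station.
2. Each flag found in the indication feature library brings its own extraction template and requirement.
3. A station that fails any triggered requirement is dropped.

The sketch below assumes this structure. Every feature name, template and threshold in it (`low_data_completeness`, `recent_relocation`, 0.75, 365 days) is hypothetical and only illustrates the control flow; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Station:
    station_id: str
    info: dict  # site information gathered via the acquisition template


# Hypothetical indication feature library: second feature ->
# (second extraction template, feature requirement).
INDICATION_LIBRARY = {
    "low_data_completeness": (
        lambda info: {"completeness": info["valid_days"] / info["total_days"]},
        lambda f: f["completeness"] >= 0.75,
    ),
    "recent_relocation": (
        lambda info: {"days_since_move": info["days_since_move"]},
        lambda f: f["days_since_move"] >= 365,
    ),
}


def first_features(info):
    """Hypothetical first extraction template: flags that may indicate a problem."""
    flags = set()
    if info.get("valid_days", 0) < info.get("total_days", 0):
        flags.add("low_data_completeness")
    if "days_since_move" in info:
        flags.add("recent_relocation")
    return flags


def trusted_sites(stations):
    """Keep stations whose triggered feature requirements are all met."""
    kept = []
    for st in stations:
        ok = True
        for feat in first_features(st.info):
            if feat in INDICATION_LIBRARY:  # first feature matches a second feature
                extract, requirement = INDICATION_LIBRARY[feat]
                if not requirement(extract(st.info)):  # third features fail
                    ok = False
                    break
        if ok:
            kept.append(st)
    return kept
```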
Preferably, determining the monitoring missing points based on the trusted site distribution map includes:
in the trusted site distribution map, drawing a circular range centered on each trusted site with a preset radius;
dividing the trusted site distribution map into a plurality of grid areas based on a preset grid division rule;
traversing each grid area in turn;
for each traversed grid area, extracting the residual area, i.e. the part of the grid area outside all circular ranges;
extracting features from the residual area based on a preset third feature extraction template to obtain a plurality of fourth features;
aggregating the fourth features into a fourth feature set;
matching the fourth feature set against a preset index feature set to obtain a matching degree;
if the matching degree is greater than or equal to a preset matching degree threshold, setting monitoring missing points in the residual area according to the monitoring missing point setting requirement;
taking each set monitoring missing point as a new trusted site and drawing a new circular range centered on it with the same radius;
continuing to traverse the grid areas;
wherein the monitoring missing point setting requirement includes:
the first linear distance between a set monitoring missing point and the center of any surrounding circular range is greater than or equal to a preset first linear distance threshold; the minimum linear distance between a set monitoring missing point and any surrounding circular range is greater than or equal to a preset second linear distance threshold; and the third linear distance between any two set monitoring missing points is greater than or equal to a preset third linear distance threshold.
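The sketch below shows how the three distance requirements and the "new point becomes a new trusted site" rule interact during a greedy scan. It is a simplification. Grid-cell centres stand in for candidate positions inside the residual area, and the feature-matching step (fourth features and matching degree) is skipped. The function name and the planar coordinates are hypothetical.

```python
import math


def place_missing_points(sites, radius, bbox, cell, d1, d2, d3):
    """Greedy sketch: scan grid-cell centres and accept a candidate only if it
    meets the three distance requirements; each accepted point then becomes a
    new trusted site with its own circular range."""
    centres = list(sites)  # circle centres: trusted sites so far
    new_points = []
    xmin, ymin, xmax, ymax = bbox
    nx = int((xmax - xmin) // cell)
    ny = int((ymax - ymin) // cell)
    for i in range(nx):
        for j in range(ny):
            p = (xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
            dists = [math.dist(p, c) for c in centres]
            if any(d <= radius for d in dists):
                continue  # inside a circular range, not in the residual area
            if dists and min(dists) < d1:
                continue  # requirement 1: distance to every circle centre >= d1
            if dists and min(dists) - radius < d2:
                continue  # requirement 2: distance to every circle boundary >= d2
            if any(math.dist(p, q) < d3 for q in new_points):
                continue  # requirement 3: spacing between new points >= d3
            new_points.append(p)
            centres.append(p)  # new trusted site with a new circular range
    return new_points
```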
Preferably, planning the mobile monitoring route based on the monitoring missing points and the preset area map corresponding to the target area includes:
determining the map position corresponding to each monitoring missing point in the area map;
determining, among these map positions, the target map position whose linear distance (the fourth linear distance) to a preset initial position of the monitoring trolley is smallest;
planning, with the target map position as the route starting point, the shortest route passing through the remaining map positions, and taking this shortest route as the mobile monitoring route.
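The route rule has two parts. The route starts at the missing point nearest the trolley, then takes the shortest open path through the rest. The sketch below assumes straight-line distances on planar map coordinates and finds the exact shortest path by brute force over permutations. This is only practical for a handful of points; larger sets would need a travelling-salesman heuristic, which the text does not specify. The function name is hypothetical.

```python
import itertools
import math


def plan_route(points, start):
    """Start at the point nearest the trolley's initial position, then visit
    the remaining points along the shortest open path (exact, brute force)."""
    if not points:
        return []
    first = min(points, key=lambda p: math.dist(p, start))
    rest = [p for p in points if p != first]

    def length(order):
        path = [first, *order]
        return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

    best = min(itertools.permutations(rest), key=length)
    return [first, *best]
```

Brute force costs O(n!) time, so it is only meant for checking the rule on small examples.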
The beneficial effects of the invention are as follows.
1. Combining the convolutional neural network with the long short-term memory network fully extracts the information in the process data and improves the model's ability to handle complex nonlinear data, and Gaussian process regression is then used to produce accurate interval predictions.
2. The PM2.5 concentration prediction method based on Gaussian process regression and deep learning integrates the spatial information extracted by the convolutional neural network with the temporal features extracted by the long short-term memory network, and uses a Gaussian process regression model for interval prediction, thereby achieving accurate prediction of PM2.5 concentration. It predicts the PM2.5 concentration at future times from the variation of the daily PM2.5 concentration and outputs both a point prediction result and the corresponding interval prediction result, so the output is highly reliable.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and, together with the embodiments of the invention, serve to explain it. In the drawings:
Fig. 1 is a general flow chart of the method according to the invention.
Fig. 2 is a schematic diagram of the structure of the CLSTM model according to an embodiment of the invention.
Fig. 3 is a schematic diagram of the partitioning and prediction of the data sets according to an embodiment of the invention.
Fig. 4 is a flow chart of the CLSTM and GPR hybrid model according to an embodiment of the invention.
Fig. 5 is point prediction result diagram a according to an embodiment of the invention.
Fig. 6 is point prediction result diagram b according to an embodiment of the invention.
Fig. 7 is interval prediction result diagram a according to an embodiment of the invention.
Fig. 8 is interval prediction result diagram b according to an embodiment of the invention.
Fig. 9 is a table comparing the point prediction results of the four models according to an embodiment of the invention.
Fig. 10 is a table comparing the interval prediction results of the four models according to an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a PM2.5 concentration prediction method based on Gaussian process regression and deep learning, which comprises the following steps as shown in figure 1:
step 1: acquiring PM2.5 concentration historical data, wherein the PM2.5 concentration historical data comprises an input variable, which is the observed PM2.5 concentration, and an output variable, which is the predicted PM2.5 concentration;
step 2: dividing the PM2.5 concentration historical data into a training data set and a test data set according to a preset proportion, and preprocessing the training data set and the test data set;
step 3: constructing a PM2.5 concentration point prediction model fusing a convolutional neural network and a long short-term memory network, training the PM2.5 concentration point prediction model on the preprocessed training data set multiple times to obtain ideal parameters, and configuring the PM2.5 concentration point prediction model based on the ideal parameters;
step 4: inputting the preprocessed test data set into the configured PM2.5 concentration point prediction model to obtain a point prediction result of the output variable; the point prediction result is a single value, e.g. 63;
step 5: constructing a PM2.5 concentration prediction mixed model of Gaussian process regression and deep learning, and inputting the point prediction result into the PM2.5 concentration prediction mixed model to obtain a probability distribution function and a prediction interval corresponding to the point prediction result. For example, if the point prediction result is A, the prediction interval is [A - 1.96σ, A + 1.96σ], where σ is the standard deviation produced by the PM2.5 concentration prediction mixed model (the factor 1.96 gives a two-sided 95% interval under a normal distribution), and the probability distribution function is the probability distribution of the point prediction result;
In step 2, preprocessing the training data set and the test data set includes:
screening out and removing abnormal values in the training data set and the test data set; a threshold can be set for the screening, and any value exceeding the threshold is removed as an abnormal value;
normalizing the screened data with the following formula:
y* = (y - min) / (max - min)
wherein y is any original data value in the screened data, min is the smallest original data value in the screened data, max is the largest original data value in the screened data, and y* is the normalized value corresponding to y.
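The screening rule and the min-max formula above translate directly into a few lines of NumPy, shown in the sketch below. One choice goes beyond the text, which does not say which data min and max come from. The sketch fits min and max on the training data once and reuses them for the test data, a common way to avoid leaking test statistics. It also adds an inverse transform so normalized predictions can be mapped back to concentrations. `screen_outliers` and `MinMaxNormalizer` are hypothetical names.

```python
import numpy as np


def screen_outliers(values, threshold):
    """Remove values that exceed a preset threshold (the screening rule above)."""
    values = np.asarray(values, dtype=float)
    return values[values <= threshold]


class MinMaxNormalizer:
    """y* = (y - min) / (max - min), with min and max taken from the fitting data."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.min_, self.max_ = y.min(), y.max()
        if self.max_ == self.min_:
            raise ValueError("max equals min; normalization is undefined")
        return self

    def transform(self, y):
        return (np.asarray(y, dtype=float) - self.min_) / (self.max_ - self.min_)

    def inverse_transform(self, y_star):
        return np.asarray(y_star, dtype=float) * (self.max_ - self.min_) + self.min_
```

With training values `[10, 20, 30]`, the normalizer maps them to `[0, 0.5, 1]`. A later prediction of 0.75 in normalized units maps back to 25.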
In step 3, training the PM2.5 concentration point prediction model multiple times on the preprocessed training data set to obtain ideal parameters includes:
setting the neural network parameters of the PM2.5 concentration point prediction model;
training the PM2.5 concentration point prediction model on the training data set multiple times, and obtaining the accuracy of each training run and the corresponding optimized parameters;
taking the optimized parameters corresponding to the maximum accuracy as the ideal parameters;
the setting of the neural network parameters of the PM2.5 concentration point prediction model comprises:
setting the initialization function of the CNN layers of the PM2.5 concentration point prediction model to Kaiming;
setting the initialization function of the LSTM, CLSTM and GRU layers of the PM2.5 concentration point prediction model to orthogonal;
setting the optimizer of the PM2.5 concentration point prediction model to Adam;
setting the learning rate of the PM2.5 concentration point prediction model to 1e-3;
setting the loss function of the PM2.5 concentration point prediction model to MSE;
setting the batch size of the PM2.5 concentration point prediction model to 20;
setting the dropout rate of the PM2.5 concentration point prediction model to 0.2.
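The "train several times, keep the parameters with the highest accuracy" rule is a simple selection loop. The sketch below assumes a caller-supplied `train_once(setting, seed)` function (hypothetical) that trains the model once and returns an accuracy score together with the learned parameters. For a regression model, "accuracy" is assumed to be a validation score where higher is better, such as R² or negative RMSE; the text does not define it.

```python
from typing import Any, Callable, Iterable, Tuple


def select_ideal_params(
    train_once: Callable[[dict, int], Tuple[float, Any]],
    settings: Iterable[dict],
    repeats: int = 3,
):
    """Train each candidate setting `repeats` times with different seeds and
    return (best_accuracy, best_params, best_setting)."""
    best = None
    for setting in settings:
        for seed in range(repeats):
            accuracy, params = train_once(setting, seed)
            if best is None or accuracy > best[0]:
                best = (accuracy, params, setting)
    return best
```

Repeating each setting with several seeds keeps a lucky or unlucky initialization from deciding the choice on its own.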
Step 5 further includes:
calculating the indexes of the obtained point prediction and interval prediction, and comparing the PM2.5 concentration prediction mixed model with a CNN-GPR model, an LSTM-GPR model and a GPR model to determine the better model;
the indexes of the point prediction are calculated according to the following formulas:
wherein y_i is the i-th observed value, ȳ is the mean of the observed values, Y_i is the i-th PM2.5 value predicted by the PM2.5 concentration prediction mixed model, the CNN-GPR model, the LSTM-GPR model or the GPR model, N is the number of predicted samples, E and σ denote the expectation and standard deviation operators (σ² being the variance), MAE is the mean absolute error, RMSE is the root mean square error, and R is the correlation coefficient;
the indexes of the interval prediction are calculated according to the following formulas:
wherein U_i is the upper limit of the prediction interval of the i-th point predicted value, L_i is the lower limit of that prediction interval, and α is the confidence level.
The working principle and the beneficial effects of the technical scheme are as follows:
S100, selecting daily PM2.5 concentration historical data from an environmental monitoring station, the historical data comprising input variables and output variables, dividing the data set into a training data set and a test data set, and then preprocessing the training data set and the test data set;
S200, constructing a PM2.5 concentration point prediction model fusing a convolutional neural network and a long short-term memory network, training the model on the training data set, testing the model multiple times to obtain ideal parameters, and configuring the PM2.5 concentration point prediction model with these parameters to improve its performance;
S300, inputting the test data set into the trained PM2.5 concentration point prediction model to obtain a point prediction result of the output variable;
S400, constructing a PM2.5 concentration prediction mixed model based on Gaussian process regression and deep learning, and inputting the point prediction result of the output variable into the trained PM2.5 concentration prediction mixed model to obtain the point prediction result of the output variable, the corresponding probability distribution function, and the prediction interval corresponding to the point prediction result.
For step S100
S100, selecting daily PM2.5 concentration historical data from an environmental monitoring station, the historical data comprising input variables and output variables, dividing the data set into a training data set and a test data set, and then preprocessing the training data set and the test data set.
The step S100 includes the steps of:
S110, dividing the data set into a training data set and a test data set at a ratio of 8:2, thereby constructing the primary training data set and test data set;
S120, screening out and removing abnormal values in the data set, and then normalizing the data set with the following formula:
y* = (y - min) / (max - min)
wherein y is an original data value of the input variable, min and max are the minimum and maximum values of the corresponding group of original data in the input variable, and y* is the normalized value of y.
Wherein the input variables include daily PM2.5 concentration history data for the monitoring site and the output variables include daily PM2.5 concentration predictions.
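Turning a daily concentration series into input/output pairs and splitting it 8:2 can be sketched as follows. Several details are assumptions, since the text gives only the 8:2 ratio:

- Each input row holds a fixed number of past days (the window length is a hypothetical parameter).
- The output is the next day's value.
- The split keeps time order, with no shuffling, so the test set lies after the training set.

```python
import numpy as np


def make_windows(series, window):
    """Build (X, y): each row of X holds `window` consecutive past days and
    y holds the value of the following day."""
    s = np.asarray(series, dtype=float)
    if len(s) <= window:
        raise ValueError("series must be longer than the window")
    X = np.stack([s[i:i + window] for i in range(len(s) - window)])
    y = s[window:]
    return X, y


def chronological_split(X, y, train_ratio=0.8):
    """8:2 split that preserves time order (earlier samples for training)."""
    n_train = int(len(X) * train_ratio)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]
```

For example, 15 daily values with a 5-day window give 10 samples: 8 for training and 2 for testing.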
For step S200
S200, constructing a PM2.5 concentration point prediction model fusing a convolutional neural network and a long short-term memory network, training the model on the training data set, testing the model multiple times to obtain ideal parameters, and configuring the PM2.5 concentration point prediction model with these parameters to improve its performance.
The step S200 includes the steps of:
S210, setting the neural network parameters of the PM2.5 concentration point prediction model, the model comprising an input layer, a hidden layer and an output layer;
S220, training the PM2.5 concentration point prediction model, optimizing the parameters according to the accuracy of multiple training runs, and finally obtaining the optimal parameters and the optimal model.
The parameters comprise the number of neurons in the input layer, the hidden layer and the output layer, the learning rate, the batch size and the number of iterations.
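The following sketch shows what a forward pass through such a CNN-LSTM ("CLSTM") point prediction model might look like. It assumes, as one reading of fig. 2, the following order of layers:

1. A 1-D convolution with ReLU extracts local features from the input window.
2. An LSTM cell with input, forget and output gates runs over those features, updating C_t and h_t from C_{t-1}, h_{t-1} and x_t.
3. A linear head maps the last hidden state to one predicted value.

It is written in NumPy, covers the forward pass only (no Adam, MSE training or dropout), and all layer sizes and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x, w, b):
    """Valid 1-D convolution + ReLU. x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    K, T = w.shape[0], x.shape[0]
    out = np.stack([np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1]))
                    for t in range(T - K + 1)])
    return np.maximum(out + b, 0.0)


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, g, o."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    g = np.tanh(z[2 * H:3 * H])   # candidate cell state
    o = sigmoid(z[3 * H:])        # output gate
    c_t = f * c_prev + i * g      # new cell state C_t
    h_t = o * np.tanh(c_t)        # new hidden state h_t
    return h_t, c_t


def init_params(c_in=1, c_out=8, kernel=3, hidden=16, seed=0):
    """Kaiming-style conv weights and orthogonal recurrent weights (sizes hypothetical)."""
    rng = np.random.default_rng(seed)
    return {
        "conv_w": rng.normal(0.0, np.sqrt(2.0 / (kernel * c_in)), (kernel, c_in, c_out)),
        "conv_b": np.zeros(c_out),
        "W": rng.normal(0.0, 0.1, (4 * hidden, c_out)),
        "U": np.linalg.qr(rng.normal(size=(4 * hidden, hidden)))[0],
        "b": np.zeros(4 * hidden),
        "out_w": rng.normal(0.0, 0.1, hidden),
        "out_b": 0.0,
    }


def clstm_forward(x, p):
    """Conv features -> LSTM over time -> linear head on the last hidden state."""
    feats = conv1d_relu(x, p["conv_w"], p["conv_b"])
    H = p["U"].shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in feats:
        h, c = lstm_step(x_t, h, c, p["W"], p["U"], p["b"])
    return float(p["out_w"] @ h + p["out_b"])
```

A 7-day normalized window of shape `(7, 1)` passed to `clstm_forward(window, init_params())` returns one predicted (normalized) value.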
For step S300
S300, inputting the test data set into the trained PM2.5 concentration point prediction model to obtain a point prediction result of the output variable.
For step S400
S400, constructing a PM2.5 concentration prediction mixed model based on Gaussian process regression and deep learning, and inputting the point prediction result of the output variable into the trained PM2.5 concentration prediction mixed model to obtain the point prediction result of the output variable, a corresponding probability distribution function and a prediction interval corresponding to the point prediction result of the output variable.
The step S400 includes the steps of:
S410, constructing a secondary training set and a secondary test set for the next stage from the input variables of the previous training set and test set together with the point prediction results of the output variable;
S420, inputting the secondary training set and the secondary test set into the PM2.5 concentration prediction mixed model to obtain the point prediction result of the output variable, the corresponding probability distribution function, and the prediction interval corresponding to the point prediction result;
S430, determining the prediction interval of each point prediction result at the preset confidence level based on the mean and standard deviation of the probability distribution function.
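S420 and S430 can be illustrated with an exact Gaussian process regression that returns a predictive mean and standard deviation, followed by a Gaussian interval at a chosen confidence. The sketch below makes several assumptions:

- The GPR inputs are the first-stage CLSTM point predictions and the targets are the observed concentrations, which is one reading of the abstract.
- The kernel is a squared-exponential (RBF) kernel; the text does not name a kernel.
- The hyperparameters are fixed rather than fitted by maximizing the marginal likelihood.

All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gpr_predict(x_train, y_train, x_test, length_scale=1.0, signal_var=1.0, noise_var=0.01):
    """Exact GP regression with fixed hyperparameters.

    x_train / x_test: first-stage point predictions (assumed 1-D inputs);
    y_train: observed values. Returns the predictive mean and standard deviation.
    """
    x_train, y_train, x_test = (np.asarray(v, dtype=float) for v in (x_train, y_train, x_test))
    y_mean = y_train.mean()  # centre targets so the zero-mean prior is reasonable
    K = rbf_kernel(x_train, x_train, length_scale, signal_var) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(x_train, x_test, length_scale, signal_var)
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0) + noise_var  # variance of a new noisy observation
    return mean, np.sqrt(np.maximum(var, 0.0))


def prediction_interval(mean, std, confidence=0.95):
    """Two-sided Gaussian interval: mean +/- z * std (z = 1.96 at 95 % confidence)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std, mean + z * std
```

At `confidence=0.95`, the interval reproduces the [A - 1.96σ, A + 1.96σ] example given for step 5.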
The point prediction indexes are defined as follows:
(1) Root mean square error RMSE: the square root of the ratio of the sum of squared deviations between the predicted and observed values to the number of observations; a larger RMSE indicates a larger prediction error.
(2) Mean absolute error MAE: the mean of the absolute deviations between the predicted and observed values; the smaller the MAE, the better the prediction model.
(3) Correlation coefficient R: measures the linear correlation between the predicted values and the actual values.
(4) Coefficient of determination R²: measures how closely the predicted values agree with the true values; it typically lies between 0 and 1, and the closer it is to 1, the more consistent the predicted values are with the true values.
The index of the point prediction is calculated according to the following formula:
wherein y_i is the i-th observed value, ȳ is the mean of the observed values, Y_i is the i-th predicted value, σ² is the variance, and N is the number of predicted samples.
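Assuming the standard definitions of MAE, RMSE, R and R² (the formulas themselves are not reproduced in the text), the four point prediction indexes can be computed as in the sketch below. `point_metrics` is a hypothetical helper name.

```python
import numpy as np


def point_metrics(y_obs, y_pred):
    """MAE, RMSE, Pearson R and R^2 between observed and predicted values."""
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    r = np.corrcoef(y, p)[0, 1]  # linear correlation coefficient
    r2 = 1.0 - np.sum(err**2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "R": r, "R2": r2}
```

For example, with observations `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, the helper gives MAE = 0.25, RMSE = 0.5 and R² = 0.8.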
The interval prediction indexes are defined as follows:
(1) Interval coverage probability CP: the proportion of observed values covered by the prediction interval; the closer CP is to 1, the more observed values the interval covers;
(2) Interval mean width MAW: the average width of the prediction interval; the smaller the MAW, the more precise the interval prediction;
(3) Composite interval prediction index MC: combines MAW and CP; the smaller the MC value, the better the interval prediction.
The index of the interval prediction is calculated according to the following formula:
wherein U_i is the upper limit of the prediction interval of the i-th point predicted value, L_i is the lower limit of that prediction interval, and α is the confidence level.
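The interval indexes can be computed as in the sketch below. It uses the definitions given in the description of figs. 9 and 10: CP is the share of observations inside the interval, MWP is the mean width relative to the observation, and MC = MWP/CP. The plain mean width MAW is also included. MWP is returned as a fraction; multiply by 100 for a percentage. The helper name is hypothetical, and the division assumes positive observed concentrations.

```python
import numpy as np


def interval_metrics(y_obs, lower, upper):
    """Coverage probability, mean width, mean width percentage and MC = MWP / CP."""
    y = np.asarray(y_obs, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    cp = np.mean((y >= lo) & (y <= hi))  # share of observations covered
    maw = np.mean(hi - lo)  # mean interval width
    mwp = np.mean((hi - lo) / y)  # mean width relative to the observation
    mc = mwp / cp if cp > 0 else np.inf  # composite index, smaller is better
    return {"CP": cp, "MAW": maw, "MWP": mwp, "MC": mc}
```

Here is a worked example:

- Observations: `[10, 20, 40]`.
- Intervals: `[8, 12]`, `[21, 25]` and `[30, 50]`.
- Two of three observations are covered, so CP = 2/3.
- MWP = (0.4 + 0.2 + 0.5) / 3 ≈ 0.367.
- MC = 0.55.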
The calculated point prediction and interval prediction evaluation indexes are compared with those of the CNN-GPR model, the LSTM-GPR model and the GPR model to determine the better model, as shown in figures 5 to 8.
Specifically, the point prediction results in the table of fig. 9 show that, for monitoring sites 1 and 2, all four models achieve good prediction accuracy and the GPR model performs worst. This indicates that the deep learning models excel at point prediction and achieve higher accuracy. All four indexes of the CLSTM-GPR model are better than those of the other models, showing that the point prediction results obtained by this method have the highest accuracy.
The interval prediction results in the table of fig. 10 show that the four models follow the same trend for monitoring sites 1 and 2. All have a high CP, with the GPR model slightly better, and for MWP the GPR model has the widest prediction interval. However, CLSTM-GPR has the smallest composite index MC, which shows that the interval prediction results obtained by the method of the invention have the best overall performance.
FIG. 2 is a schematic diagram of the structure of the CLSTM model. In the figure, "Input gate", "forget gate" and "output gate" label the input, forget and output gates; C_{t-1}, h_{t-1} and x_t are the inputs, and O_t, C_t and h_t are the outputs. FIG. 3 is a schematic diagram of how the datasets are partitioned and predicted: "dataset1" and "dataset2" are the two datasets, "period" is the time axis, "train data" is the training dataset and "test data" is the test dataset. FIG. 4 is a flow chart of the CLSTM and GPR hybrid model, in which "train set" and "test set" are the training and test sets, "deep learning model training" and "GPR model training" are the two training stages, and "intervals parameters" are the interval parameters. FIGS. 5 and 6 are point prediction result diagrams a and b, FIGS. 7 and 8 are interval prediction result diagrams a and b, and FIGS. 9 and 10 are tables comparing the point prediction and interval prediction results, respectively, of the four models. In FIGS. 5 to 8, the labels CNN, LSTM and CLSTM denote the CNN-GPR, LSTM-GPR and CLSTM-GPR models, respectively.
The abbreviations in the tables are: RMSE (root mean squared error); MAE (mean absolute error); R, the correlation coefficient, which measures the strength of the correlation between two variables; R², the coefficient of determination, which evaluates the goodness of fit of a regression model after linear regression; CP (coverage probability), defined as the probability that an observed value falls within the prediction interval; MWP (mean width percentage), defined as the average interval width as a percentage of the observed value; and MC, a custom index defined as MWP/CP.
FIGS. 5, 6 and 9 show that the CLSTM-GPR model yields good point predictions of PM2.5 concentration at both monitoring sites 1 and 2; the point prediction indices in FIG. 9 are the main evidence, and FIGS. 5 and 6 are auxiliary. FIGS. 7, 8 and 10 show that the CLSTM-GPR model yields good interval predictions of PM2.5 concentration at both monitoring sites 1 and 2; here the interval prediction indices in FIG. 10 are the main evidence.
Together, the convolutional neural network and the long short-term memory network extract the information in the process data more fully, improving the model's ability to handle complex nonlinear data, and Gaussian process regression is then used to produce accurate interval predictions. In this PM2.5 concentration prediction method based on Gaussian process regression and deep learning, the spatial information extracted by the convolutional neural network is combined with the temporal features extracted by the long short-term memory network, and a Gaussian process regression model performs interval prediction, enabling accurate prediction of PM2.5 concentration. The method predicts the PM2.5 concentration at future times from the daily variation of PM2.5 concentration and outputs both a point prediction and the corresponding prediction interval, so the output is highly reliable.
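The GPR stage can be illustrated with a minimal, dependency-free sketch. It assumes (this is not stated in the text) that GPR learns a mapping from the deep-learning point prediction, a scalar input, to the observed concentration, with a zero prior mean, an RBF kernel and fixed hyperparameters (`length_scale`, `variance`, `noise` are illustrative and are not fitted by maximising the marginal likelihood, as a library would normally do). The Gaussian predictive distribution's mean ± z·std then gives the interval at the chosen confidence level. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def rbf(a, b, length_scale, variance):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(K):
    """Lower-triangular L with L @ L.T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward_sub_t(L, b):
    """Solve L.T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_interval(x_train, y_train, x_test, confidence=0.95,
                 length_scale=1.0, variance=1.0, noise=0.01):
    """Return (mean, lower, upper) for each test input under a zero-mean GP."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length_scale, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    weights = backward_sub_t(L, forward_sub(L, y_train))  # K^{-1} y
    z = NormalDist().inv_cdf(0.5 + confidence / 2)        # two-sided quantile
    results = []
    for xs in x_test:
        k_star = [rbf(xs, xt, length_scale, variance) for xt in x_train]
        mean = sum(k * w for k, w in zip(k_star, weights))
        v = forward_sub(L, k_star)
        # Predictive variance of a new observation: latent variance plus noise.
        var = variance + noise - sum(vi * vi for vi in v)
        std = math.sqrt(max(var, 0.0))
        results.append((mean, mean - z * std, mean + z * std))
    return results
```

Far from the training inputs the interval widens back to the prior, which is the behaviour that makes GPR intervals informative about uncertainty. In practice a library implementation such as scikit-learn's `GaussianProcessRegressor` (with `predict(..., return_std=True)`) would normally replace this hand-written version.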
In one embodiment, acquiring the PM2.5 concentration historical data in step 1 includes:
acquiring a preset PM2.5 monitoring site distribution map corresponding to a target area;
preprocessing the PM2.5 monitoring site distribution map to obtain a trusted site distribution map;
determining monitoring missing points based on the trusted site distribution map;
planning a mobile monitoring route based on the monitoring missing points and a preset area map corresponding to the target area;
controlling, based on the mobile monitoring route, a mobile monitoring trolley to monitor PM2.5 at the monitoring missing points, and acquiring first historical monitoring data;
acquiring second historical monitoring data from PM2.5 monitoring at each trusted site in the trusted site distribution map;
and integrating the first historical monitoring data and the second historical monitoring data to obtain the PM2.5 concentration historical data.
The working principle and the beneficial effects of the technical scheme are as follows:
The target area is the area for which PM2.5 concentration must be predicted, for example Tianhe District, Guangzhou. The preset PM2.5 monitoring site distribution map corresponding to the target area shows the distribution of the PM2.5 monitoring sites (for example PM2.5 monitors) in that area. Because these sites are not necessarily distributed uniformly, monitoring missing points must be identified, and a mobile monitoring trolley equipped with a PM2.5 monitor is dispatched to monitor PM2.5 at them. The preset area map corresponding to the target area is a map marked with buildings, roads and the like. In addition, a PM2.5 monitoring site may measure inaccurately because of damage, aging or untimely maintenance, making its data unreliable, so trusted sites must also be screened. The first historical monitoring data from the mobile monitoring trolley and the second historical monitoring data from the trusted sites are integrated to obtain the PM2.5 concentration historical data. During monitoring at the trusted sites and by the mobile monitoring trolley, both real-time PM2.5 concentration monitoring and PM2.5 concentration prediction are performed, which yields the input variables and output variables. Through the cooperation of the trusted sites and the mobile monitoring trolley, PM2.5 concentration is monitored uniformly across the target area, improving the comprehensiveness and accuracy of the acquired historical data.
The PM2.5 concentration historical data are then input into a neural network model and trained until convergence, yielding an artificial intelligence model that can replace manual PM2.5 concentration prediction; PM2.5 concentration monitoring data from the most recent time period are then input into this model to predict the PM2.5 concentration.
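The text does not state how the concentration series is turned into model inputs. A common arrangement, assumed here, is a sliding window: each sample holds the previous `window` values and the target is the next value, and the train/test split is chronological so that the test set lies after the training set. The window length and `train_ratio` below are hypothetical stand-ins for the unspecified "preset proportion".

```python
def make_windows(series, window):
    """Turn a concentration series into (past-window, next-value) samples."""
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window")
    X, y = [], []
    for t in range(window, len(series)):
        X.append(list(series[t - window:t]))  # most recent `window` readings
        y.append(series[t])                   # value to predict
    return X, y


def chronological_split(X, y, train_ratio=0.8):
    """Split without shuffling so the test samples come after the training samples."""
    cut = int(len(X) * train_ratio)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

At prediction time the same construction applies: the most recent `window` readings form one input sample.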
In one embodiment, preprocessing the PM2.5 monitoring site distribution map includes:
traversing each PM2.5 monitoring site in the PM2.5 monitoring site distribution map in turn;
for each traversed site, acquiring site information of the traversed PM2.5 monitoring site based on a preset information acquisition template;
performing feature extraction on the site information based on a preset first feature extraction template to obtain a plurality of first features;
matching the first features with second features in a preset indication feature library and, if a match is found, acquiring a preset second feature extraction template and a feature requirement corresponding to the matched second feature;
performing feature extraction on the site information based on the second feature extraction template to obtain a plurality of third features;
judging whether the third features meet the feature requirement and, if not, removing the traversed PM2.5 monitoring site from the PM2.5 monitoring site distribution map;
and, after all PM2.5 monitoring sites that need to be removed have been removed, taking the PM2.5 monitoring site distribution map as the trusted site distribution map.
The working principle and the beneficial effects of the technical scheme are as follows:
A preset information acquisition template is used to acquire site information that can verify whether the monitoring data of a PM2.5 monitoring site are trustworthy, including equipment model information, service duration information, maintenance record information and the like. A preset first feature extraction template is then used to extract first features of the site information, which indicate from which angle the trustworthiness of the monitoring data can be verified, for example: the site information includes an equipment model, a service duration, a maintenance record, and so on. The second features in the preset indication feature library identify the angle from which the site information is used to verify the monitoring data; for example, if the second feature is "the site information includes a maintenance record", verification is performed from the maintenance-record angle.
The first features are matched with the second features; if a match is found, the verification angle is determined, and the preset second feature extraction template and feature requirement corresponding to the matched second feature are acquired. These serve as the verification tool for that angle. For example, when verification is performed from the maintenance-record angle, the second feature extraction template extracts from the maintenance record information the historical maintenance frequency and the time interval between the latest maintenance and the present, and the feature requirement is that the historical maintenance frequency is at least 3 times per month and that the latest maintenance occurred no more than 12 days ago. PM2.5 monitoring sites that fail this verification are removed, and once all such sites have been removed, the trusted site distribution map is obtained. Introducing the preset indication feature library quickly determines from which angle each site's monitoring data should be verified, reducing verification resources and improving verification efficiency; the verification tool (the second feature extraction template and the feature requirement) then allows each PM2.5 monitoring site to be checked quickly and removed when necessary, improving preprocessing efficiency.
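The maintenance-record example above can be sketched as follows. This covers only that one verification angle; the information acquisition template, feature library and matching step are abstracted away, and the `SiteMaintenance` record structure and function names are hypothetical. The thresholds (at least 3 maintenances per month, last maintenance at most 12 days ago) are the example values from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class SiteMaintenance:
    """Hypothetical record extracted from a site's maintenance log."""
    site_id: str
    maintenance_dates: List[date]
    months_observed: float  # length of the log period, in months


def passes_maintenance_check(rec, today, min_per_month=3.0, max_days_since_last=12):
    """Feature requirement from the example: >= 3 maintenances/month, last one <= 12 days ago."""
    if not rec.maintenance_dates or rec.months_observed <= 0:
        return False  # no usable maintenance record, so the site cannot be verified
    frequency = len(rec.maintenance_dates) / rec.months_observed
    days_since_last = (today - max(rec.maintenance_dates)).days
    return frequency >= min_per_month and days_since_last <= max_days_since_last


def trusted_site_ids(records, today):
    """Keep only the sites whose maintenance features satisfy the requirement."""
    return [r.site_id for r in records if passes_maintenance_check(r, today)]
```

Treating a site without any usable record as untrusted is a design choice made here for safety; the text does not say how such sites are handled.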
In one embodiment, determining the monitoring missing points based on the trusted site distribution map includes:
drawing, in the trusted site distribution map, a circular range centered on each trusted site with a preset radius length as its radius;
dividing the trusted site distribution map into a plurality of grid areas based on a preset grid division rule;
traversing each grid area in turn;
for each traversed grid area, extracting the remaining area outside the circular ranges;
performing feature extraction on the remaining area based on a preset third feature extraction template to obtain a plurality of fourth features;
summarizing the fourth features to obtain a fourth feature set;
matching the fourth feature set with a preset index feature set to obtain a matching degree;
if the matching degree is greater than or equal to a preset matching degree threshold, setting monitoring missing points in the remaining area according to the monitoring missing point setting requirement;
taking each set monitoring missing point as a new trusted site and drawing a new circular range centered on it with the same radius length;
continuing to traverse the grid areas;
wherein the monitoring missing point setting requirement includes:
the first linear distance between a set monitoring missing point and the center of any surrounding circular range is greater than or equal to a preset first linear distance threshold; the minimum linear distance between a set monitoring missing point and any surrounding circular range is greater than or equal to a preset second linear distance threshold; and the third linear distance between any two set monitoring missing points is greater than or equal to a preset third linear distance threshold.
The working principle and the beneficial effects of the technical scheme are as follows:
The preset grid division rule divides the trusted site distribution map into m × n grid areas (m rows and n columns) with equal row and column spacing. The preset radius length is, for example, 300 meters. The remaining area is the part of a grid area outside the monitoring coverage of the trusted sites. A preset third feature extraction template is used to extract fourth features of the remaining area that reflect whether a monitoring missing point needs to be set there, namely the total area of the remaining area and the linear distances between the area's centroid and points on its boundary line. The preset index feature set consists of features indicating that a monitoring missing point needs to be set in the remaining area, for example: the total area of the remaining area is at least 0.3 square kilometers (reflecting a large area not covered by PM2.5 monitoring), and at least 70% of the points on the boundary line lie 120 meters or more from the centroid (reflecting a compact, concentrated area not covered by PM2.5 monitoring). The fourth feature set is matched with the index feature set to obtain a matching degree; if the matching degree is greater than or equal to the preset matching degree threshold, monitoring missing points are set in the remaining area according to the monitoring missing point setting requirement. Each set monitoring missing point is then treated as a new trusted site, and a new circular range with the same radius is drawn around it, so that the same area is not assigned another monitoring missing point later.
The monitoring missing point setting requirement ensures that the new circular range formed around a set monitoring missing point overlaps the existing circular ranges as little as possible, improving monitoring efficiency and utilization. Introducing this requirement also makes the placement of monitoring missing points in the remaining area more rational.
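The grid traversal can be sketched under several assumptions that the text leaves open: grid areas are axis-aligned rectangles in a local metric coordinate system, the remaining area is estimated by sampling each cell on a regular lattice (`step` metres), a candidate point is placed at the centroid of the remaining area, the matching degree is the fraction of the two index features satisfied, and the distance thresholds `d1`, `d2`, `d3` are illustrative. The 300 m radius, 0.3 km², 120 m and 70% values are the examples from the text; all function names are hypothetical.

```python
import math


def sample_cell(cell, sites, radius, step):
    """Sample a grid cell on a lattice; True marks a point outside every circular range."""
    x0, y0, x1, y1 = cell
    nx, ny = int(round((x1 - x0) / step)), int(round((y1 - y0) / step))
    uncovered = {}
    for i in range(nx):
        for j in range(ny):
            x, y = x0 + (i + 0.5) * step, y0 + (j + 0.5) * step
            uncovered[(i, j)] = all(math.hypot(x - sx, y - sy) > radius for sx, sy in sites)
    return uncovered


def remaining_area_features(uncovered, cell, step, far_dist):
    """Return (area, share of boundary samples >= far_dist from centroid, centroid)."""
    keys = [k for k, free in uncovered.items() if free]
    if not keys:
        return 0.0, 0.0, None
    pos = {k: (cell[0] + (k[0] + 0.5) * step, cell[1] + (k[1] + 0.5) * step) for k in keys}
    cx = sum(p[0] for p in pos.values()) / len(pos)
    cy = sum(p[1] for p in pos.values()) / len(pos)
    # A sample lies on the boundary if a 4-neighbour is covered or outside the cell.
    boundary = [k for k in keys
                if any(not uncovered.get((k[0] + di, k[1] + dj), False)
                       for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
    far = sum(1 for k in boundary if math.hypot(pos[k][0] - cx, pos[k][1] - cy) >= far_dist)
    return len(keys) * step * step, far / len(boundary), (cx, cy)


def meets_setting_requirement(p, sites, placed, radius, d1, d2, d3):
    """Distance to every circle centre >= d1, to every circle edge >= d2, to other points >= d3."""
    for sx, sy in sites:
        d = math.hypot(p[0] - sx, p[1] - sy)
        if d < d1 or d - radius < d2:
            return False
    return all(math.hypot(p[0] - qx, p[1] - qy) >= d3 for qx, qy in placed)


def find_missing_points(cells, sites, radius=300.0, step=20.0, min_area=0.3e6,
                        far_dist=120.0, far_share=0.7, match_threshold=1.0,
                        d1=400.0, d2=100.0, d3=400.0):
    sites, placed = list(sites), []
    for cell in cells:
        uncovered = sample_cell(cell, sites, radius, step)
        area, share, centroid = remaining_area_features(uncovered, cell, step, far_dist)
        if centroid is None:
            continue
        matching = ((area >= min_area) + (share >= far_share)) / 2
        if matching >= match_threshold and meets_setting_requirement(
                centroid, sites, placed, radius, d1, d2, d3):
            placed.append(centroid)
            sites.append(centroid)  # new trusted site: its circle affects later cells
    return placed
```

For a cell whose uncovered part is a ring around an existing site, the centroid falls inside that site's circle and the setting requirement rejects it; a real implementation would then search the remaining area for a compliant position rather than skipping the cell.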
In one embodiment, planning the mobile monitoring route based on the monitoring missing points and the preset area map corresponding to the target area includes:
determining the map positions corresponding to the monitoring missing points in the area map;
determining, among these map positions, the target map position having the smallest fourth linear distance to a preset initial position of the monitoring trolley;
and planning, with the target map position as the route starting point, the shortest route passing through the remaining map positions, and taking this shortest route as the mobile monitoring route.
The working principle and the beneficial effects of the technical scheme are as follows:
The preset initial position of the monitoring trolley is its starting position, for example a monitoring trolley depot. Choosing the route starting point with the smallest fourth linear distance and then planning the mobile monitoring route means the monitoring trolley does not spend excessive time before reaching its first point, improving monitoring scheduling efficiency. In addition, the monitoring trolley can adaptively identify whether it can reach a monitoring missing point; if not, it travels to the nearest reachable position and performs PM2.5 monitoring there, using the result as the PM2.5 concentration monitoring data of that monitoring missing point.
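A minimal sketch of the route planning step, assuming straight-line distances as a stand-in for road distances from the area map, and an open route (the trolley does not return to the depot). The exhaustive search over orderings finds the true shortest route but is only practical for roughly eight or nine points; a production planner would use road-network shortest paths and a travelling-salesman heuristic. The function name `plan_route` is hypothetical.

```python
import math
from itertools import permutations


def plan_route(missing_points, depot):
    """Start at the missing point nearest the depot, then take the shortest open path through the rest."""
    if not missing_points:
        return []
    # Route starting point: smallest straight-line ("fourth linear") distance to the depot.
    start_idx = min(range(len(missing_points)),
                    key=lambda i: math.dist(missing_points[i], depot))
    start = missing_points[start_idx]
    rest = missing_points[:start_idx] + missing_points[start_idx + 1:]
    best_path, best_len = [start], math.inf
    for order in permutations(rest):  # exhaustive search: exact but O((n-1)!)
        path = [start, *order]
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        if length < best_len:
            best_path, best_len = path, length
    return best_path
```

For example, with the depot at `(0, 0)` and missing points `[(10, 0), (1, 0), (5, 0)]`, the route starts at `(1, 0)` and then visits `(5, 0)` and `(10, 0)`.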
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A PM2.5 concentration prediction method based on gaussian process regression and deep learning, comprising:
step 1: acquiring PM2.5 concentration historical data; the PM2.5 concentration history data comprises an input variable which is a PM2.5 concentration real value and an output variable which is a PM2.5 concentration predicted value;
step 2: dividing the PM2.5 concentration historical data into a training data set and a test data set according to a preset proportion, and preprocessing the training data set and the test data set;
step 3: constructing a PM2.5 concentration point prediction model fusing a convolutional neural network and a long short-term memory network, inputting the preprocessed training data set into the PM2.5 concentration point prediction model for multiple rounds of training to obtain ideal parameters, and configuring the PM2.5 concentration point prediction model based on the ideal parameters;
step 4: inputting the preprocessed test data set into the PM2.5 concentration point prediction model after configuration is completed, and obtaining a point prediction result of the output variable;
step 5: constructing a PM2.5 concentration prediction mixed model of Gaussian process regression and deep learning, and inputting the point prediction result into the PM2.5 concentration prediction mixed model to obtain a probability distribution function and a prediction interval corresponding to the point prediction result.
2. The PM2.5 concentration prediction method according to claim 1, wherein in step 2, the preprocessing of the training data set and the test data set comprises:
screening and eliminating abnormal values in the training data set and the test data set;
and normalizing the screened result, the normalization formula being:
y* = (y - min)/(max - min)
wherein y is any original data value in the screened result, min is the smallest original data value in the screened result, max is the largest original data value in the screened result, and y* is the normalized data value corresponding to y.
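The normalization formula of claim 2 translates directly into code. The outlier screening step is not specified in the claim, so it is not shown; `min` and `max` are taken from the data passed in, as the claim states. The inverse transform is an addition assumed to be useful for mapping normalized predictions back to concentration units. Function names are hypothetical.

```python
def min_max_normalize(values):
    """Apply y* = (y - min) / (max - min); also return min and max for the inverse."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Invert the scaling: y = y* * (max - min) + min."""
    return [s * (hi - lo) + lo for s in scaled]
```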
3. The PM2.5 concentration prediction method according to claim 1, wherein in step 3, the training data set after preprocessing is input into the PM2.5 concentration point prediction model for training for a plurality of times, so as to obtain ideal parameters, which comprises:
setting the neural network parameters of the PM2.5 concentration point prediction model;
inputting the training data set into the PM2.5 concentration point prediction model for training for a plurality of times, and obtaining the accuracy of each training and the corresponding optimization parameters;
taking the optimization parameters corresponding to the highest accuracy as the ideal parameters;
the setting of the neural network parameters of the PM2.5 concentration point prediction model comprises the following steps:
setting a CNN layer initialization function of the PM2.5 concentration point prediction model as Kaiming;
setting the initialization function of the LSTM, CLSTM and GRU layers of the PM2.5 concentration point prediction model to Orthogonal;
setting an optimizer of the PM2.5 concentration point prediction model to Adam;
setting the learning rate of the PM2.5 concentration point prediction model to be 1e-3;
setting a loss function of the PM2.5 concentration point prediction model to MSE;
setting a batch size of the PM2.5 concentration point prediction model to 20;
and setting the dropout rate of the PM2.5 concentration point prediction model to 0.2.
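The settings of claim 3 can be collected into a configuration object, together with the "take the most accurate run" rule. The string values are descriptive labels, not a framework API; in a framework such as PyTorch they would correspond, for example, to `torch.nn.init.kaiming_normal_`, `torch.nn.init.orthogonal_`, `torch.optim.Adam`, `torch.nn.MSELoss` and `torch.nn.Dropout`. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointModelConfig:
    """Hyperparameters listed in claim 3 (string values are descriptive labels)."""
    cnn_init: str = "kaiming"
    recurrent_init: str = "orthogonal"  # LSTM / CLSTM / GRU layers
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    loss: str = "mse"
    batch_size: int = 20
    dropout: float = 0.2


def select_ideal_parameters(runs):
    """Given (accuracy, parameters) pairs from repeated training, return the most accurate run's parameters."""
    if not runs:
        raise ValueError("no training runs to choose from")
    return max(runs, key=lambda run: run[0])[1]
```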
4. The PM2.5 concentration prediction method according to claim 1, wherein said step 5 further comprises:
calculating the indices of the obtained point prediction and interval prediction, and comparing the PM2.5 concentration prediction mixed model with a CNN-GPR model, an LSTM-GPR model and a GPR model to obtain a better model;
the index of the point prediction is calculated according to the following formula:
wherein y is i Is the ith observation, var is the average of the ith observations, Y i The ith predicted value of PM2.5 predicted output is carried out for a PM2.5 concentration prediction mixed model, a CNN-GPR model, an LSTM-GPR model and a GPR model,is variance, N is the number of predicted samples, E and sigma are operators, MAE is mean absolute error, RMSE is mean square root error, R is phaseA closing coefficient;
the index of the interval prediction is calculated according to the following formula:
wherein U is i (α) Is the upper limit of the prediction interval of the i-th point predicted value, L i (α) Is the lower limit of the prediction interval of the i-th point predicted value, and alpha is the credibility.
5. The PM2.5 concentration prediction method according to claim 1, wherein acquiring the PM2.5 concentration historical data in step 1 comprises:
acquiring a preset PM2.5 monitoring site distribution map corresponding to a target area;
preprocessing the PM2.5 monitoring site distribution map to obtain a trusted site distribution map;
determining monitoring missing points based on the trusted site distribution map;
planning a mobile monitoring route based on the monitoring missing points and a preset area map corresponding to the target area;
controlling, based on the mobile monitoring route, a mobile monitoring trolley to monitor PM2.5 at the monitoring missing points, and acquiring first historical monitoring data;
acquiring second historical monitoring data from PM2.5 monitoring at each trusted site in the trusted site distribution map;
and integrating the first historical monitoring data and the second historical monitoring data to obtain the PM2.5 concentration historical data.
6. The PM2.5 concentration prediction method based on Gaussian process regression and deep learning according to claim 5, wherein preprocessing the PM2.5 monitoring site distribution map comprises:
traversing each PM2.5 monitoring site in the PM2.5 monitoring site distribution map in turn;
for each traversed site, acquiring site information of the traversed PM2.5 monitoring site based on a preset information acquisition template;
performing feature extraction on the site information based on a preset first feature extraction template to obtain a plurality of first features;
matching the first features with second features in a preset indication feature library and, if a match is found, acquiring a preset second feature extraction template and a feature requirement corresponding to the matched second feature;
performing feature extraction on the site information based on the second feature extraction template to obtain a plurality of third features;
judging whether the third features meet the feature requirement and, if not, removing the traversed PM2.5 monitoring site from the PM2.5 monitoring site distribution map;
and, after all PM2.5 monitoring sites that need to be removed have been removed, taking the PM2.5 monitoring site distribution map as the trusted site distribution map.
7. The PM2.5 concentration prediction method based on Gaussian process regression and deep learning according to claim 5, wherein determining the monitoring missing points based on the trusted site distribution map comprises:
drawing, in the trusted site distribution map, a circular range centered on each trusted site with a preset radius length as its radius;
dividing the trusted site distribution map into a plurality of grid areas based on a preset grid division rule;
traversing each grid area in turn;
for each traversed grid area, extracting the remaining area outside the circular ranges;
performing feature extraction on the remaining area based on a preset third feature extraction template to obtain a plurality of fourth features;
summarizing the fourth features to obtain a fourth feature set;
matching the fourth feature set with a preset index feature set to obtain a matching degree;
if the matching degree is greater than or equal to a preset matching degree threshold, setting monitoring missing points in the remaining area according to the monitoring missing point setting requirement;
taking each set monitoring missing point as a new trusted site and drawing a new circular range centered on it with the same radius length;
continuing to traverse the grid areas;
wherein the monitoring missing point setting requirement comprises:
the first linear distance between a set monitoring missing point and the center of any surrounding circular range is greater than or equal to a preset first linear distance threshold; the minimum linear distance between a set monitoring missing point and any surrounding circular range is greater than or equal to a preset second linear distance threshold; and the third linear distance between any two set monitoring missing points is greater than or equal to a preset third linear distance threshold.
8. The PM2.5 concentration prediction method based on Gaussian process regression and deep learning according to claim 5, wherein planning the mobile monitoring route based on the monitoring missing points and the preset area map corresponding to the target area comprises:
determining the map positions corresponding to the monitoring missing points in the area map;
determining, among these map positions, the target map position having the smallest fourth linear distance to a preset initial position of the monitoring trolley;
and planning, with the target map position as the route starting point, the shortest route passing through the remaining map positions, and taking the shortest route as the mobile monitoring route.
9. A PM2.5 concentration prediction system based on gaussian process regression and deep learning, comprising:
the acquisition module is used for acquiring PM2.5 concentration historical data; the PM2.5 concentration history data comprises an input variable which is a PM2.5 concentration real value and an output variable which is a PM2.5 concentration predicted value;
the dividing module is used for dividing the PM2.5 concentration historical data into a training data set and a test data set according to a preset proportion, and preprocessing the training data set and the test data set;
the first construction module is used for constructing a PM2.5 concentration point prediction model fusing a convolutional neural network and a long short-term memory network, inputting the preprocessed training data set into the PM2.5 concentration point prediction model for multiple rounds of training to obtain ideal parameters, and configuring the PM2.5 concentration point prediction model based on the ideal parameters;
the input module is used for inputting the preprocessed test data set into the configured PM2.5 concentration point prediction model to obtain a point prediction result of the output variable;
and the second construction module is used for constructing a PM2.5 concentration prediction mixed model of Gaussian process regression and deep learning, inputting the point prediction result into the PM2.5 concentration prediction mixed model, and obtaining a probability distribution function and a prediction interval corresponding to the point prediction result.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that the program is executed by a processor to implement a gaussian process regression and deep learning based PM2.5 concentration prediction method according to any one of claims 1 to 9.
CN202310523089.9A 2023-05-10 2023-05-10 PM2.5 concentration prediction method based on Gaussian process regression and deep learning Pending CN116796805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310523089.9A CN116796805A (en) 2023-05-10 2023-05-10 PM2.5 concentration prediction method based on Gaussian process regression and deep learning

Publications (1)

Publication Number Publication Date
CN116796805A true CN116796805A (en) 2023-09-22

Family

ID=88042943

Country Status (1)

Country Link
CN (1) CN116796805A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117577227A (en) * 2024-01-16 2024-02-20 北京市生态环境监测中心 PM2.5 point location high value identification method, system, equipment and medium
CN117577227B (en) * 2024-01-16 2024-04-16 北京市生态环境监测中心 PM2.5 point location high value identification method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination