CN117332815A - Prediction method and prediction early warning system for atmospheric pollution of industrial park - Google Patents
Prediction method and prediction early warning system for atmospheric pollution of industrial park
- Publication number
- CN117332815A (application CN202311236270.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- prediction
- atmospheric pollution
- industrial park
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention provides a prediction method and a prediction and early warning system for atmospheric pollution of an industrial park. The method comprises the following steps: selecting the monitoring data recorded over past years at the environmental monitoring points in the research area as a data source; performing missing value interpolation, meteorological factor screening, and data normalization and reconstruction on the data; and inputting the data into an SA-LSTM model and an XGBoost model for training to obtain a combined model. An industrial park atmospheric pollution data prediction and early warning system is then constructed, and its modules are used to predict and release pollutant monitoring data. The method and system can monitor and predict the atmospheric pollution indices in the industrial park accurately and in a timely manner, provide a scientific basis for intelligent management and control of atmospheric pollution in the industrial park, and can greatly improve the park's risk monitoring, rapid early warning, and emergency response capabilities.
Description
Technical Field
The invention relates to the technical field of monitoring of atmospheric pollutants in an industrial park, in particular to a prediction method and a prediction early warning system for atmospheric pollution in the industrial park.
Background
Atmospheric pollutants discharged during production in industrial parks are an important source of environmental pollution, and industrial parks are the main battlefield for industrial atmospheric pollution control. Predicting and analyzing the key atmospheric pollutants of an industrial park, and researching technologies for regulating regional atmospheric environmental quality, can greatly improve the park's risk monitoring, rapid early warning, and emergency response capabilities and effectively protect the regional ecological environment and the safety of people's lives and property. Accurate prediction is the basis and premise for controlling atmospheric pollutant emissions, and sound prediction methods and technologies are of great significance for formulating and improving atmospheric pollution prevention and control policies.
In the past, management and control of industrial parks focused on real-time monitoring of atmospheric pollutants, and there has been little research on scientific prediction of the characteristic pollutants of industrial parks. Taking an industrial park in Shanzhou city as its basis, the present method analyzes the historical pollutant data of the industrial park, uses advanced models to scientifically predict its atmospheric pollutants, and provides a scientific basis for intelligent management and control of atmospheric pollution in the industrial park.
Disclosure of Invention
The invention provides a prediction method and a prediction and early warning system for atmospheric pollution of an industrial park.
The specific technical scheme is as follows:
The industrial park atmospheric pollution prediction method comprises the following steps:
(1) Selecting environmental monitoring points in the industrial park, and retrieving the monitoring data of those points from past years as a training set;
(2) Sequentially performing missing value interpolation and data normalization on the training set of step (1) to obtain a processed data set;
(3) Inputting the data set processed in step (2) into an SA-LSTM model and an XGBoost model respectively for model training, and obtaining a combined prediction model of the SA-LSTM model and the XGBoost model by using the reciprocal variance method;
(4) Retrieving the monitoring data of the environmental monitoring points at the previous moment and inputting them into the combined prediction model to obtain the predicted atmospheric pollution condition at the next moment.
AQI in this context represents the air quality index and the environmental monitoring point is a government-set environmental monitoring point.
Further, in step (1), the monitoring data include pollutant concentration data; the pollutants comprise CO, SO2, NO2, O3, PM2.5, and PM10; the indices of the atmospheric pollution condition are the PM2.5 concentration and the air quality index (AQI).
Further, in the step (2), the method of the missing value interpolation processing is as follows:
(2-1) finding missing contaminant concentration data and corresponding missing moments from the training set;
(2-2) using an autoencoder model, retrieving the contaminant concentration data at the moments immediately before and after the missing moment, and taking the average of the two values as the contaminant concentration at the missing moment.
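The averaging part of step (2-2) can be sketched as follows. This is a minimal illustration, not the patented implementation: it covers only the neighbor-mean rule, leaves out the autoencoder model that the text mentions, and the function name `fill_missing_with_neighbor_mean` and the sample readings are hypothetical.

```python
from typing import List, Optional


def fill_missing_with_neighbor_mean(
    series: List[Optional[float]],
) -> List[Optional[float]]:
    """Replace each isolated None with the mean of its two neighbors.

    Neighbors are always read from the original series, so a filled value
    is never reused to fill another gap.
    """
    filled = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        prev_value = series[t - 1] if t > 0 else None
        next_value = series[t + 1] if t + 1 < len(series) else None
        if prev_value is not None and next_value is not None:
            filled[t] = (prev_value + next_value) / 2.0
        # Gaps at the edges, or longer than one time step, stay None here;
        # how the source handles them is not stated.
    return filled


# Hypothetical hourly PM2.5 readings with missing moments marked as None.
pm25 = [35.0, None, 41.0, 44.0, None, None, 50.0]
print(fill_missing_with_neighbor_mean(pm25))
```

Only the single-step gap at index 1 is filled in this sample; the two-step gap is left untouched because neither missing moment has two observed neighbors.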
Further, in the step (1), the retrieved monitoring data further includes weather data; the meteorological data comprise wind speed, wind direction, air pressure, air temperature and humidity;
in the step (2), after the missing value interpolation processing, the meteorological data is screened first, and then the data normalization processing is carried out;
the method for screening the meteorological data comprises the following steps:
(2-A) Performing correlation analysis between the meteorological data in the monitoring data obtained in step (1) and the PM2.5 concentration data and air quality index to obtain correlation coefficients;
(2-B) Judging, according to the correlation judgment standard, the correlation of each item of meteorological data with the PM2.5 concentration data and the air quality index, thereby obtaining the meteorological data that can be used as input to the combined model in step (3).
Further, the correlation judgment standard is as follows: a correlation analysis tool is used to analyze the correlation between the meteorological factors and PM2.5, and the correlation coefficient indicates whether the correlation between two variables is significant. If |correlation coefficient| < 0.4, the correlation is low; if 0.4 ≤ |correlation coefficient| < 0.7, the linear correlation is significant; if 0.7 ≤ |correlation coefficient| ≤ 1, the correlation is high;
Table 1. Correlation of PM2.5 and AQI with meteorological factors
The correlation coefficient table shows that the parameter most strongly correlated with PM2.5 is AQI, with a coefficient of 0.931, and that the correlations rank as AQI > air temperature > humidity > wind direction > wind speed > air pressure. Overall, the correlations of PM2.5 and AQI with the meteorological factors in the Deer City light industrial park are all significant, but PM2.5 and AQI have low correlation with wind speed, wind direction, and air pressure and significant correlation with air temperature and humidity. The correlation diagram of PM2.5, AQI, and the meteorological factors is shown in the accompanying drawing. Based on the correlation coefficient table, four meteorological factors (wind speed, air pressure, temperature, and humidity) are selected as input data of the prediction model;
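The judgment standard above can be illustrated with a short sketch. The text does not name the correlation coefficient it uses, so the Pearson coefficient here is an assumption, and the helper names `pearson` and `correlation_level` are hypothetical. Only the thresholds 0.4 and 0.7 come from the text.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def correlation_level(r: float) -> str:
    """Classify a coefficient with the thresholds stated in the text."""
    magnitude = abs(r)
    if magnitude < 0.4:
        return "low"
    if magnitude < 0.7:
        return "significant"
    return "high"


# The reported PM2.5/AQI coefficient of 0.931 falls in the "high" band.
print(correlation_level(0.931))
```

Classifying on the absolute value means a strong negative relationship is treated the same way as a strong positive one.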
further, in the step (2), the data normalization processing method includes:
Normalizing the processed data by using the mapminmax function, wherein the normalization formula is as follows:
$$x' = \frac{x - \min}{\max - \min} \tag{1}$$
In formula (1), x is the value of a single datum of each sample feature, x' is its normalized value, min is the minimum value of the sample feature data, and max is the maximum value of the sample feature data.
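A minimal sketch of min-max normalization for one feature column follows. It assumes the common [0, 1] target range, which the text does not state. MATLAB's `mapminmax` maps to [-1, 1] by default, so the hypothetical `lo` and `hi` parameters allow either convention.

```python
from typing import List


def min_max_scale(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Linearly rescale values so the minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; map it to lo (a design choice).
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


# Hypothetical temperature readings.
print(min_max_scale([10.0, 20.0, 30.0]))
print(min_max_scale([10.0, 20.0, 30.0], lo=-1.0, hi=1.0))
```

In practice the minimum and maximum would be computed on the training set only and reused for new monitoring data, so that predictions are scaled consistently.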
Further, the SA-LSTM model in the combined model comprises the following four components:
i) Input layer: the monitoring data of the environmental monitoring points are input as a time-ordered sequence;
ii) LSTM layer: the LSTM layer encodes the sequence through its memory and forgetting mechanisms and produces the output vectors of the hidden layer;
iii) Self-attention layer: the self-attention mechanism generates a weight vector and weights the hidden states of all time steps, focusing attention on the more important parts of the whole hidden-state sequence;
iv) Output layer: the resulting sequence-level feature vector is used for time series data analysis and prediction;
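The weighting performed by the self-attention layer in item iii) can be sketched without a deep learning framework. The text does not specify the scoring function, so the dot product against a query vector used here is an assumption. In a trained model both the hidden states (from the LSTM layer) and the query would be learned; the names `attention_pool`, `hidden_states`, and `query` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Weight the hidden state of every time step and sum them into one vector."""
    # One score per time step: dot product of the hidden state with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Sequence-level feature vector: attention-weighted sum over time steps.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Three time steps with two-dimensional hidden states (illustrative values).
weights, context = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
print(weights, context)
```

The third time step has the highest score, so it receives the largest weight and contributes most to the sequence-level feature vector passed to the output layer.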
further, training of the XGBoost model includes the steps of:
Dividing the preprocessed data set obtained in step (2) into a training set and a test set, extracting the feature values of the training set and inputting them into the XGBoost classification model for training, and then testing the trained model with the test set to obtain the final model;
Further, in step (3), the LSTM initialization parameters in the SA-LSTM model are: a weight gradient learning rate of 0.001, an input tensor of 10, a dimension of 64, 100 iterations, and a random seed of 42; the XGBoost model is set to a maximum tree depth of 6 and a learning rate of 0.05, is fitted with 100 trees, and uses a random seed of 42;
Further, the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are selected as the evaluation indices of the model;
The root mean square error formula is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2} \tag{2}$$
In formula (2), the predicted value is subtracted from the true value, and the differences are squared, averaged, and square-rooted. The closer the predicted values are to the true values, the smaller the RMSE; the larger the error, the larger the RMSE. RMSE denotes the root mean square error, m the number of observations, y_i the observed (true) value, and ŷ_i the predicted value;
The mean absolute error formula is as follows:
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \tag{3}$$
In formula (3), the predicted value is subtracted from the true value, and the absolute differences are averaged. The closer the predicted values are to the true values, the smaller the MAE; the larger the error, the larger the MAE. m denotes the number of observations, y_i the observed (true) value, and ŷ_i the predicted value;
The mean absolute percentage error formula is as follows:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{4}$$
In formula (4), y_i is the true value and ŷ_i the predicted value. The range of MAPE is [0, +∞): a MAPE of 0% indicates a perfect model, and a MAPE greater than 100% indicates an inferior model. n denotes the number of observations;
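The three evaluation indices can be computed with a few lines of standard-library Python. This sketch uses the textbook definitions of RMSE, MAE, and MAPE, with `y_true` as the observed values and `y_pred` as the predictions; the sample numbers are illustrative only and are not results from the patent.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    m = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    m = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / m


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; true values must be nonzero."""
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n


# Illustrative observed and predicted AQI values.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

MAPE divides by the true value, so it is undefined when an observation is exactly zero; such records would need to be excluded or handled separately.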
Further, the trained SA-LSTM model is combined with the XGBoost model. The combination applies the reciprocal variance method to the prediction results of the SA-LSTM model and the XGBoost model to obtain the prediction result of the final combined model. The reciprocal variance method is as follows:
b = value predicted by the SA-LSTM model;
a = value predicted by the XGBoost model;
e1 = variance of the SA-LSTM model's predicted values;
e2 = variance of the XGBoost model's predicted values;
w1 = (1/e1)/(1/e1 + 1/e2), the weight of the SA-LSTM model's predicted value;
w2 = (1/e2)/(1/e1 + 1/e2), the weight of the XGBoost model's predicted value;
final predicted value x = w1 × b + w2 × a.
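The reciprocal variance combination can be sketched as below. Each weight multiplies the prediction of the model whose variance produced it, so the model with the smaller variance contributes more. How the variances e1 and e2 are estimated is not detailed in the text, so they are plain inputs here. The function names and sample numbers are hypothetical.

```python
from typing import Tuple


def inverse_variance_weights(e1: float, e2: float) -> Tuple[float, float]:
    """Weights proportional to 1/e1 and 1/e2, normalized to sum to one."""
    if e1 <= 0.0 or e2 <= 0.0:
        raise ValueError("variances must be positive")
    inv1, inv2 = 1.0 / e1, 1.0 / e2
    total = inv1 + inv2
    return inv1 / total, inv2 / total


def combine(pred_sa_lstm: float, pred_xgboost: float, e1: float, e2: float) -> float:
    """Combined prediction; e1 belongs to SA-LSTM and e2 to XGBoost."""
    w1, w2 = inverse_variance_weights(e1, e2)
    return w1 * pred_sa_lstm + w2 * pred_xgboost


# Illustrative values: SA-LSTM has the smaller variance, so it gets the larger weight.
w1, w2 = inverse_variance_weights(4.0, 12.0)
print(w1, w2, combine(40.0, 48.0, 4.0, 12.0))
```

With variances 4 and 12 the weights are about 0.75 and 0.25, so the combined value lies closer to the SA-LSTM prediction.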
The invention also provides an industrial park atmospheric pollution data prediction and early warning system, which comprises:
the data acquisition module is used for receiving monitoring data information of the environmental monitoring points;
the data prediction module predicts the pollutant monitoring data information acquired by the data acquisition module by using the prediction method of the atmospheric pollution of the industrial park to obtain a prediction result of the atmospheric pollution condition;
the data early warning module is used for carrying out early warning on the predicted air pollution condition at the next moment according to a preset air pollution condition threshold value and sending early warning information to the mobile terminal;
and the data visualization display module displays the prediction result of the data prediction module.
Further, the monitoring data include pollutant concentration data; the pollutants comprise CO, SO2, NO2, O3, PM2.5, and PM10; the indices of the atmospheric pollution condition are the PM2.5 concentration and the air quality index (AQI).
Further, the operation steps of the industrial park atmospheric pollution data prediction and early warning system comprise:
(1) Selecting the park to be predicted and the pollutant type through the system, and obtaining the real-time monitoring data of PM2.5 concentration and air quality index from the national control station of that park over the past three hours;
(2) Inputting the real-time monitoring data into the combined model of SA-LSTM and XGBoost for prediction to obtain a prediction result;
(3) When the air quality index in the prediction result is greater than 180, ending the prediction and issuing an early warning through the platform; when the air quality index shows no abnormality, returning to step (1) and repeating steps (2)-(3);
Compared with the prior art, the invention has the following beneficial effects:
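The operating loop of steps (1)-(3) can be sketched as a simple threshold check over successive predictions. The threshold of 180 comes from the text. The function `run_until_warning` and the list of predicted values standing in for the data acquisition and prediction modules are hypothetical.

```python
from typing import Iterable, Optional

AQI_WARNING_THRESHOLD = 180.0  # threshold stated in step (3)


def run_until_warning(
    predicted_aqi_stream: Iterable[float],
    threshold: float = AQI_WARNING_THRESHOLD,
) -> Optional[int]:
    """Return the index of the first prediction above the threshold, else None.

    Each item stands for one pass through steps (1)-(2): fetch the latest
    monitoring data and run the combined model.
    """
    for step, predicted_aqi in enumerate(predicted_aqi_stream):
        if predicted_aqi > threshold:
            # Step (3): a real platform would issue the early warning here.
            return step
        # No abnormality: continue with the next round of monitoring data.
    return None


# Illustrative predictions; the third one exceeds the threshold.
print(run_until_warning([95.0, 150.0, 181.0, 90.0]))
```

The comparison is strict, so a predicted AQI of exactly 180 does not trigger a warning, matching the "greater than 180" wording.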
(1) The invention provides the prediction method and the prediction early warning system for the atmospheric pollution of the industrial park, which can realize the real-time monitoring and prediction of the atmospheric pollutants, and can issue early warning according to the prediction result to timely prevent and treat the park pollution;
(2) The risk monitoring, rapid early warning and emergency response capability of the industrial park can be greatly improved;
(3) The data of the pollutants can be displayed more conveniently and effectively through the combination of the platform and the model.
Drawings
FIG. 1 is a flow chart of a method for predicting atmospheric pollution in an industrial park according to example 1.
FIG. 2 is the correlation graph of PM2.5, AQI, and the meteorological factors in example 1, wherein WindSpeed represents wind speed; WindDirection represents wind direction; Air pressure represents air pressure; Temperature represents temperature; Humidi represents humidity.
FIG. 3 is a diagram of the SA-LSTM structure in example 1, wherein LSTM Layer represents the LSTM layer; Attention Layer represents the self-attention layer; Fully Connectrd Layer represents the fully connected layer; X_1 to X_t represent the input data; and Y_t represents the output data.
FIG. 4 is a graph comparing the predicted values of the LSTM and SA-LSTM network models with the actual values in example 1. The upper panel, LSTM predicted results, compares the predictions of the LSTM network model with the true values, where predicted_LSTM is the LSTM prediction and y_true is the true value; the lower panel, SA-LSTM predicted results, compares the predictions of the SA-LSTM model with the true values, where prediction_sa-LSTM is the SA-LSTM prediction and y_true is the true value.
FIG. 5 is a graph comparing the predicted values of the XGBoost model with the actual values in example 1, wherein XGBoost predicted results is the comparison of the XGBoost predictions with the true values, predicted_xgboost is the XGBoost prediction, and y_true is the true value.
FIG. 6 is a graph comparing the predicted values of the combined model with the actual values in example 1, wherein Mix-Model predicted results is the comparison of the combined model's predictions with the true values, predicted_mix-model is the combined model's prediction, and y_true is the true value.
FIG. 7 is a schematic breakdown of the functional modules of the platform in example 1.
Detailed Description
The invention will be further described with reference to the following examples, which are given by way of illustration only, but the scope of the invention is not limited thereto.
Example 1
The industrial park atmospheric pollution prediction method comprises the following steps:
(1) Selecting environmental monitoring points in the industrial park, and retrieving the monitoring data of those points from past years as a training set;
(2) Sequentially performing missing value interpolation and data normalization on the training set of step (1) to obtain a processed data set;
(3) Inputting the data set processed in step (2) into an SA-LSTM model and an XGBoost model respectively for model training, and obtaining a combined prediction model of the SA-LSTM model and the XGBoost model by using the reciprocal variance method;
(4) Retrieving the monitoring data of the environmental monitoring points at the previous moment and inputting them into the combined prediction model to obtain the predicted atmospheric pollution condition at the next moment.
AQI in this context stands for air quality index.
Further, in step (1), the monitoring data include pollutant concentration data; the pollutants comprise CO, SO2, NO2, O3, PM2.5, and PM10; the indices of the atmospheric pollution condition are the PM2.5 concentration and the air quality index (AQI).
Further, in the step (2), the method of the missing value interpolation processing is as follows:
(2-1) finding missing contaminant concentration data and corresponding missing moments from the training set;
(2-2) using an autoencoder model, retrieving the contaminant concentration data at the moments immediately before and after the missing moment, and taking the average of the two values as the contaminant concentration at the missing moment.
Further, in the step (1), the retrieved monitoring data further includes weather data; the meteorological data comprise wind speed, wind direction, air pressure, air temperature and humidity;
in the step (2), after the missing value interpolation processing, the meteorological data is screened first, and then the data normalization processing is carried out;
the method for screening the meteorological data comprises the following steps:
(2-A) Performing correlation analysis between the meteorological data in the monitoring data obtained in step (1) and the PM2.5 concentration data and air quality index to obtain correlation coefficients;
(2-B) Judging, according to the correlation judgment standard, the correlation of each item of meteorological data with the PM2.5 concentration data and the air quality index, thereby obtaining the meteorological data that can be used as input to the combined model in step (3).
Further, the correlation judgment standard is as follows: a correlation analysis tool is used to analyze the correlation between the meteorological factors and PM2.5, and the correlation coefficient indicates whether the correlation between two variables is significant. If |correlation coefficient| < 0.4, the correlation is low; if 0.4 ≤ |correlation coefficient| < 0.7, the linear correlation is significant; if 0.7 ≤ |correlation coefficient| ≤ 1, the correlation is high;
Table 2. Correlation of PM2.5 and AQI with meteorological factors
The correlation coefficient table shows that the parameter most strongly correlated with PM2.5 is AQI, with a coefficient of 0.931, and that the correlations rank as AQI > air temperature > humidity > wind direction > wind speed > air pressure. Overall, the correlations of PM2.5 and AQI with the meteorological factors in the Deer City light industrial park are all significant, but PM2.5 and AQI have low correlation with wind speed, wind direction, and air pressure and significant correlation with air temperature and humidity. The correlation diagram of PM2.5, AQI, and the meteorological factors is shown in FIG. 2. Based on the correlation coefficient table, four meteorological factors (wind speed, air pressure, temperature, and humidity) are selected as input data of the prediction model;
further, in the step (2), the data normalization processing method includes:
Normalizing the processed data by using the mapminmax function, wherein the normalization formula is as follows:
$$x' = \frac{x - \min}{\max - \min} \tag{1}$$
In formula (1), x is the value of a single datum of each sample feature, x' is its normalized value, min is the minimum value of the sample feature data, and max is the maximum value of the sample feature data.
Further, the SA-LSTM model in the combined model comprises the following four components (see FIG. 3):
i) Input layer: the monitoring data of the environmental monitoring points are input as a time-ordered sequence;
ii) LSTM layer: the LSTM layer encodes the sequence through its memory and forgetting mechanisms and produces the output vectors of the hidden layer;
iii) Self-attention layer: the self-attention mechanism generates a weight vector and weights the hidden states of all time steps, focusing attention on the more important parts of the whole hidden-state sequence;
iv) Output layer: the resulting sequence-level feature vector is used for time series data analysis and prediction;
further, training of the XGBoost model includes the steps of:
Dividing the preprocessed data set obtained in step (2) into a training set and a test set, extracting the feature values of the training set and inputting them into the XGBoost classification model for training, and then testing the trained model with the test set to obtain the final model;
Further, in step (3), the LSTM initialization parameters in the SA-LSTM model are: a weight gradient learning rate of 0.001, an input tensor of 10, a dimension of 64, 100 iterations, and a random seed of 42; the XGBoost model is set to a maximum tree depth of 6 and a learning rate of 0.05, is fitted with 100 trees, and uses a random seed of 42;
Further, the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are selected as the evaluation indices of the model;
The root mean square error formula is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2} \tag{2}$$
In formula (2), the predicted value is subtracted from the true value, and the differences are squared, averaged, and square-rooted. The closer the predicted values are to the true values, the smaller the RMSE; the larger the error, the larger the RMSE. RMSE denotes the root mean square error, m the number of observations, y_i the observed (true) value, and ŷ_i the predicted value;
The mean absolute error formula is as follows:
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \tag{3}$$
In formula (3), the predicted value is subtracted from the true value, and the absolute differences are averaged. The closer the predicted values are to the true values, the smaller the MAE; the larger the error, the larger the MAE. m denotes the number of observations, y_i the observed (true) value, and ŷ_i the predicted value;
The mean absolute percentage error formula is as follows:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{4}$$
In formula (4), y_i is the true value and ŷ_i the predicted value. The range of MAPE is [0, +∞): a MAPE of 0% indicates a perfect model, and a MAPE greater than 100% indicates an inferior model. n denotes the number of observations;
Further, the trained SA-LSTM model is combined with the XGBoost model. The combination applies the reciprocal variance method to the prediction results of the SA-LSTM model and the XGBoost model to obtain the prediction result of the final combined model. The reciprocal variance method is as follows:
b = value predicted by the SA-LSTM model;
a = value predicted by the XGBoost model;
e1 = variance of the SA-LSTM model's predicted values;
e2 = variance of the XGBoost model's predicted values;
w1 = (1/e1)/(1/e1 + 1/e2), the weight of the SA-LSTM model's predicted value;
w2 = (1/e2)/(1/e1 + 1/e2), the weight of the XGBoost model's predicted value;
final predicted value x = w1 × b + w2 × a.
The predicted results using LSTM model, SA-LSTM model, XGBoost model and combined model are compared with the actual values, as shown in figures 4, 5 and 6.
The invention also provides an industrial park atmospheric pollution data prediction and early warning system, which comprises:
the data acquisition module is used for receiving monitoring data information of the environmental monitoring points;
the data prediction module predicts the pollutant monitoring data information acquired by the data acquisition module by using the prediction method of the atmospheric pollution of the industrial park to obtain a prediction result of the atmospheric pollution condition;
the data early warning module is used for carrying out early warning on the predicted air pollution condition at the next moment according to a preset air pollution condition threshold value and sending early warning information to the mobile terminal;
and the data visualization display module displays the prediction result of the data prediction module.
Further, the monitoring data includes concentration data of the pollutants; the pollutants comprise CO, SO2, NO2, O3, PM2.5 and PM10; the indices of the atmospheric pollution condition are the PM2.5 concentration and the air quality index (AQI).
Further, the operation steps of the industrial park atmospheric pollution data prediction and early warning system comprise:
(1) Selecting, through the system, the park to be predicted and the pollutant type, and obtaining real-time monitoring data of the PM2.5 concentration and the air quality index from the national control station of that park over the past three hours;
(2) Inputting the real-time monitoring data into a combined model of SA-LSTM and XGBoost for prediction to obtain a prediction result;
(3) When the air quality index in the prediction result is greater than 180, the prediction ends and the platform issues an early warning; when the air quality index is not abnormal, the process returns to step (1) and steps (2)-(3) are repeated.
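The three operation steps above amount to a polling loop, which might be sketched as follows. The callables `fetch_recent`, `predict_aqi` and `issue_warning` are hypothetical placeholders for the data acquisition module, the combined SA-LSTM/XGBoost model and the early warning module; only the threshold of 180 comes from the text. The `max_cycles` bound is an assumption added so that the sketch terminates.

```python
from typing import Callable, Optional, Sequence

AQI_WARNING_THRESHOLD = 180.0  # threshold stated in step (3)


def run_early_warning(
    fetch_recent: Callable[[], Sequence[float]],
    predict_aqi: Callable[[Sequence[float]], float],
    issue_warning: Callable[[float], None],
    max_cycles: int,
) -> Optional[float]:
    """Repeat steps (1)-(3) until a warning is issued or max_cycles is reached.

    Returns the predicted AQI that triggered the warning, or None.
    """
    for _ in range(max_cycles):
        window = fetch_recent()                # step (1): last three hours of data
        predicted = predict_aqi(window)        # step (2): combined-model prediction
        if predicted > AQI_WARNING_THRESHOLD:  # step (3): compare with the threshold
            issue_warning(predicted)
            return predicted
    return None


# Stub data and a stand-in predictor (last reading + 10), for illustration only.
windows = iter([[100.0, 110.0, 120.0], [150.0, 170.0, 190.0]])
sent = []
result = run_early_warning(lambda: next(windows), lambda w: w[-1] + 10.0,
                           sent.append, max_cycles=2)
print(result, sent)  # 200.0 [200.0]
```

In a deployed system the loop would be driven by a scheduler rather than a fixed cycle count, and the warning callback would push a message to the mobile terminal.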
Claims (9)
1. An industrial park atmospheric pollution prediction method, characterized by comprising the following steps:
(1) Selecting environmental monitoring points in the industrial park, and retrieving the monitoring data of the environmental monitoring points over the past year as a training set;
(2) Sequentially performing missing value interpolation processing and data normalization processing on the training set in the step (1) to obtain a processed data set;
(3) Inputting the data set processed in the step (2) into an SA-LSTM model and an XGBoost model respectively for model training, and obtaining a combined prediction model of the SA-LSTM model and the XGBoost by using a reciprocal variance method;
(4) Retrieving the monitoring data of the environmental monitoring points at the previous moment, and inputting the monitoring data into the combined prediction model to obtain the predicted atmospheric pollution condition at the next moment.
2. The method of claim 1, wherein in step (1), the monitoring data comprises concentration data of the pollutants; the pollutants comprise CO, SO2, NO2, O3, PM2.5 and PM10; the indices of the atmospheric pollution condition are the PM2.5 concentration and the air quality index (AQI).
3. The method for predicting atmospheric pollution in an industrial park according to claim 1, wherein in the step (2), the method for performing the missing value interpolation process comprises:
(2-1) finding missing contaminant concentration data and corresponding missing moments from the training set;
(2-2) using an autoencoder model, retrieving the contaminant concentration data at the moments immediately before and after the missing moment, and taking the average of these two values as the contaminant concentration data of the missing moment.
4. The method of claim 1, wherein in step (1), the retrieved monitoring data further comprises meteorological data; the meteorological data comprise wind speed, wind direction, air pressure, air temperature and humidity;
in the step (2), after the missing value interpolation processing, the meteorological data is screened first, and then the data normalization processing is carried out;
the method for screening the meteorological data comprises the following steps:
(2-A) carrying out correlation analysis between the meteorological data in the monitoring data obtained in the step (1) and the PM2.5 concentration data and the air quality index to obtain correlation coefficients;
(2-B) determining, according to the correlation determination criterion, the correlation of each item of meteorological data with the PM2.5 concentration data and the air quality index, to obtain the meteorological data that can be used as input to the combined model in the step (3).
5. The method for predicting atmospheric pollution in an industrial park according to claim 4, wherein the correlation criterion is:
the correlation between the meteorological factors and PM2.5 is computed using a correlation analysis tool; if |correlation coefficient| < 0.4, the correlation is low; if 0.4 ≤ |correlation coefficient| < 0.7, the linear correlation is significant; if 0.7 ≤ |correlation coefficient| ≤ 1, the correlation is high.
6. The method for predicting atmospheric pollution in an industrial park according to claim 1, wherein in the step (2), the method for normalizing the data comprises:
carrying out normalization processing on the processed data by using the mapminmax function, wherein the formula is as follows:
in the formula (1), x' represents a value of a certain monitoring data of each sample acquired by the monitoring station, min is a minimum value of a certain monitoring data in the sample, and max is a maximum value of a certain monitoring data in the sample.
7. The method for predicting atmospheric pollution in an industrial park according to claim 1, wherein in step (3), the LSTM initialization parameters in the SA-LSTM model are: a weight gradient learning rate of 0.001, an input tensor of 10, an input dimension of 64, 100 iterations, and a random seed of 42; and the XGBoost model is set with a maximum tree depth of 6 and a learning rate of 0.05, is fitted using 100 trees, and uses a random seed of 42.
8. An industrial park atmospheric pollution data predictive early warning system, comprising:
the data acquisition module is used for receiving monitoring data information of the environmental monitoring points;
the data prediction module predicts the pollutant monitoring data information acquired by the data acquisition module by using the prediction method of the atmospheric pollution of the industrial park according to any one of claims 1 to 7 to obtain a prediction result of the atmospheric pollution condition;
the data early warning module is used for carrying out early warning on the predicted air pollution condition at the next moment according to a preset air pollution condition threshold value and sending early warning information to the mobile terminal;
and the data visualization display module displays the prediction result of the data prediction module.
9. The industrial park atmospheric pollution data predictive early warning system of claim 8, wherein the monitoring data comprises concentration data of the pollutants; the pollutants comprise CO, SO2, NO2, O3, PM2.5 and PM10; the indices of the atmospheric pollution condition are the PM2.5 concentration and the air quality index (AQI).
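The preprocessing steps recited in claims 3, 5 and 6 can be illustrated with a minimal Python sketch, under several assumptions that are not taken from the patent. The gap filling shows only the neighbour-averaging step and does not implement the self-encoder (autoencoder) model. The correlation coefficient is assumed to be Pearson's, since the correlation analysis tool is not named. The normalization maps to [0, 1], whereas formula (1) is not reproduced in the text and MATLAB's `mapminmax` maps to [-1, 1] by default. All function names and sample values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def fill_isolated_gaps(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace a missing value (None) with the mean of its two neighbours.

    Gaps at the ends, or next to another gap, are left as None.
    """
    filled = list(series)
    for i in range(1, len(series) - 1):
        prev, nxt = series[i - 1], series[i + 1]
        if series[i] is None and prev is not None and nxt is not None:
            filled[i] = (prev + nxt) / 2.0
    return filled


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed choice of coefficient)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlation_level(r: float) -> str:
    """Apply the thresholds of claim 5 to |r|."""
    a = abs(r)
    if a < 0.4:
        return "low"
    if a < 0.7:
        return "significant"
    return "high"


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] with (x - min) / (max - min) (assumed range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical hourly PM2.5 and wind-speed samples, for illustration only.
pm25 = fill_isolated_gaps([35.0, None, 45.0, 50.0])
wind = [4.0, 3.0, 2.0, 1.0]
print(pm25)                                     # [35.0, 40.0, 45.0, 50.0]
print(correlation_level(pearson(wind, pm25)))   # high
print(min_max_normalize(pm25))
```

In this toy example the wind speed is perfectly anti-correlated with the filled PM2.5 series, so it is classified as highly correlated and would be a candidate input for the combined model.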
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311236270.8A CN117332815A (en) | 2023-09-22 | 2023-09-22 | Prediction method and prediction early warning system for atmospheric pollution of industrial park |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117332815A (en) | 2024-01-02 |
Family
ID=89276538
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311236270.8A Pending CN117332815A (en) | 2023-09-22 | 2023-09-22 | Prediction method and prediction early warning system for atmospheric pollution of industrial park |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117332815A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117875576A (en) * | 2024-03-13 | 2024-04-12 | 四川国蓝中天环境科技集团有限公司 | Urban atmosphere pollution analysis method based on structured case library |
CN117875576B (en) * | 2024-03-13 | 2024-05-24 | 四川国蓝中天环境科技集团有限公司 | Urban atmosphere pollution analysis method based on structured case library |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |