CN117332815A

CN117332815A - Prediction method and prediction early warning system for atmospheric pollution of industrial park

Info

Publication number: CN117332815A
Application number: CN202311236270.8A
Authority: CN
Inventors: 王奇; 赵敏; 柯强; 黄波; 王传花; 于恒国
Original assignee: Wenzhou University
Current assignee: Wenzhou University
Priority date: 2023-09-22
Filing date: 2023-09-22
Publication date: 2024-01-02

Abstract

The invention provides a prediction method and a prediction and early warning system for atmospheric pollution in an industrial park. The method comprises the following steps: selecting the historical monitoring data of past years from the environmental monitoring points in the study area as the data source; performing missing-value interpolation, meteorological factor screening, and data normalization and reconstruction on the data; and inputting the data into an SA-LSTM model and an XGBoost model for training to obtain a combined model. An industrial park atmospheric pollution data prediction and early warning system is then constructed, and the modules of the system are used to predict and publish pollutant monitoring data. The method and system can monitor and predict the atmospheric pollution indices of the industrial park accurately and in a timely manner, provide a scientific basis for intelligent management and control of atmospheric pollution in the park, and can greatly improve the park's risk monitoring, rapid early warning and emergency response capabilities.

Description

Prediction method and prediction early warning system for atmospheric pollution of industrial park

Technical Field

The invention relates to the technical field of monitoring of atmospheric pollutants in an industrial park, in particular to a prediction method and a prediction early warning system for atmospheric pollution in the industrial park.

Background

Atmospheric pollutants discharged during production in industrial parks are an important source of environmental pollution, and industrial parks are a main battlefield for industrial atmospheric pollution control. Predicting and analyzing the key atmospheric pollutants of an industrial park, and carrying out research on technologies for regulating regional atmospheric environmental quality, can greatly improve the park's risk monitoring, rapid early warning and emergency response capabilities, and effectively protect regional ecological safety and people's lives and property. Accurate prediction is the basis and premise for controlling atmospheric pollutant emissions. Sound prediction methods and techniques are of great significance for formulating and improving atmospheric pollution prevention and control policies.

In the past, park management and control focused on real-time monitoring of atmospheric pollutants, while little research addressed the scientific prediction of the characteristic pollutants of industrial parks. Based on an industrial park in Shanzhou City, the present method analyzes the historical pollutant data of the park and uses advanced models to scientifically predict its atmospheric pollutants, providing a scientific basis for intelligent management and control of atmospheric pollutants in the park.

Disclosure of Invention

The invention provides a prediction method and a prediction and early warning system for atmospheric pollution of an industrial park.

The specific technical scheme is as follows:

The industrial park atmospheric pollution prediction method comprises the following steps:

(1) Selecting environmental monitoring points in the industrial park, and retrieving the monitoring data of those points from past years as a training set;

(2) Sequentially performing missing value interpolation processing and data normalization processing on the training set in the step (1) to obtain a processed data set;

(3) Inputting the data set processed in the step (2) into an SA-LSTM model and an XGBoost model respectively for model training, and obtaining a combined prediction model of the SA-LSTM model and the XGBoost by using a reciprocal variance method;

(4) Retrieving the monitoring data of the environmental monitoring point at the previous moment, and inputting it into the combined prediction model to obtain the predicted atmospheric pollution condition at the next moment.

AQI in this context represents the air quality index and the environmental monitoring point is a government-set environmental monitoring point.

Further, in step (1), the monitoring data includes concentration data of pollutants; the pollutants comprise CO, SO₂, NO₂, O₃, PM₂.₅ and PM₁₀; the indices of the atmospheric pollution condition are the PM₂.₅ concentration and the air quality index (AQI).

Further, in the step (2), the method of the missing value interpolation processing is as follows:

(2-1) finding the missing pollutant concentration data and the corresponding missing moments in the training set;

(2-2) using an autoencoder model, retrieving the pollutant concentration data at the moments immediately before and after the missing moment, and taking the average of the two values as the pollutant concentration at the missing moment.
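Steps (2-1) and (2-2) can be illustrated with a minimal Python sketch. It covers only the neighbour-averaging rule stated in the text; the autoencoder model is not implemented here. The function name, the use of `None` to mark a missing reading, the choice to leave a gap unfilled when either neighbour is also missing, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional


def fill_missing_with_neighbor_mean(
    series: List[Optional[float]],
) -> List[Optional[float]]:
    """Fill each missing reading (None) with the mean of its two neighbours."""
    filled = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        # Step (2-1): t is a missing moment.
        # Step (2-2): look up the readings at t-1 and t+1 in the original data.
        has_prev = t > 0 and series[t - 1] is not None
        has_next = t + 1 < len(series) and series[t + 1] is not None
        if has_prev and has_next:
            filled[t] = (series[t - 1] + series[t + 1]) / 2.0
        # Assumption: if either neighbour is missing, the gap is left as None.
    return filled


# Made-up hourly PM2.5 readings with gaps, for illustration only.
pm25 = [35.0, None, 41.0, 38.0, None, None, 30.0]
print(fill_missing_with_neighbor_mean(pm25))
```

Only the isolated gap at index 1 is filled in this example; the two consecutive gaps stay empty because neither has two observed neighbours.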

Further, in the step (1), the retrieved monitoring data further includes weather data; the meteorological data comprise wind speed, wind direction, air pressure, air temperature and humidity;

in the step (2), after the missing value interpolation processing, the meteorological data is screened first, and then the data normalization processing is carried out;

the method for screening the meteorological data comprises the following steps:

(2-A) performing correlation analysis between the meteorological data in the monitoring data obtained in step (1) and the PM₂.₅ concentration data and air quality index to obtain correlation coefficients;

(2-B) determining, according to the correlation judgment standard, the correlation of each item of meteorological data with the PM₂.₅ concentration data and the air quality index, thereby obtaining the meteorological data to be used as input to the combined model in step (3).

Further, the correlation judgment standard is as follows: the correlation analysis between the meteorological factors and PM₂.₅ is carried out using a correlation analysis tool, and the correlation coefficient indicates whether the correlation between two variables is significant; if |correlation coefficient| < 0.4, the correlation is low; if 0.4 ≤ |correlation coefficient| < 0.7, the linear correlation is significant; if 0.7 ≤ |correlation coefficient| ≤ 1, the correlation is high;

Table 1. Correlation of PM₂.₅ and AQI with meteorological factors

The correlation coefficient table shows that the parameter most strongly correlated with PM₂.₅ is AQI, with a coefficient of 0.931, and that the correlations rank as AQI > air temperature > humidity > wind direction > wind speed > air pressure. Overall, in the Deer City light industrial park, the correlations of PM₂.₅ and AQI with the meteorological factors are all significant, but PM₂.₅ and AQI have low correlation with wind speed, wind direction and air pressure and significant correlation with air temperature and humidity; the correlation diagram of PM₂.₅, AQI and the meteorological factors is shown in the attached drawing. Based on the correlation coefficient table, four meteorological factors (wind speed, air pressure, temperature and humidity) are used as input data of the prediction model;
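The screening rule can be sketched in Python as follows. The text does not name the correlation analysis tool, so the use of the Pearson coefficient here is an assumption, as are the function names and the sample readings; only the 0.4 and 0.7 thresholds and the 0.931 value come from the text.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_level(r: float) -> str:
    """Classify |r| with the thresholds of the correlation judgment standard."""
    magnitude = abs(r)
    if magnitude < 0.4:
        return "low"
    if magnitude < 0.7:
        return "significant"
    return "high"


# Made-up readings, for illustration only (not data from the patent).
pm25 = [30.0, 42.0, 55.0, 38.0, 61.0]
humidity = [60.0, 66.0, 75.0, 70.0, 72.0]
r = pearson(pm25, humidity)
print(round(r, 3), correlation_level(r))

# The AQI coefficient reported in the text falls in the "high" band.
print(correlation_level(0.931))
```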

further, in the step (2), the data normalization processing method includes:

normalizing the processed data using the mapminmax function, wherein the normalization formula is as follows:

in the formula (1), x' represents the value of the single data of each sample characteristic, min is the minimum value of the sample characteristic data, and max is the maximum value of the sample characteristic data.
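Formula (1) itself is not reproduced in this text. The sketch below assumes the common min-max form x' = (x - min) / (max - min), which maps each feature onto [0, 1]; this form is an assumption, and MATLAB's `mapminmax` by default maps onto [-1, 1] instead. The function name and the handling of a constant feature are likewise illustrative choices.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale one feature column onto [0, 1] using its own min and max."""
    lo = min(values)
    hi = max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up air temperature readings in degrees Celsius, for illustration only.
print(min_max_normalize([10.0, 20.0, 30.0]))
```

Each feature (pollutant concentration or meteorological factor) would be normalized separately, since min and max are defined per sample feature.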

Further, the SA-LSTM model in the combined model comprises the following four components:

i) Input layer: receives the monitoring data of the environmental monitoring points as a time-ordered sequence;

ii) LSTM layer: the input is encoded by the memory and forget mechanisms of the LSTM layer to obtain the output vectors of the hidden layer;

iii) Self-attention layer: the self-attention mechanism generates a weight vector, weights the hidden states of all time steps, and focuses attention on the more important parts of the whole hidden-state sequence;

iv) Output layer: the resulting sequence-level feature vector is finally used for time-series data analysis and prediction;
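The weighting performed by the self-attention layer can be illustrated with a small Python sketch. The text does not specify how the attention scores are computed, so scoring each hidden state by its dot product with a query vector is an assumption; the LSTM itself is not implemented, and the hidden states, the query vector and all names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax, so the weights sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Weight the hidden states of all time steps and sum them into one vector."""
    # Assumed scoring rule: dot product of each hidden state with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Made-up LSTM hidden states for three time steps (dimension 2).
states = [[0.2, 0.1], [0.5, 0.4], [0.9, 0.7]]
weights, context = attention_pool(states, query=[1.0, 1.0])
print(weights)  # one weight per time step
print(context)  # sequence-level feature vector passed to the output layer
```

In a trained model the query (or an equivalent scoring network) would be learned together with the LSTM weights rather than fixed by hand.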

further, training of the XGBoost model includes the steps of:

dividing the preprocessed data set obtained in step (2) into a training set and a test set, extracting the feature values of the training set, inputting them into the XGBoost classification model for training, testing the trained XGBoost classification model with the test set, and obtaining the final model once testing is complete;

further, in step (3), the LSTM initialization parameters of the SA-LSTM model are: the weight-gradient learning rate is 0.001, the input tensor is 10, the input dimension is 64, the number of iterations is 100, and the random seed is 42; the XGBoost model is set to a maximum tree depth of 6 and a learning rate of 0.05, is fitted using 100 trees, and uses a random seed of 42;
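For reference, the stated settings can be collected in plain Python dictionaries. Only the numeric values come from the text. The keys of `SA_LSTM_PARAMS` are hypothetical labels, and the keys of `XGBOOST_PARAMS` follow the parameter names used by XGBoost's scikit-learn style wrapper, which is an assumption about how the model would be configured.

```python
# Values taken from the text; the key names are illustrative assumptions.
SA_LSTM_PARAMS = {
    "learning_rate": 0.001,  # weight-gradient learning rate
    "input_tensor": 10,      # stated as "input tensor 10"
    "input_dimension": 64,   # stated as "input dimension 64"
    "iterations": 100,
    "random_seed": 42,
}

XGBOOST_PARAMS = {
    "max_depth": 6,          # maximum tree depth
    "learning_rate": 0.05,
    "n_estimators": 100,     # number of trees used for fitting
    "random_state": 42,      # random seed
}

for name, params in (("SA-LSTM", SA_LSTM_PARAMS), ("XGBoost", XGBOOST_PARAMS)):
    print(name, params)
```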

further, the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE) are selected as the evaluation indexes of the model;

the root mean square error formula is as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i-\hat{y}_i\right)^{2}} \tag{2}$$

in equation (2), the differences between the true values and the predicted values are squared and averaged, and the square root of the average is taken; the closer the predicted values are to the true values, the smaller the RMSE, and the larger the error, the larger the RMSE; RMSE represents the root mean square error, m represents the number of observations, y_i represents the observed (true) value, and ŷ_i represents the predicted value;

the mean absolute error formula is as follows:

$$\mathrm{MAE}=\frac{1}{m}\sum_{i=1}^{m}\left|y_i-\hat{y}_i\right| \tag{3}$$

in equation (3), the absolute differences between the true values and the predicted values are averaged; the closer the predicted values are to the true values, the smaller the MAE, and the larger the error, the larger the MAE; m represents the number of observations, y_i represents the observed (true) value, and ŷ_i represents the predicted value;

the mean absolute percentage error formula is as follows:

$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{4}$$

in equation (4), the range of MAPE is [0, +∞); a MAPE of 0% indicates a perfect model, and a MAPE greater than 100% indicates an inferior model; n represents the number of observations, y_i represents the observed (true) value, and ŷ_i represents the predicted value;
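The three evaluation indexes can be computed as in the following sketch, which uses the standard definitions of RMSE, MAE and MAPE. The function names and the sample values are illustrative, and treating the first argument as the true values is an assumption about the notation.

```python
import math
from typing import List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error."""
    m = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m)


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error."""
    m = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / m


def mape(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Made-up PM2.5 values, for illustration only.
actual = [40.0, 50.0, 80.0]
predicted = [42.0, 47.0, 84.0]
print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

All three functions return 0 for a perfect prediction, and MAPE has no upper bound, which matches the range [0, +∞) given in the text.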

further, the trained SA-LSTM model and the trained XGBoost model are combined, wherein the reciprocal variance method is applied to the prediction results of the SA-LSTM model and the XGBoost model to obtain the prediction result of the final combined model; the reciprocal variance method is as follows:

b = predicted value of the SA-LSTM prediction model;

a = predicted value of the XGBoost prediction model;

e₁ = variance of the predicted values of the SA-LSTM prediction model;

e₂ = variance of the predicted values of the XGBoost prediction model;

w₁ = (1/e₁)/(1/e₁ + 1/e₂), the weight of the SA-LSTM model's predicted value;

w₂ = (1/e₂)/(1/e₁ + 1/e₂), the weight of the XGBoost model's predicted value;

final predicted value x = w₁×b + w₂×a.
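The reciprocal variance combination can be sketched in Python as follows. Each model's prediction is multiplied by the weight defined for that same model (w₁ for SA-LSTM, w₂ for XGBoost). How e₁ and e₂ are estimated is not detailed in the text, so they are simply passed in; the function names and the numbers are illustrative assumptions.

```python
from typing import Tuple


def reciprocal_variance_weights(e1: float, e2: float) -> Tuple[float, float]:
    """Return (w1, w2); the model with the smaller variance gets the larger weight."""
    inv1 = 1.0 / e1
    inv2 = 1.0 / e2
    total = inv1 + inv2
    return inv1 / total, inv2 / total


def combine(sa_lstm_pred: float, xgboost_pred: float, e1: float, e2: float) -> float:
    """Weighted sum of the two predictions using reciprocal variance weights."""
    w1, w2 = reciprocal_variance_weights(e1, e2)
    return w1 * sa_lstm_pred + w2 * xgboost_pred


# Made-up numbers: SA-LSTM predicts 50 with variance 4, XGBoost predicts 60 with variance 1.
w1, w2 = reciprocal_variance_weights(4.0, 1.0)
print(w1, w2)
print(combine(50.0, 60.0, 4.0, 1.0))
```

Because w₁ + w₂ = 1, the combined value always lies between the two individual predictions, closer to the one with the smaller variance.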

The invention also provides an industrial park atmospheric pollution data prediction and early warning system, which comprises:

the data acquisition module is used for receiving monitoring data information of the environmental monitoring points;

the data prediction module predicts the pollutant monitoring data information acquired by the data acquisition module by using the prediction method of the atmospheric pollution of the industrial park to obtain a prediction result of the atmospheric pollution condition;

the data early warning module is used for carrying out early warning on the predicted air pollution condition at the next moment according to a preset air pollution condition threshold value and sending early warning information to the mobile terminal;

and the data visualization display module displays the prediction result of the data prediction module.

Further, the monitoring data includes concentration data of pollutants; the pollutants comprise CO, SO₂, NO₂, O₃, PM₂.₅ and PM₁₀; the indices of the atmospheric pollution condition are the PM₂.₅ concentration and the air quality index (AQI).

Further, the operation steps of the industrial park atmospheric pollution data prediction and early warning system comprise:

(1) The park to be predicted and the pollutant type are selected through the system, and real-time monitoring data of the PM₂.₅ concentration and air quality index within three hours are obtained from the national control station of the park to be predicted;

(2) Inputting the real-time monitoring data into a combined model of SA-LSTM and XGBoost for prediction to obtain a prediction result;

(3) When the air quality index in the prediction result is greater than 180, the prediction ends and the platform issues an early warning; when the air quality index is not abnormal, the process returns to step (1) and steps (2) to (3) are repeated;
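One pass through operation steps (1) to (3) can be sketched in Python as follows. Only the threshold of 180 comes from the text. The function names, the `predict` and `notify` callables and the placeholder predictor are hypothetical; a real system would call the combined SA-LSTM and XGBoost model and push the message to the mobile terminal.

```python
from typing import Callable, List

AQI_WARNING_THRESHOLD = 180  # threshold stated in step (3)


def should_issue_warning(
    predicted_aqi: float, threshold: float = AQI_WARNING_THRESHOLD
) -> bool:
    """A warning is issued only when the predicted AQI exceeds the threshold."""
    return predicted_aqi > threshold


def run_cycle(
    readings: List[float],
    predict: Callable[[List[float]], float],
    notify: Callable[[str], None],
) -> bool:
    """Run steps (2) and (3) once; return True if a warning was issued."""
    predicted_aqi = predict(readings)  # step (2): prediction
    if should_issue_warning(predicted_aqi):  # step (3): threshold check
        notify(f"Early warning: predicted AQI {predicted_aqi:.0f} exceeds 180")
        return True
    return False  # not abnormal: the caller returns to step (1)


def last_value_predictor(readings: List[float]) -> float:
    """Placeholder only: repeats the latest reading instead of a trained model."""
    return readings[-1]


# Made-up AQI readings for the last three hours.
print(run_cycle([150.0, 172.0, 190.0], last_value_predictor, print))
print(run_cycle([60.0, 65.0, 70.0], last_value_predictor, print))
```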

compared with the prior art, the invention has the following beneficial effects:

(1) The invention provides the prediction method and the prediction early warning system for the atmospheric pollution of the industrial park, which can realize the real-time monitoring and prediction of the atmospheric pollutants, and can issue early warning according to the prediction result to timely prevent and treat the park pollution;

(2) The risk monitoring, rapid early warning and emergency response capability of the industrial park can be greatly improved;

(3) The data of the pollutants can be displayed more conveniently and effectively through the combination of the platform and the model.

Drawings

FIG. 1 is a flow chart of a method for predicting atmospheric pollution in an industrial park according to example 1.

FIG. 2 is the correlation diagram of PM₂.₅, AQI and the meteorological factors in example 1, wherein WindSpeed represents wind speed; WindDirection represents wind direction; AirPressure represents air pressure; Temperature represents temperature; Humidity represents humidity.

FIG. 3 is a diagram of the SA-LSTM structure of example 1, wherein LSTM Layer represents the LSTM layer; Attention Layer represents the self-attention layer; Fully Connected Layer represents the fully connected layer; X₁ to X_t represent the input data; Y_t represents the output data.

FIG. 4 is a comparison of the predicted and actual values of the LSTM and SA-LSTM network models in example 1, wherein the upper panel, LSTM predicted results, compares the predicted values of the LSTM network model with the actual values, predicted_LSTM being the values predicted by the LSTM network model and y_true the actual values; the lower panel, SA-LSTM predicted results, compares the predicted values of the SA-LSTM model with the actual values, prediction_sa-LSTM being the values predicted by the SA-LSTM model and y_true the actual values.

FIG. 5 is a comparison of the predicted and actual values of the XGBoost model in example 1, wherein XGBoost predicted results compares the predicted values of the XGBoost model with the actual values, predicted_xgboost being the values predicted by the XGBoost model and y_true the actual values.

FIG. 6 is a comparison of the predicted and actual values of the combined model in example 1, wherein Mix-Model predicted results compares the predicted values of the combined model with the actual values, predicted_mix-model being the values predicted by the combined model and y_true the actual values.

FIG. 7 is a schematic breakdown of the functional modules of the platform in example 1.

Detailed Description

The invention will be further described with reference to the following examples, which are given by way of illustration only, but the scope of the invention is not limited thereto.

Example 1

AQI in this context stands for air quality index.

the method for screening the meteorological data comprises the following steps:

Table 2. Correlation of PM₂.₅ and AQI with meteorological factors

The correlation coefficient table shows that the parameter most strongly correlated with PM₂.₅ is AQI, with a coefficient of 0.931, and that the correlations rank as AQI > air temperature > humidity > wind direction > wind speed > air pressure. Overall, in the Deer City light industrial park, the correlations of PM₂.₅ and AQI with the meteorological factors are all significant, but PM₂.₅ and AQI have low correlation with wind speed, wind direction and air pressure and significant correlation with air temperature and humidity; the correlation diagram of PM₂.₅, AQI and the meteorological factors is shown in FIG. 2. Based on the correlation coefficient table, four meteorological factors (wind speed, air pressure, temperature and humidity) are used as input data of the prediction model;

further, in the step (2), the data normalization processing method includes:

Further, the SA-LSTM model in the combined model comprises the following four components, for which specific reference is made to FIG. 3:

further, training of the XGBoost model includes the steps of:

the root mean square error formula is as follows:

the mean absolute error formula is as follows:

the mean absolute percentage error formula is as follows:

b = predicted value of the SA-LSTM prediction model

a = predicted value of the XGBoost prediction model

e₁ = variance of the predicted values of the SA-LSTM prediction model

e₂ = variance of the predicted values of the XGBoost prediction model

w₁ = (1/e₁)/(1/e₁ + 1/e₂), the weight of the SA-LSTM model's predicted value

w₂ = (1/e₂)/(1/e₁ + 1/e₂), the weight of the XGBoost model's predicted value

Final predicted value x = w₁×b + w₂×a.

The results predicted by the LSTM model, the SA-LSTM model, the XGBoost model and the combined model are compared with the actual values, as shown in FIGS. 4, 5 and 6.

(3) When the air quality index in the prediction result is greater than 180, the prediction ends and the platform issues an early warning; when the air quality index is not abnormal, the process returns to step (1) and steps (2) to (3) are repeated.

Claims

1. The industrial park atmospheric pollution prediction method is characterized by comprising the following steps of:

2. The method of claim 1, wherein in step (1), the monitoring data comprises concentration data of pollutants; the pollutants comprise CO, SO₂, NO₂, O₃, PM₂.₅ and PM₁₀; the indices of the atmospheric pollution condition are the PM₂.₅ concentration and the air quality index (AQI).

3. The method for predicting atmospheric pollution in an industrial park according to claim 1, wherein in the step (2), the method for performing the missing value interpolation process comprises:

4. The method of claim 1, wherein in step (1), the retrieved monitoring data further comprises weather data; the meteorological data comprise wind speed, wind direction, air pressure, air temperature and humidity;

the method for screening the meteorological data comprises the following steps:

5. The method for predicting atmospheric pollution in an industrial park according to claim 4, wherein the correlation criterion is:

the correlation analysis between the meteorological factors and PM₂.₅ is carried out using a correlation analysis tool; if |correlation coefficient| < 0.4, the correlation is low; if 0.4 ≤ |correlation coefficient| < 0.7, the linear correlation is significant; if 0.7 ≤ |correlation coefficient| ≤ 1, the correlation is high.

6. The method for predicting atmospheric pollution in an industrial park according to claim 1, wherein in the step (2), the method for normalizing the data comprises:

normalizing the processed data using the mapminmax function, wherein the formula is as follows:

in the formula (1), x' represents a value of a certain monitoring data of each sample acquired by the monitoring station, min is a minimum value of a certain monitoring data in the sample, and max is a maximum value of a certain monitoring data in the sample.

7. The method for predicting atmospheric pollution in an industrial park according to claim 1, wherein in step (3), the LSTM initialization parameters of the SA-LSTM model are: the weight-gradient learning rate is 0.001, the input tensor is 10, the input dimension is 64, the number of iterations is 100, and the random seed is 42; the XGBoost model is set to a maximum tree depth of 6 and a learning rate of 0.05, is fitted using 100 trees, and uses a random seed of 42.

8. An industrial park atmospheric pollution data predictive early warning system, comprising:

the data prediction module predicts the pollutant monitoring data information acquired by the data acquisition module by using the prediction method of the atmospheric pollution of the industrial park according to any one of claims 1 to 7 to obtain a prediction result of the atmospheric pollution condition;

9. The industrial park atmospheric pollution data predictive early warning system of claim 8, wherein the monitoring data comprises concentration data of pollutants; the pollutants comprise CO, SO₂, NO₂, O₃, PM₂.₅ and PM₁₀; the indices of the atmospheric pollution condition are the PM₂.₅ concentration and the air quality index (AQI).