CN112732691A

CN112732691A - Atmospheric environment prediction method based on multiple model comparison

Info

Publication number: CN112732691A
Application number: CN202110017749.7A
Authority: CN
Inventors: 曹敏; 刘娇龙; 赵娜; 张叶; 刘斯扬; 聂永杰; 尹春林; 杨政; 肖华根; 廖斌; 胡昌斌; 韩彤; 魏龄
Original assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date: 2021-01-07
Filing date: 2021-01-07
Publication date: 2021-04-30

Abstract

The application provides an atmospheric environment prediction method based on multiple model comparison. The method comprises: obtaining the conventional pollutant concentrations and meteorological data of a target city and constructing a database; preprocessing the database; constructing, from the meteorological data as input factors, a multiple linear regression model that predicts the PM₁₀ concentration Y₁; adjusting the input factors of the multiple linear regression model and constructing, by stepwise regression, an optimal linear regression model that predicts the PM₁₀ concentration Y₂; training a BP neural network model according to a network structure and the preprocessed conventional pollutant concentrations and meteorological data; optimizing the thresholds and weights of the BP neural network model based on a genetic algorithm to obtain an optimal BP neural network model; and comparing the four models and determining the finally selected model by the PM₁₀ mean square error, the PM₁₀ mean absolute error and the goodness of fit. The optimal BP neural network parameters are obtained through iterative evolution by selection, crossover and mutation operations, so that accurate prediction results are obtained, and the method is better suited to medium- and long-term prediction of atmospheric pollutants.

Description

Atmospheric environment prediction method based on multiple model comparison

Technical Field

The application relates to the field of environmental monitoring and early warning data mining and analysis, in particular to an atmospheric environment prediction method based on multiple model comparison.

Background

With the aggravation of environmental pollution and growing public environmental awareness, increasing effort has been devoted to monitoring the atmospheric environment in order to reduce atmospheric pollution events; a large number of atmospheric monitoring systems have been built, and a large volume of historical monitoring data has accumulated.

At present, the historical monitoring data are used only for real-time monitoring and for generating daily, monthly and annual reports. The conventional atmospheric pollutant data comprise PM2.5, PM10, SO2, CO, O3 and NO2, and the meteorological data comprise humidity, air temperature, wind speed, wind direction and air pressure. The value of these data is not limited to statistical products such as daily, monthly and annual reports: with the development of research on air pollution prevention and control, prediction of the atmospheric environment has also become important.

Disclosure of Invention

The application provides an atmospheric environment prediction method based on multiple model comparison, and aims to solve the technical problem that existing environmental monitoring and early-warning data are insufficiently mined and analyzed.

In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:

An atmospheric environment prediction method based on multiple model comparison is provided, the method comprising the following steps:

acquiring the conventional pollutant concentration and meteorological data of a target city, and constructing a database of the conventional pollutant concentration and meteorological data;

preprocessing the conventional pollutant concentration and meteorological data in the database;

constructing, from the meteorological data as input factors, a multiple linear regression model that predicts the PM₁₀ concentration Y₁; the input factors comprise air pressure, humidity, wind speed, wind direction and air temperature;

adjusting the input factors of the multiple linear regression model, and constructing, by stepwise regression, an optimal linear regression model that predicts the PM₁₀ concentration Y₂; the adjusted input factors comprise PM₂.₅, air temperature, O₃, wind speed, air pressure, humidity and season;

training a BP neural network model according to a network structure and the preprocessed conventional pollutant concentrations and meteorological data;

optimizing the thresholds and weights of the BP neural network model based on a genetic algorithm to obtain an optimal BP neural network model;

comparing the four models, and determining the finally selected model by the PM₁₀ mean square error, the PM₁₀ mean absolute error and the goodness of fit;

wherein the conventional pollutant concentrations comprise PM₂.₅, PM₁₀, SO₂, CO, O₃ and NO₂, and the meteorological data include humidity, air temperature, wind speed, wind direction and air pressure.

In one possible implementation, the preprocessing includes:

carrying out consistency check on the conventional pollutant concentration and meteorological data;

processing invalid data and missing data by estimation, deletion, filling with a global constant, or random interpolation;

normalizing out-of-range data to the interval [0,1].
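The missing-data strategies listed above can be sketched as follows. This is only an illustration under stated assumptions: missing readings are represented as `None`, the function names are hypothetical, and linear interpolation between neighbouring valid readings stands in for the "estimation" option, which the application does not specify further.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing or invalid reading (assumption)


def drop_missing(values: Series) -> List[float]:
    """Deletion: discard missing readings."""
    return [v for v in values if v is not None]


def fill_constant(values: Series, constant: float) -> List[float]:
    """Global-constant filling: replace every missing reading with one value."""
    return [constant if v is None else v for v in values]


def fill_linear(values: Series) -> List[float]:
    """Estimation by linear interpolation between the nearest valid neighbours.

    Gaps at either end of the series are filled with the nearest valid reading.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled
```

Which strategy is appropriate depends on the variable and on the length of the gap; the sketch makes no claim about which one the applicants used for a given column.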

In one possible implementation, the normalization is given by the following formula:

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i'$ is the normalized data, $x_i$ is the raw data, $x_{\max}$ is the maximum value of the raw data, and $x_{\min}$ is the minimum value of the raw data.

In one possible implementation, the multiple linear regression model that predicts the PM₁₀ concentration from the meteorological data as input factors is:

Y₁ = -3.81X₁ + 2.213X₂ - 55.100X₃ - 0.212X₄ - 1.302X₅ + 398.112

where Y₁ is the PM₁₀ concentration predicted by the multiple linear regression model; X₁ is the air pressure; X₂ is the humidity; X₃ is the wind speed; X₄ is the wind direction; and X₅ is the air temperature.

In one possible implementation, the multiple linear regression model is subjected to an F-test;

when the significance (p-value) of the F-test is greater than or equal to 0.00 and less than 0.01, the multiple linear regression model is considered statistically significant.
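For illustration, the overall F statistic of a fitted regression can be computed from the actual and fitted values. The sketch below assumes a least-squares fit with an intercept, so that the regression sum of squares equals the total sum of squares minus the residual sum of squares; the function name and the parameter `k` (number of input factors) are hypothetical. Converting the statistic to a significance value requires the F distribution with (k, n - k - 1) degrees of freedom, which is not implemented here.

```python
from typing import Sequence


def f_statistic(y_act: Sequence[float], y_pre: Sequence[float], k: int) -> float:
    """Overall F statistic for a least-squares regression with k input factors.

    F = (SSR / k) / (SSE / (n - k - 1)), with SSR = SST - SSE.
    """
    n = len(y_act)
    if n != len(y_pre):
        raise ValueError("y_act and y_pre must have the same length")
    if n <= k + 1:
        raise ValueError("need more samples than fitted parameters")
    mean = sum(y_act) / n
    sst = sum((y - mean) ** 2 for y in y_act)              # total sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(y_act, y_pre))  # residual sum of squares
    if sse == 0:
        raise ValueError("perfect fit: F statistic is unbounded")
    ssr = sst - sse                                        # regression sum of squares
    return (ssr / k) / (sse / (n - k - 1))
```

A large F corresponds to a small significance value; with the five meteorological factors of Y₁, `k` would be 5.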

In one possible implementation, the optimal linear regression model is:

Y₂ = 30.231X′₁ + 19.629X′₂ - 0.312X′₃ + 8.531X′₄ + 0.891X′₅ + 5.121X′₆ + 10.031X′₇ - 90.132

where Y₂ is the PM₁₀ concentration predicted by the optimal linear regression model; X′₁ is PM₂.₅; X′₂ is the air temperature; X′₃ is O₃; X′₄ is the wind speed; X′₅ is the air pressure; X′₆ is the humidity; and X′₇ is the season.

In one possible implementation, constructing the BP neural network model includes:

taking the preprocessed conventional pollutant concentrations and meteorological data as input;

determining a network structure; the network structure comprises a network layer number, an input layer node number, an output layer node number, an activation function, a training method and a training parameter.

In one possible implementation, the crossover probability in the genetic algorithm decreases as the fitness function increases, and the mutation probability increases as the fitness function increases.

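In one possible implementation, the PM₁₀ mean square error MSE, the PM₁₀ mean absolute error MAE and the goodness of fit R² are given by the following formulas:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_{act,i} - Y_{pre,i}\right)^2$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_{act,i} - Y_{pre,i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_{act,i} - Y_{pre,i}\right)^2}{\sum_{i=1}^{n}\left(Y_{act,i} - Y_{mean}\right)^2}$$

where $Y_{act}$ is the actual value, $Y_{pre}$ is the predicted value, $Y_{mean}$ is the mean of the actual values, and $n$ is the number of samples.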
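The three evaluation metrics can be computed directly from paired actual and predicted values. The following is a minimal sketch using the standard definitions of mean square error, mean absolute error and coefficient of determination; the function names are illustrative and not taken from the application.

```python
from typing import Sequence


def mse(y_act: Sequence[float], y_pre: Sequence[float]) -> float:
    """Mean square error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_act, y_pre)) / len(y_act)


def mae(y_act: Sequence[float], y_pre: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_act, y_pre)) / len(y_act)


def r_squared(y_act: Sequence[float], y_pre: Sequence[float]) -> float:
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    y_mean = sum(y_act) / len(y_act)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_act, y_pre))
    ss_tot = sum((a - y_mean) ** 2 for a in y_act)
    return 1.0 - ss_res / ss_tot
```

A model is preferred when its MSE and MAE are lower and its R² is closer to 1.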

The application provides an atmospheric environment prediction method based on multiple model comparison. The method comprises: obtaining the conventional pollutant concentrations and meteorological data of a target city, and constructing a database of the conventional pollutant concentrations and meteorological data; preprocessing the conventional pollutant concentrations and meteorological data in the database; constructing, from the meteorological data as input factors, a multiple linear regression model that predicts the PM₁₀ concentration Y₁, the input factors comprising air pressure, humidity, wind speed, wind direction and air temperature; adjusting the input factors of the multiple linear regression model and constructing, by stepwise regression, an optimal linear regression model that predicts the PM₁₀ concentration Y₂, the adjusted input factors comprising PM₂.₅, air temperature, O₃, wind speed, air pressure, humidity and season; training a BP neural network model according to a network structure and the preprocessed conventional pollutant concentrations and meteorological data; optimizing the thresholds and weights of the BP neural network model based on a genetic algorithm to obtain an optimal BP neural network model; and comparing the four models and determining the finally selected model by the PM₁₀ mean square error, the PM₁₀ mean absolute error and the goodness of fit. The optimal BP neural network parameters are obtained through selection, crossover and mutation, accurate prediction results are obtained with the improved neural network, and the method is better suited to medium- and long-term prediction of atmospheric pollutants.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of an atmospheric environment prediction method based on multiple model comparisons according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of constructing the multiple linear regression model in the atmospheric environment prediction method based on multiple model comparison according to an embodiment of the present application;

FIG. 3 is a flowchart of an optimal linear regression model constructed in the atmospheric environment prediction method based on multiple model comparisons according to the embodiment of the present application;

FIG. 4 is a flowchart of a BP neural network model training method in an atmospheric environment prediction method based on multi-model comparison according to an embodiment of the present application;

FIG. 5 is a flowchart of obtaining the optimal BP neural network model in the atmospheric environment prediction method based on multiple model comparison according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The present application is described in further detail below with reference to the attached drawing figures:

The embodiment of the application provides an atmospheric environment prediction method based on multiple model comparison. As shown in FIG. 1, the method includes the following steps:

S101, acquiring the conventional pollutant concentrations and meteorological data of a target city, and constructing a database of the conventional pollutant concentrations and meteorological data; wherein the conventional pollutant concentrations comprise PM₂.₅, PM₁₀, SO₂, CO, O₃ and NO₂, and the meteorological data include humidity, air temperature, wind speed, wind direction and air pressure.

S102, preprocessing the conventional pollutant concentrations and meteorological data in the database. The preprocessing comprises: carrying out a consistency check on the conventional pollutant concentrations and meteorological data; processing invalid data and missing data by estimation, deletion, filling with a global constant, or random interpolation; and normalizing out-of-range data to the interval [0,1]. The normalization is given by the following formula:

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i'$ is the normalized data, $x_i$ is the raw data, $x_{\max}$ is the maximum value of the raw data, and $x_{\min}$ is the minimum value of the raw data. The normalized input and output values fall within the interval [0,1]; finally, the formula

$$x_i = x_i'\left(x_{\max} - x_{\min}\right) + x_{\min}$$

is used to convert the normalized output back to the true output value.
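The normalization step and its inverse can be sketched as a pair of functions. This is a minimal illustration of min-max scaling to [0,1]; the function names are hypothetical, and the guard against a constant series is an added assumption, since the application does not discuss that case.

```python
from typing import List, Sequence, Tuple


def normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Min-max scale values to [0, 1]; also return x_min and x_max for later inversion."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant series")
    scaled = [(x - x_min) / (x_max - x_min) for x in values]
    return scaled, x_min, x_max


def denormalize(scaled: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Map normalized values back to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]
```

The same `x_min` and `x_max` obtained from the training data would be reused to convert the network output back to a concentration.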

S103, constructing, from the meteorological data as input factors, a multiple linear regression model that predicts the PM₁₀ concentration Y₁; the input factors comprise air pressure, humidity, wind speed, wind direction and air temperature. An F-test is performed on the multiple linear regression model; as shown in Table 1, when the significance of the F-test is greater than or equal to 0.00 and less than 0.01, the multiple linear regression model is statistically significant.

TABLE 1

Air pressure, air temperature, wind speed and wind direction have a negative effect on the PM₁₀ concentration Y₁, while humidity has a positive effect. The multiple linear regression model that predicts the PM₁₀ concentration from the meteorological data as input factors is:

Y₁ = -3.81X₁ + 2.213X₂ - 55.100X₃ - 0.212X₄ - 1.302X₅ + 398.112

where Y₁ is the PM₁₀ concentration predicted by the multiple linear regression model; X₁ is the air pressure; X₂ is the humidity; X₃ is the wind speed; X₄ is the wind direction; and X₅ is the air temperature. The actual values in the database are compared with the predicted PM₁₀ concentrations. The goodness of fit of the multiple linear regression model is not high.
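As an illustration, the regression equation for Y₁ can be evaluated with a small function. The coefficients and intercept are taken from the equation above; the function and constant names are hypothetical, and the units and scaling of the inputs are not stated in the application, so they are left to the caller.

```python
# Coefficients of X1..X5 (air pressure, humidity, wind speed, wind direction,
# air temperature) and the intercept, as given in the Y1 equation.
Y1_COEFFICIENTS = (-3.81, 2.213, -55.100, -0.212, -1.302)
Y1_INTERCEPT = 398.112


def predict_y1(pressure: float, humidity: float, wind_speed: float,
               wind_direction: float, temperature: float) -> float:
    """Evaluate the multiple linear regression model for the PM10 concentration Y1.

    Input units and scaling are an assumption left to the caller.
    """
    factors = (pressure, humidity, wind_speed, wind_direction, temperature)
    return sum(c * x for c, x in zip(Y1_COEFFICIENTS, factors)) + Y1_INTERCEPT
```

The signs of the coefficients reproduce the stated effects: raising humidity alone raises the prediction, while raising any of the other four factors lowers it.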

TABLE 2

S104, adjusting the input factors of the multiple linear regression model, and constructing, by stepwise regression, the optimal linear regression model that predicts the PM₁₀ concentration Y₂; the adjusted input factors comprise PM₂.₅, air temperature, O₃, wind speed, air pressure, humidity and season. The optimal linear regression model is obtained after the adjustment shown in Table 3.

TABLE 3

The results show that the model that introduces other pollutant concentrations and seasonal factors in addition to meteorological factors has a markedly better goodness of fit than the model that uses meteorological factors only. The goodness of fit of the optimized multiple linear regression model reaches 0.821.
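The idea of selecting input factors step by step can be sketched as a simplified forward selection. This is not the applicants' procedure: the entry criterion used here (a minimum gain in R², parameter `min_gain`) is an assumption chosen for brevity, whereas stepwise regression is commonly driven by significance tests. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sse(rows: Sequence[Sequence[float]], y: Sequence[float], cols: List[int]) -> float:
    """Least-squares fit of y on the chosen columns plus an intercept; return the SSE."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2 for d, t in zip(design, y))


def forward_stepwise(rows: Sequence[Sequence[float]], y: Sequence[float],
                     min_gain: float = 0.01) -> Tuple[List[int], float]:
    """Add the factor that most improves R^2 until the gain drops below min_gain."""
    mean = sum(y) / len(y)
    sst = sum((t - mean) ** 2 for t in y)
    selected: List[int] = []
    best_r2 = 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        r2, col = max((1.0 - fit_sse(rows, y, selected + [c]) / sst, c) for c in remaining)
        if r2 - best_r2 < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best_r2 = r2
    return selected, best_r2
```

In the setting of S104, each row would hold the candidate factors (pollutant concentrations, meteorological variables, season) for one sample and `y` the measured PM₁₀ concentration.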

S105, training a BP neural network model according to a network structure and the preprocessed conventional pollutant concentrations and meteorological data. Constructing the BP neural network model comprises: taking the preprocessed conventional pollutant concentrations and meteorological data as input; and determining a network structure, the network structure comprising the number of network layers, the number of input layer nodes, the number of output layer nodes, the activation functions, the training method and the training parameters. The model uses a three-layer neural network with one hidden layer to predict the atmospheric environment. The number of input nodes is 11 and the number of output nodes is 1; the hidden layer uses a Sigmoid activation function, and the output layer uses a linear activation function. Before the neural network is trained, parameters such as the initial weights and the learning rate are determined, as shown in Table 4.

TABLE 4

When the BP neural network structure is 11-6-1, the training function is traingdx, the number of training iterations is 5000, the training goal is 0.005, the training step length is 25 and the learning rate is 0.01, the trend of future atmospheric pollutant concentrations can be predicted, and future pollutant concentrations can be predicted relatively accurately. The goodness of fit of the conventional BP neural network is 0.03 higher than that of the improved multiple linear regression model, and overall the BP neural network predicts better than the improved multiple linear regression model. However, the BP neural network has insufficient global search capability, easily falls into local optima, and trains slowly.
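A network of the 11-6-1 shape described here can be sketched in plain Python. This is only an illustration: the weight initialization range, the dictionary layout and the function names are assumptions, and the update rule is plain gradient descent on a single sample rather than the traingdx training function (gradient descent with momentum and an adaptive learning rate) named in the text.

```python
import math
import random
from typing import Dict, List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in: int = 11, n_hidden: int = 6, seed: int = 0) -> Dict[str, list]:
    """Random small weights and zero thresholds for an n_in-n_hidden-1 network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def hidden_layer(net: Dict[str, list], x: List[float]) -> List[float]:
    """Sigmoid activations of the hidden layer."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(net["w1"], net["b1"])]


def forward(net: Dict[str, list], x: List[float]) -> float:
    """Linear output layer applied to the hidden activations."""
    h = hidden_layer(net, x)
    return sum(w * a for w, a in zip(net["w2"], h)) + net["b2"]


def train_step(net: Dict[str, list], x: List[float], target: float, lr: float = 0.01) -> float:
    """One back-propagation update on the loss 0.5 * (output - target)^2; returns the loss."""
    h = hidden_layer(net, x)
    err = sum(w * a for w, a in zip(net["w2"], h)) + net["b2"] - target
    # Hidden-layer error terms must use the output weights before they are updated.
    deltas = [err * w * a * (1.0 - a) for w, a in zip(net["w2"], h)]
    net["w2"] = [w - lr * err * a for w, a in zip(net["w2"], h)]
    net["b2"] -= lr * err
    for j, d in enumerate(deltas):
        net["w1"][j] = [w - lr * d * v for w, v in zip(net["w1"][j], x)]
        net["b1"][j] -= lr * d
    return 0.5 * err * err
```

Inputs and targets would be the normalized values from the preprocessing step, and the output would be converted back to a concentration with the inverse normalization.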

S106, optimizing the thresholds and weights of the BP neural network model based on a genetic algorithm to obtain an optimal BP neural network model. To prevent the genetic-algorithm-based BP neural network from being trapped prematurely in a local optimum, the crossover probability and the mutation probability in the genetic algorithm are modified so that they vary with fitness rather than remaining fixed: the crossover probability gradually decreases as the fitness function increases, and the mutation probability gradually increases as the fitness function increases.
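The application states only the direction in which the two probabilities change, not a formula. The sketch below assumes, purely for illustration, a linear schedule between hypothetical bounds, with the fitness value rescaled against the population minimum and maximum; the function name, the bounds and the linear form are all assumptions.

```python
from typing import Tuple


def adaptive_probabilities(fitness: float, f_min: float, f_max: float,
                           pc_high: float = 0.2, pc_low: float = 0.1,
                           pm_low: float = 0.1, pm_high: float = 0.2) -> Tuple[float, float]:
    """Return (crossover probability, mutation probability) for one individual.

    As the fitness value rises from f_min to f_max, the crossover probability
    falls linearly from pc_high to pc_low and the mutation probability rises
    linearly from pm_low to pm_high. Bounds and linear form are hypothetical.
    """
    if f_max == f_min:
        t = 0.0
    else:
        t = (fitness - f_min) / (f_max - f_min)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    pc = pc_high - t * (pc_high - pc_low)
    pm = pm_low + t * (pm_high - pm_low)
    return pc, pm
```

Because the fitness value in this method is an absolute prediction error, a schedule of this kind mutates poorly performing individuals more often while recombining well-performing ones more often.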

In the neural network, the input layer has 11 nodes and the output layer has 1 node; the prediction performance of the network is best when the number of hidden nodes is 10. From this network structure, the optimized neural network has 120 weights (11×10 + 10×1) and 11 thresholds (10 + 1), so the code length of an individual in the genetic algorithm is 131. The population size in the genetic algorithm is set to 20, the number of generations to 50, the crossover probability to 0.2 and the mutation probability to 0.1, and the absolute value of the error between the actual output and the expected output of the network is used as the fitness value of an individual, as shown in Table 6.

TABLE 6
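The code length of 131 follows from counting the weights and thresholds of an 11-10-1 network. The sketch below reproduces that count and shows one way a flat chromosome could be split back into network parameters; the ordering of genes (input-to-hidden weights, hidden-to-output weights, hidden thresholds, output thresholds) is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def chromosome_length(n_in: int, n_hidden: int, n_out: int) -> Tuple[int, int, int]:
    """Return (number of weights, number of thresholds, total code length)."""
    weights = n_in * n_hidden + n_hidden * n_out
    thresholds = n_hidden + n_out
    return weights, thresholds, weights + thresholds


def decode(chromosome: Sequence[float], n_in: int, n_hidden: int, n_out: int):
    """Split a flat chromosome into (w1, w2, b1, b2); the gene order is an assumption."""
    _, _, total = chromosome_length(n_in, n_hidden, n_out)
    if len(chromosome) != total:
        raise ValueError(f"expected {total} genes, got {len(chromosome)}")
    genes: List[float] = list(chromosome)
    pos = 0
    w1 = [genes[pos + j * n_in: pos + (j + 1) * n_in] for j in range(n_hidden)]
    pos += n_in * n_hidden
    w2 = [genes[pos + k * n_hidden: pos + (k + 1) * n_hidden] for k in range(n_out)]
    pos += n_hidden * n_out
    b1 = genes[pos: pos + n_hidden]
    pos += n_hidden
    b2 = genes[pos: pos + n_out]
    return w1, w2, b1, b2
```

For example, `chromosome_length(11, 10, 1)` gives 120 weights and 11 thresholds, for a total of 131 genes.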

The process is as follows: randomly initialize a population; calculate the population fitness and find the best individual; carry out selection, crossover and mutation operations; judge whether the evolution is finished and, if not, return to the second step; finally, assign the best individual found to the BP neural network and use the network for prediction. The goodness of fit of the resulting optimal BP neural network model is 0.886.
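The loop described above can be sketched as a small real-coded genetic algorithm. The population size (20), number of generations (50), crossover probability (0.2) and mutation probability (0.1) follow the values given in the text; everything else is an assumption made for the sketch, including tournament selection, one-point crossover, Gaussian mutation, keeping the best individual each generation, the gene initialization range and the function names. The fitness is treated as an error to be minimized.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[float]


def evolve(fitness: Callable[[Chromosome], float], length: int,
           pop_size: int = 20, generations: int = 50,
           pc: float = 0.2, pm: float = 0.1, seed: int = 0) -> Tuple[Chromosome, List[float]]:
    """Minimize an error-type fitness; return the best individual and the best-fitness history."""
    rng = random.Random(seed)
    # Step 1: randomly initialize the population.
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # Step 2: evaluate fitness and find the best individual.
        ranked = sorted(pop, key=fitness)
        history.append(fitness(ranked[0]))
        next_pop = [ranked[0][:]]  # carry the best individual over unchanged
        # Step 3: selection, crossover and mutation.
        while len(next_pop) < pop_size:
            a = min(rng.sample(pop, 2), key=fitness)  # tournament of two
            b = min(rng.sample(pop, 2), key=fitness)
            child = a[:]
            if length > 1 and rng.random() < pc:
                cut = rng.randrange(1, length)        # one-point crossover
                child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < pm else g for g in child]
            next_pop.append(child)
        pop = next_pop
    # Step 4: return the best individual found.
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

In the setting of S106, `length` would be 131 and `fitness` would decode a chromosome into the weights and thresholds of the 11-10-1 network and return the absolute prediction error on the training data; the best individual then initializes the BP network before it is used for prediction.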

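S107, comparing the four models, and determining the finally selected model by the PM₁₀ mean square error, the PM₁₀ mean absolute error and the goodness of fit. The PM₁₀ mean square error MSE, the PM₁₀ mean absolute error MAE and the goodness of fit R² are given by the following formulas:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_{act,i} - Y_{pre,i}\right)^2$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_{act,i} - Y_{pre,i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_{act,i} - Y_{pre,i}\right)^2}{\sum_{i=1}^{n}\left(Y_{act,i} - Y_{mean}\right)^2}$$

where $Y_{act}$ is the actual value, $Y_{pre}$ is the predicted value, $Y_{mean}$ is the mean of the actual values, and $n$ is the number of samples. The prediction performance of the four prediction models is compared and analyzed as shown in Table 7.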

TABLE 7

Among the four prediction models, the optimal linear regression model, the BP neural network and the optimal BP neural network model can predict future atmospheric pollutant concentrations relatively accurately, whereas the multiple linear regression model can only roughly predict the general trend and cannot accurately predict future atmospheric pollutant concentrations. The optimal BP neural network model is used for medium- and long-term prediction, and can accurately predict trend turning points within the medium- and long-term range. As shown in Table 7, the optimal BP neural network model gives the best prediction performance. Compared with the BP neural network, the PM₁₀ mean square error, mean absolute error and goodness of fit of the optimal BP neural network model differ by 0.062, 3.42 and 0.031 respectively; compared with the optimal linear regression model, by 0.065, 6.8 and 0.042; and compared with the multiple linear regression model, by 0.376, 24.91 and 0.114. In each case the errors are lower and the goodness of fit is higher.

The above-mentioned contents are only for explaining the technical idea of the present application, and the protection scope of the present application is not limited thereby, and any modification made on the basis of the technical idea presented in the present application falls within the protection scope of the claims of the present application.

Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments have been discussed in the foregoing disclosure by way of example, it should be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.

The entire contents of each patent, patent application, patent application publication and other material cited in this application, such as articles, books, specifications, publications and documents, are hereby incorporated by reference into this application, except for any application history documents that are inconsistent with or in conflict with the contents of this application, and except for any documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It is noted that if the descriptions, definitions and/or use of terms in the material attached to this application are inconsistent with or contrary to those stated in this application, the descriptions, definitions and/or use of terms in this application shall control.

Claims

1. An atmospheric environment prediction method based on multiple model comparisons is characterized by comprising the following steps:

2. The atmospheric environment prediction method based on multiple model comparisons according to claim 1, characterized in that the preprocessing comprises:

3. The atmospheric environment prediction method based on multiple model comparisons according to claim 2, characterized in that the normalization is obtained by the following formula:

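$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i'$ is the normalized data, $x_i$ is the raw data, $x_{\max}$ is the maximum value of the raw data, and $x_{\min}$ is the minimum value of the raw data.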

4. The atmospheric environment prediction method based on multiple model comparisons according to claim 1, wherein the multiple linear regression model that predicts the PM₁₀ concentration from the meteorological data as input factors is:

Y₁＝-3.81X₁+2.213X₂-55.100X₃-0.212X₄-1.302X₅+398.112

5. The atmospheric environment prediction method based on multiple model comparisons according to claim 4, characterized in that the multiple linear regression model is subjected to an F test;

6. The atmospheric environment prediction method based on multiple model comparisons according to claim 1, characterized in that the optimal linear regression model:

7. The atmospheric environment prediction method based on multiple model comparisons according to claim 1, wherein constructing the BP neural network model comprises:

8. The atmospheric environment prediction method based on multiple model comparisons according to claim 1, wherein the crossover probability in the genetic algorithm decreases with the increase of the fitness function, and the mutation probability increases with the increase of the fitness function.

9. The atmospheric environment prediction method based on multiple model comparisons according to claim 1, wherein the PM₁₀ mean square error MSE, the PM₁₀ mean absolute error MAE and the goodness of fit R² are given by the following formulas: