CN117688501A

CN117688501A - Error correction method for air quality prediction system

Info

Publication number: CN117688501A
Application number: CN202410152249.8A
Authority: CN
Inventors: 唐易天晴; 周德荣; 江飞; 刘强
Original assignee: Nanjing Chuanglan Technology Co ltd
Current assignee: Nanjing Chuanglan Technology Co ltd
Priority date: 2024-02-03
Filing date: 2024-02-03
Publication date: 2024-03-12

Abstract

The invention relates to the technical field of air quality prediction and discloses an error correction method for an air quality prediction system, comprising the following steps: S1: performing outlier detection on the air quality data with an isolation forest algorithm, reducing the influence of abnormal data on the prediction result; S2: using a random forest algorithm to screen out an optimal factor subset as the input variables of the prediction model, improving prediction accuracy; S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust. By analyzing the collected air quality data with the isolation forest and the random forest, the invention supplies the prediction model with input variables that are less affected by interference, reduces the influence of outlying and redundant factors on the prediction result, changes how the prediction model processes data, and improves the accuracy of the prediction result through multi-model fusion, giving users a better experience.

Description

Error correction method for air quality prediction system

Technical Field

The invention belongs to the technical field of air quality prediction, and particularly relates to an error correcting method of an air quality prediction system.

Background

Air quality prediction services let environmental management departments understand the future trend of air pollution more accurately, so that they can take targeted policy measures and protect public health and safety. Ambient air quality forecasting is an important technical means of responding to heavy pollution weather in a timely and appropriate way, and it guides joint emission reduction of regional atmospheric pollution. Existing air quality forecasting methods are mainly numerical analysis methods and statistical analysis methods. However, numerical forecasting methods generally require accurate input data and expensive computational resources, while statistical forecasting methods are less accurate for pollutant concentrations that vary non-linearly. When an immediate and accurate prediction is required, using the existing air quality prediction models is very challenging.

The defect of existing air quality forecasting systems is that the model is strongly affected by outlying data during forecasting and the forecast result is prone to errors, so the forecasting accuracy of the system is low and the user experience suffers.

Disclosure of Invention

The present invention is directed to a method for correcting errors in an air quality prediction system, so as to solve the problems set forth in the background art.

In order to achieve the above purpose, the present invention provides the following technical solutions:

a method of correcting errors in an air quality prediction system, comprising:

S1: performing outlier detection on the air quality data with an isolation forest algorithm, reducing the influence of abnormal data on the prediction result;

S2: using a random forest algorithm to screen out an optimal factor subset as the input variables of the prediction model, improving prediction accuracy;

S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;

S4: adopting multi-model fusion, in which a weighted average is taken over the prediction results of different models, to improve model precision.

Preferably, the random forest performs importance measurement on the influence factors:

given $J$ influencing factors (variables) $X_1, X_2, \ldots, X_J$, a Gini score statistic must be calculated for each of the $J$ influencing factors. The score statistic of variable $X_j$ is denoted $VIM_j^{(Gini)}$; it represents the average change in node-splitting impurity of the $j$-th variable over all trees in the RF. The Gini index is calculated as:

$$GI_m = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})$$

where $K$ is the number of classes in the bootstrap sample set and $\hat{p}_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$; when the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:

$$GI_m = 2\,\hat{p}_m\,(1 - \hat{p}_m)$$

where $\hat{p}_m$ is the estimated probability that a sample at node $m$ belongs to either one of the classes;

the importance of variable $X_j$ at node $m$, that is, the change in the Gini index before and after node $m$ branches, is:

$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$

where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$;

if variable $X_j$ appears $M$ times in the $i$-th tree, the importance of variable $X_j$ in the $i$-th tree is:

$$VIM_{ij}^{(Gini)} = \sum_{m=1}^{M} VIM_{jm}^{(Gini)}$$

the Gini importance of variable $X_j$ in the RF is defined as:

$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$

where $n$ is the number of classification trees in the RF.
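
The Gini-based importance measure described above can be illustrated with a minimal Python sketch. The function names and the representation of a split as a `(parent, left, right)` triple of class-label lists are assumptions made for illustration only; they are not part of the disclosed method. The node importance follows the unweighted form given in the text, whereas common library implementations typically weight each child node by its share of samples.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of one node: sum over classes of p_k * (1 - p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def node_importance(parent, left, right):
    """Gini change for one split: GI_m - GI_l - GI_r (unweighted, as in the text)."""
    return gini_index(parent) - gini_index(left) - gini_index(right)


def forest_importance(per_tree_splits):
    """Gini importance of one variable, averaged over all trees.

    per_tree_splits holds one list per tree; each list contains a
    (parent, left, right) triple of label lists for every node at which
    the variable was used to split (hypothetical input format).
    """
    per_tree = [sum(node_importance(*s) for s in splits)
                for splits in per_tree_splits]
    return sum(per_tree) / len(per_tree_splits)


# A variable that separates the classes perfectly in tree 1 and is unused in tree 2.
split = ([0, 0, 1, 1], [0, 0], [1, 1])
importance = forest_importance([[split], []])
```

Ranking the influencing factors by this value and keeping the highest-scoring ones is one possible way to obtain the factor subset used in S2.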

Preferably, the idea of the isolation forest algorithm is that data points which are quickly partitioned into leaf nodes are, with high probability, outliers. The algorithm steps are as follows:

1) Randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;

2) Randomly specify a dimension and randomly generate a cut point $p$ in the current node data (the cut point lies between the maximum and minimum values of the specified dimension in the current node data);

3) Generate a hyperplane from this cut point, dividing the data space of the current node into 2 subspaces: data whose value in the specified dimension is smaller than $p$ is placed in the left child of the current node, and data whose value is greater than or equal to $p$ is placed in the right child of the current node;

4) Recursively repeat steps 2) and 3) in the child nodes, constructing new child nodes until a child node contains only one data point (and can no longer be cut) or the child node has reached the height limit.
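
The tree-building steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the dictionary-based tree, the function names, the default subsample size of 64, the height limit of 8 and the number of trees are all hypothetical choices, not values given in the source. A point with a short average path length is isolated quickly and is therefore a likely outlier.

```python
import random


def build_itree(points, depth=0, max_depth=8, rng=random):
    """Steps 2) to 4): recursively cut the node data until it is isolated."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))           # randomly specified dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                  # simplification: stop if no cut is possible
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                     # cut point between min and max
    left = [p for p in points if p[dim] < cut]    # smaller than the cut point
    right = [p for p in points if p[dim] >= cut]  # greater than or equal to it
    return {"dim": dim, "cut": cut,
            "left": build_itree(left, depth + 1, max_depth, rng),
            "right": build_itree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Number of cuts needed before x lands in a leaf node."""
    if "dim" not in tree:
        return depth
    branch = "left" if x[tree["dim"]] < tree["cut"] else "right"
    return path_length(tree[branch], x, depth + 1)


def mean_path_length(data, x, n_trees=100, sample_size=64, seed=0):
    """Step 1) plus averaging: draw a subsample per tree and average the path of x."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(build_itree(sample, rng=rng), x)
    return total / n_trees


# Synthetic data: a dense cluster around the origin plus one distant point.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
outlier_depth = mean_path_length(data, (8.0, 8.0))
inlier_depth = mean_path_length(data, (0.0, 0.0))
```

In this synthetic example the distant point should have the shorter average path, which is the property S1 relies on to flag abnormal air quality records.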

Preferably, in step S3 a weighted predictive control law is used instead of a one-step predictive control law; that is, the control signal actually applied to the system is obtained as a weighted average of the one-step control quantity computed at the current moment and the values of the current-moment control quantity that were predicted at past moments. This gives the control system fault-tolerant control capability, reduces oscillation and saturation of the control signal, and reduces the generation of erroneous control signals.
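
One way to read the weighted predictive control law is sketched below in Python. The class name, the convention that each controller call returns a sequence of planned control values (index 0 for the current step, index 1 for the next step, and so on), and the example weights are assumptions introduced for illustration; the source does not specify how the weights are chosen.

```python
from collections import deque


class WeightedPredictiveLaw:
    """Blend the current one-step control with earlier predictions of it."""

    def __init__(self, weights):
        # weights[0] applies to the value computed now, weights[i] to the
        # value for the current step that was predicted i steps ago.
        # Assumed to be positive.
        self.weights = list(weights)
        self.history = deque(maxlen=len(self.weights))  # newest sequence first

    def step(self, sequence):
        """sequence[k] is the control value planned now for k steps ahead."""
        self.history.appendleft(list(sequence))
        # The sequence computed i steps ago stores its value for the
        # current step at index i.
        candidates = [(w, seq[i])
                      for i, (w, seq) in enumerate(zip(self.weights, self.history))
                      if i < len(seq)]
        total = sum(w for w, _ in candidates)
        return sum(w * u for w, u in candidates) / total


law = WeightedPredictiveLaw([0.5, 0.3, 0.2])
u1 = law.step([1.0, 1.0, 1.0])  # only the one-step value exists yet
u2 = law.step([4.0, 2.0, 2.0])  # blends 4.0 with the earlier prediction 1.0
```

In this example the one-step law alone would jump from 1.0 to 4.0, while the weighted law applies a smaller change, which illustrates how averaging with past predictions can damp oscillation of the control signal.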

Preferably, the prediction models in S4 include numerical prediction models and machine learning models. The weighted average method in S4 takes a weighted average of the outputs of multiple models of the same type and assigns different weights to models that perform differently; depending on which prediction models are selected, the result in S4 may be calculated in different ways (for example the factor score method, a structural equation model, or the entropy method).

Preferably, the air quality forecasting system comprises a data acquisition module, a data processing module, a data application module and a database. The data acquisition module comprises data acquisition equipment, monitoring instruments and a transmission terminal; the data processing module comprises a server, a control terminal, a router and a switch; the data application module comprises a user mobile phone client, a user computer client and a background management terminal. The air quality forecasting system further comprises a data security protection module and a firewall; the data security protection module is connected to each module and to the firewall.
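
The weighted-average fusion in S4 can be illustrated with the short Python sketch below. Weighting each model by the inverse of its validation error is an assumption chosen here for simplicity; it is not one of the weighting schemes named in the source, and the function names and sample numbers are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error of one model on validation data."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))


def inverse_error_weights(errors):
    """Give better-performing models larger weights (assumes every error > 0)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse(forecasts, weights):
    """Weighted average of the model forecasts at each forecast time."""
    return [sum(w * f for w, f in zip(weights, at_time))
            for at_time in zip(*forecasts)]


# Two hypothetical models with validation errors of 5 and 10 concentration units.
weights = inverse_error_weights([5.0, 10.0])
fused = fuse([[30.0, 60.0], [60.0, 90.0]], weights)
```

With these numbers the more accurate model receives a weight of about two thirds, so the fused forecast lies closer to its output than to that of the weaker model.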

Compared with the prior art, the invention has the beneficial effects that:

By analyzing the collected air quality data with the isolation forest and the random forest, the invention supplies the prediction model with input variables that are less affected by interference, reduces the influence of outlying and redundant factors on the prediction result, changes how the prediction model processes data, and improves the accuracy of the prediction result through multi-model fusion, giving users a better experience.

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the random forest algorithm of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Examples:

referring to fig. 1-2, a method for correcting errors in an air quality prediction system includes:

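The random forest measures the importance of the influencing factors. The importance of variable $X_j$ at node $m$, that is, the change in the Gini index before and after node $m$ branches, is:

$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$

where $GI_m$ is the Gini index of node $m$, and $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$;

the Gini importance of variable $X_j$ in the RF is defined as:

$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$

where $VIM_{ij}^{(Gini)}$ is the importance of variable $X_j$ in the $i$-th tree and $n$ is the number of classification trees in the RF.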

The idea of the isolation forest algorithm is that data points which are quickly partitioned into leaf nodes are, with high probability, outliers. The algorithm steps are as follows:

2) Randomly specify a dimension and randomly generate a cut point $p$ in the current node data (the cut point lies between the maximum and minimum values of the specified dimension in the current node data);

In S3, a weighted predictive control law replaces the one-step predictive control law; that is, the control signal actually applied to the system is obtained as a weighted average of the one-step control quantity computed at the current moment and the values of the current-moment control quantity that were predicted at past moments. This gives the control system fault-tolerant control capability, reduces oscillation and saturation of the control signal, and reduces the generation of erroneous control signals.

The prediction models in S4 include numerical prediction models and machine learning models. The weighted average method in S4 takes a weighted average of the outputs of multiple models of the same type and assigns different weights to models that perform differently; depending on which prediction models are selected, the result in S4 may be calculated in different ways (for example the factor score method, a structural equation model, or the entropy method).

The air quality forecasting system comprises a data acquisition module, a data processing module, a data application module and a database. The data acquisition module comprises data acquisition equipment, monitoring instruments and a transmission terminal; the data processing module comprises a server, a control terminal, a router and a switch; the data application module comprises a user mobile phone client, a user computer client and a background management terminal. The air quality forecasting system further comprises a data security protection module and a firewall; the data security protection module is connected to each module and to the firewall.

By analyzing the collected air quality data with the isolation forest and the random forest, the method supplies the prediction model with input variables that are less affected by interference, reduces the influence of outlying and redundant factors on the prediction result, changes how the prediction model processes data, and improves the accuracy of the prediction result through multi-model fusion, giving users a better experience.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method of correcting errors in an air quality prediction system, comprising:

2. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the random forest performs importance measurement on the influence factors:

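given $J$ influencing factors (variables) $X_1, X_2, \ldots, X_J$, a Gini score statistic must be calculated for each of the $J$ influencing factors; the score statistic of variable $X_j$ is denoted $VIM_j^{(Gini)}$ and represents the average change in node-splitting impurity of the $j$-th variable over all trees in the RF; the Gini index is calculated as:

$$GI_m = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})$$

where $K$ is the number of classes in the bootstrap sample set and $\hat{p}_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$;

the Gini importance of variable $X_j$ in the RF is defined as:

$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$

where $VIM_{ij}^{(Gini)}$ is the importance of variable $X_j$ in the $i$-th tree and $n$ is the number of classification trees in the RF.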

3. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the idea of the isolation forest algorithm is that data points which are quickly partitioned into leaf nodes are, with high probability, outliers, and the algorithm steps are as follows:

4. A method of correcting errors in an air quality prediction system according to claim 1, wherein: in step S3, a weighted predictive control law is used instead of a one-step predictive control law, that is, the control signal actually applied to the system is obtained as a weighted average of the one-step control quantity computed at the current moment and the values of the current-moment control quantity that were predicted at past moments, so that the control system has fault-tolerant control capability, oscillation and saturation of the control signal are reduced, and the generation of erroneous control signals is reduced.

5. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the prediction models in S4 include numerical prediction models and machine learning models; the weighted average method in S4 takes a weighted average of the outputs of multiple models of the same type and assigns different weights to models that perform differently; and the result in S4 may be calculated in different ways depending on which prediction models are selected.

6. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the air quality forecasting system comprises a data acquisition module, a data processing module, a data application module and a database; the data acquisition module comprises data acquisition equipment, monitoring instruments and a transmission terminal; the data processing module comprises a server, a control terminal, a router and a switch; the data application module comprises a user mobile phone client, a user computer client and a background management terminal; the air quality forecasting system further comprises a data security protection module and a firewall; and the data security protection module is connected to each module and to the firewall.