CN117688501A - Error correction method for air quality prediction system - Google Patents


Info

Publication number
CN117688501A
Authority
CN
China
Prior art keywords
prediction
data
air quality
node
model
Legal status
Pending
Application number
CN202410152249.8A
Other languages
Chinese (zh)
Inventor
唐易天晴
周德荣
江飞
刘强
Current Assignee
Nanjing Chuanglan Technology Co ltd
Original Assignee
Nanjing Chuanglan Technology Co ltd
Priority date
2024-02-03
Filing date
2024-02-03
Publication date
2024-03-12
Application filed by Nanjing Chuanglan Technology Co ltd
Priority to CN202410152249.8A
Publication of CN117688501A
Pending (current legal status)


Abstract

The invention relates to the technical field of air quality prediction and discloses an error correction method for an air quality prediction system, comprising the following steps. S1: perform outlier detection on the air quality data with an isolation forest algorithm to reduce the influence of abnormal data on the prediction result. S2: use a random forest algorithm to select an optimal factor subset as the input variables of the prediction model, improving prediction accuracy. S3: replace the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust. By using the isolation forest and the random forest to analyze and optimize the collected air quality data, the invention supplies the prediction model with input variables that are less affected by interference, reduces the influence of discrete and redundant factors on the prediction result, changes how the prediction model processes data, improves the accuracy of the prediction result through multi-model fusion, and thus provides users with a better experience.

Description

Error correction method for air quality prediction system
Technical Field
The invention belongs to the technical field of air quality prediction, and particularly relates to an error correction method for an air quality prediction system.
Background
Air quality forecasting services allow environmental management departments to understand future trends in air pollution more accurately, so that they can take targeted policy measures and protect public health and safety. Ambient air quality forecasting is an important technical means of responding to heavy pollution episodes in a timely and appropriate way, and it guides coordinated regional reduction of atmospheric pollutant emissions. Existing air quality forecasting methods fall mainly into numerical methods and statistical methods. Numerical forecasting methods generally require accurate input data and expensive computing resources, while statistical forecasting methods are less accurate for pollutant concentrations that vary nonlinearly. As a result, existing air quality prediction models struggle when accurate predictions are needed immediately.
A further drawback of existing air quality forecasting systems is that the model is strongly influenced by discrete (outlying) data during forecasting, so the forecast results are prone to errors. This lowers the forecasting accuracy of the system and degrades the user experience.
Disclosure of Invention
The present invention is directed to a method for correcting errors in an air quality prediction system, so as to solve the problems set forth in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a method of correcting errors in an air quality prediction system, comprising:
S1: perform outlier detection on the air quality data with an isolation forest algorithm to reduce the influence of abnormal data on the prediction result;
S2: use a random forest algorithm to select an optimal factor subset as the input variables of the prediction model, improving prediction accuracy;
S3: replace the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;
S4: fuse multiple models by taking a weighted average of the prediction results of the different models, improving model accuracy.
Preferably, the random forest (RF) measures the importance of the influencing factors as follows:
given $c$ influencing factors (variables) $X_1, X_2, X_3, \dots, X_c$, a Gini importance score must be computed for each of the $c$ factors. The score of variable $X_j$ is denoted $VIM_j^{(Gini)}$; this statistic is the average change in node-splitting impurity caused by the $j$-th variable over all trees of the RF. Impurity is measured with the Gini index:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk} p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^2$$
where $K$ is the number of classes in the bootstrap sample set and $p_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$. When the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:
$$GI_m = 2 p_m (1 - p_m)$$
where $p_m$ is the estimated probability that a sample at node $m$ belongs to one of the two classes.
The importance of variable $X_j$ at node $m$, i.e. the change in the Gini index of node $m$ before and after branching, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$.
If variable $X_j$ appears at the nodes in set $M$ of the $i$-th tree, the importance of $X_j$ in the $i$-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}$$
The Gini importance of variable $X_j$ in the RF is defined as:
$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
where $n$ is the number of classification trees in the RF.
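The Gini formulas above can be illustrated with a short pure-Python sketch. It is not the patent's implementation: to stay self-contained it takes each tree as a hypothetical list of split records `(j, GI_m, GI_l, GI_r)` instead of a fitted model, and the top-k rule in `select_factors` is an assumed way of choosing the factor subset for S2, which the text does not specify. Note that scikit-learn's `feature_importances_` weights each impurity decrease by the fraction of samples reaching the node and normalizes the result, so its values will differ from this unweighted form.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical split record for one node m where variable X_j is used:
# (j, GI_m, GI_l, GI_r)
Split = Tuple[int, float, float, float]


def gini_index(class_probs: Sequence[float]) -> float:
    """GI_m = 1 - sum_k p_mk^2 for the class probabilities at node m."""
    return 1.0 - sum(p * p for p in class_probs)


def gini_binary(p: float) -> float:
    """Two-class case: GI_m = 2 p_m (1 - p_m)."""
    return 2.0 * p * (1.0 - p)


def node_importance(gi_m: float, gi_l: float, gi_r: float) -> float:
    """VIM_jm = GI_m - GI_l - GI_r: Gini change when node m branches."""
    return gi_m - gi_l - gi_r


def forest_importance(trees: List[List[Split]]) -> Dict[int, float]:
    """VIM_j = (1/n) * sum over trees i of sum over nodes m in M of VIM_jm."""
    n = len(trees)
    totals: Dict[int, float] = defaultdict(float)
    for splits in trees:                      # tree i
        for j, gi_m, gi_l, gi_r in splits:    # node m where X_j splits
            totals[j] += node_importance(gi_m, gi_l, gi_r)
    return {j: total / n for j, total in totals.items()}


def select_factors(importance: Dict[int, float], k: int) -> List[int]:
    """Hypothetical S2 rule: keep the k factors with the largest VIM_j."""
    return sorted(importance, key=importance.get, reverse=True)[:k]
```

For two trees in which $X_0$ yields Gini decreases of 0.5 and 0.3 and $X_1$ yields 0.1, `forest_importance` gives $VIM_0 = 0.4$ and $VIM_1 = 0.05$, so `select_factors(..., 1)` keeps $X_0$.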
Preferably, the isolation forest algorithm rests on the idea that data points which are split off into leaf nodes after only a few partitions are, with high probability, outliers. The algorithm steps are as follows:
1) Randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;
2) Randomly specify a dimension $q$ and randomly generate a cut point $p$ in the current node data (the cut point lies between the maximum and minimum values of dimension $q$ in the current node data);
3) Use this cut point to generate a hyperplane that divides the current node's data space into 2 subspaces: data whose value in the specified dimension is smaller than $p$ are placed in the left child of the current node, and data greater than or equal to $p$ are placed in the right child;
4) Recursively repeat steps 2) and 3) in the child nodes, constructing new child nodes until a child node contains only one data point (no further cutting is possible) or the child node has reached the height limit.
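The four steps can be sketched in pure Python as below. This is a minimal sketch, not the patent's implementation: the path-length correction c(n), the anomaly score s(x) = 2^(-E[h(x)]/c(ψ)), the height limit ceil(log2 ψ) and the default ψ = 256 come from the original isolation forest formulation by Liu, Ting and Zhou and are not stated in this document; the function names and the default of 100 trees are hypothetical choices.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used in c(n)


class Node:
    """Internal node (split dimension q, cut point p) or leaf (q is None)."""

    def __init__(self, size: int, q: Optional[int] = None, p: float = 0.0,
                 left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.size = size
        self.q = q
        self.p = p
        self.left = left
        self.right = right


def build_itree(data: List[Point], height: int, limit: int,
                rng: random.Random) -> Node:
    # Step 4 stopping rule: a single data point, or the height limit reached.
    if len(data) <= 1 or height >= limit:
        return Node(len(data))
    q = rng.randrange(len(data[0]))             # step 2: random dimension q
    lo = min(x[q] for x in data)
    hi = max(x[q] for x in data)
    if lo == hi:                                 # all values equal: no cut
        return Node(len(data))
    p = rng.uniform(lo, hi)                      # step 2: cut point in [min, max]
    left = [x for x in data if x[q] < p]         # step 3: smaller than p -> left
    right = [x for x in data if x[q] >= p]       # step 3: p or larger -> right
    return Node(len(data), q, p,
                build_itree(left, height + 1, limit, rng),   # step 4: recurse
                build_itree(right, height + 1, limit, rng))


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(x: Point, node: Node, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for unsplit leaves."""
    while node.q is not None:
        node = node.left if x[node.q] < node.p else node.right
        depth += 1
    return depth + c_factor(node.size)


def anomaly_scores(data: List[Point], n_trees: int = 100, psi: int = 256,
                   seed: int = 0) -> List[float]:
    """Score every point; values close to 1 suggest outliers."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    rng = random.Random(seed)
    psi = min(psi, len(data))
    limit = math.ceil(math.log2(psi))            # height limit (assumed rule)
    trees = [build_itree(rng.sample(data, psi), 0, limit, rng)  # step 1
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
            for x in data]
```

For example, on a 5 × 5 grid of points near the origin plus the far point (10, 10), the far point is usually isolated by the very first cut and therefore receives the highest score.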
Preferably, in step S3, a weighted predictive control law replaces the one-step predictive control law: the control signal actually applied to the system is the weighted average of the one-step control quantity computed at the current moment and the predictions of the current control quantity made at past moments. This gives the control system fault-tolerant control capability, reduces oscillation and saturation of the control signal, and reduces the generation of erroneous control signals.
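A minimal sketch of the weighted control law, assuming the one-step control quantity u(k) and the earlier predictions of it, û(k|k-1), û(k|k-2), ..., are already available from the predictive controller. The weights, their normalization, and the optional actuator limits `u_min`/`u_max` are illustrative assumptions; the text does not say how the weights are chosen.

```python
from typing import Optional, Sequence


def weighted_control(u_now: float, past_predictions: Sequence[float],
                     weights: Sequence[float],
                     u_min: Optional[float] = None,
                     u_max: Optional[float] = None) -> float:
    """Blend the one-step control quantity with earlier predictions of it.

    u_now: one-step control quantity computed at moment k.
    past_predictions: predictions of u(k) made at moments k-1, k-2, ...
    weights: one weight per term (u_now first); normalized to sum to 1.
    u_min, u_max: optional actuator limits (hypothetical addition).
    """
    terms = [u_now, *past_predictions]
    if len(weights) != len(terms):
        raise ValueError("need one weight per control term")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted average of the current and previously predicted control values.
    u = sum(w * t for w, t in zip(weights, terms)) / total
    # Optional clamping to assumed actuator limits.
    if u_min is not None:
        u = max(u, u_min)
    if u_max is not None:
        u = min(u, u_max)
    return u
```

For example, if a faulty one-step computation produces 10.0 while the two earlier predictions were both 2.0, weights (0.5, 0.25, 0.25) give an applied signal of 6.0 instead of 10.0, which is the damping effect the text attributes to the weighted law.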
Preferably, the prediction models in S4 include numerical prediction models and machine learning models. The weighted average method in S4 averages the output results of multiple models of the same type and assigns different weights to models with different performance; depending on which prediction models are selected, the result in S4 can be calculated in different ways (such as the factor score method, structural equation modeling, or the entropy method).
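The weighted average in S4 can be sketched as follows. The text leaves the weighting rule open (factor score, structural equation model, entropy method, and so on); this sketch substitutes a simple inverse-validation-error rule as an assumption, and the model names `numerical` and `ml` are hypothetical.

```python
from typing import Dict, List, Sequence


def inverse_error_weights(errors: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical rule: weight each model by 1/error, normalized to sum to 1."""
    if any(e <= 0 for e in errors.values()):
        raise ValueError("errors must be positive")
    inv = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def fuse(predictions: Dict[str, Sequence[float]],
         weights: Dict[str, float]) -> List[float]:
    """Weighted average of several models' forecasts, one time step at a time."""
    names = list(predictions)
    horizon = len(predictions[names[0]])
    if any(len(predictions[n]) != horizon for n in names):
        raise ValueError("all models must forecast the same horizon")
    total = sum(weights[n] for n in names)
    return [sum(weights[n] * predictions[n][t] for n in names) / total
            for t in range(horizon)]
```

With validation errors of 1.0 for the numerical model and 3.0 for the machine learning model, the weights become 0.75 and 0.25, so forecasts of [40, 80] and [60, 40] fuse to about [45, 70]. Any other weighting method can be plugged in by producing a different `weights` dictionary.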
Preferably, the air quality forecasting system comprises a data acquisition module, a data processing module, a data application module and a database. The data acquisition module comprises data acquisition equipment, monitoring instruments and a transmission terminal; the data processing module comprises a server, a control terminal, a router and a switch; the data application module comprises a user mobile phone terminal, a user computer terminal and a back-end management terminal. The air quality forecasting system further comprises a data security protection module and a firewall; the data security protection module is connected to each module and also to the firewall.
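The module layout can also be written down as a small data structure that makes the stated connections explicit. This is an illustrative sketch: the identifiers are hypothetical, and only the links the text states (the security module to each module and to the firewall) are modeled; the text does not say how the database or the other modules are wired to each other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ForecastSystem:
    """Module layout from the text; identifiers are hypothetical."""
    modules: Dict[str, List[str]] = field(default_factory=lambda: {
        "data_acquisition": ["data acquisition equipment",
                             "monitoring instrument", "transmission terminal"],
        "data_processing": ["server", "control terminal", "router", "switch"],
        "data_application": ["user mobile phone terminal",
                             "user computer terminal",
                             "back-end management terminal"],
    })
    database: str = "database"  # listed in the text; its links are unspecified
    links: Set[Tuple[str, str]] = field(default_factory=set)

    def __post_init__(self) -> None:
        # The data security protection module connects to every module
        # and, in addition, to the firewall.
        for name in self.modules:
            self.links.add(("data_security_protection", name))
        self.links.add(("data_security_protection", "firewall"))

    def neighbours(self, node: str) -> Set[str]:
        """All nodes directly linked to `node`, in either direction."""
        return ({b for a, b in self.links if a == node}
                | {a for a, b in self.links if b == node})
```

Keeping the links as an explicit set makes it easy to check the one topological rule the text states: every module reaches the firewall only through the data security protection module.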
Compared with the prior art, the invention has the following beneficial effects:
by using the isolation forest and the random forest to analyze and optimize the collected air quality data, the invention supplies the prediction model with input variables that are less affected by interference, reduces the influence of discrete and redundant factors on the prediction result, changes how the prediction model processes data, improves the accuracy of the prediction result through multi-model fusion, and thus provides users with a better experience.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the random forest algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment:
referring to FIGS. 1-2, a method for correcting errors in an air quality prediction system includes:
S1: perform outlier detection on the air quality data with an isolation forest algorithm to reduce the influence of abnormal data on the prediction result;
S2: use a random forest algorithm to select an optimal factor subset as the input variables of the prediction model, improving prediction accuracy;
S3: replace the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;
S4: fuse multiple models by taking a weighted average of the prediction results of the different models, improving model accuracy.
The random forest (RF) measures the importance of the influencing factors as follows:
given $c$ influencing factors (variables) $X_1, X_2, X_3, \dots, X_c$, a Gini importance score must be computed for each of the $c$ factors. The score of variable $X_j$ is denoted $VIM_j^{(Gini)}$; this statistic is the average change in node-splitting impurity caused by the $j$-th variable over all trees of the RF. Impurity is measured with the Gini index:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk} p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^2$$
where $K$ is the number of classes in the bootstrap sample set and $p_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$. When the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:
$$GI_m = 2 p_m (1 - p_m)$$
where $p_m$ is the estimated probability that a sample at node $m$ belongs to one of the two classes.
The importance of variable $X_j$ at node $m$, i.e. the change in the Gini index of node $m$ before and after branching, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$.
If variable $X_j$ appears at the nodes in set $M$ of the $i$-th tree, the importance of $X_j$ in the $i$-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}$$
The Gini importance of variable $X_j$ in the RF is defined as:
$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
where $n$ is the number of classification trees in the RF.
The isolation forest algorithm rests on the idea that data points which are split off into leaf nodes after only a few partitions are, with high probability, outliers. The algorithm steps are as follows:
1) Randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;
2) Randomly specify a dimension $q$ and randomly generate a cut point $p$ in the current node data (the cut point lies between the maximum and minimum values of dimension $q$ in the current node data);
3) Use this cut point to generate a hyperplane that divides the current node's data space into 2 subspaces: data whose value in the specified dimension is smaller than $p$ are placed in the left child of the current node, and data greater than or equal to $p$ are placed in the right child;
4) Recursively repeat steps 2) and 3) in the child nodes, constructing new child nodes until a child node contains only one data point (no further cutting is possible) or the child node has reached the height limit.
In S3, a weighted predictive control law replaces the one-step predictive control law: the control signal actually applied to the system is the weighted average of the one-step control quantity computed at the current moment and the predictions of the current control quantity made at past moments. This gives the control system fault-tolerant control capability, reduces oscillation and saturation of the control signal, and reduces the generation of erroneous control signals.
The prediction models in S4 include numerical prediction models and machine learning models. The weighted average method in S4 averages the output results of multiple models of the same type and assigns different weights to models with different performance; depending on which prediction models are selected in S4, the result can be calculated in different ways (such as the factor score method, structural equation modeling, or the entropy method).
The air quality forecasting system comprises a data acquisition module, a data processing module, a data application module and a database. The data acquisition module comprises data acquisition equipment, monitoring instruments and a transmission terminal; the data processing module comprises a server, a control terminal, a router and a switch; the data application module comprises a user mobile phone terminal, a user computer terminal and a back-end management terminal. The air quality forecasting system further comprises a data security protection module and a firewall; the data security protection module is connected to each module and also to the firewall.
By using the isolation forest and the random forest to analyze and optimize the collected air quality data, the method supplies the prediction model with input variables that are less affected by interference, reduces the influence of discrete and redundant factors on the prediction result, changes how the prediction model processes data, improves the accuracy of the prediction result through multi-model fusion, and provides users with a better experience.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method of correcting errors in an air quality prediction system, comprising:
S1: performing outlier detection on the air quality data with an isolation forest algorithm to reduce the influence of abnormal data on the prediction result;
S2: using a random forest algorithm to select an optimal factor subset as the input variables of a prediction model, improving prediction accuracy;
S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;
S4: fusing multiple models by taking a weighted average of the prediction results of the different models, improving model accuracy.
2. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the random forest (RF) measures the importance of the influencing factors as follows:
given $c$ influencing factors (variables) $X_1, X_2, X_3, \dots, X_c$, a Gini importance score must be computed for each of the $c$ factors. The score of variable $X_j$ is denoted $VIM_j^{(Gini)}$; this statistic is the average change in node-splitting impurity caused by the $j$-th variable over all trees of the RF. Impurity is measured with the Gini index:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk} p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^2$$
where $K$ is the number of classes in the bootstrap sample set and $p_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$. When the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:
$$GI_m = 2 p_m (1 - p_m)$$
where $p_m$ is the estimated probability that a sample at node $m$ belongs to one of the two classes.
The importance of variable $X_j$ at node $m$, i.e. the change in the Gini index of node $m$ before and after branching, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$.
If variable $X_j$ appears at the nodes in set $M$ of the $i$-th tree, the importance of $X_j$ in the $i$-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}$$
The Gini importance of variable $X_j$ in the RF is defined as:
$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
where $n$ is the number of classification trees in the RF.
3. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the isolation forest algorithm rests on the idea that data points which are split off into leaf nodes after only a few partitions are, with high probability, outliers, and the algorithm steps are as follows:
1) Randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;
2) Randomly specify a dimension $q$ and randomly generate a cut point $p$ in the current node data (the cut point lies between the maximum and minimum values of dimension $q$ in the current node data);
3) Use this cut point to generate a hyperplane that divides the current node's data space into 2 subspaces: data whose value in the specified dimension is smaller than $p$ are placed in the left child of the current node, and data greater than or equal to $p$ are placed in the right child;
4) Recursively repeat steps 2) and 3) in the child nodes, constructing new child nodes until a child node contains only one data point (no further cutting is possible) or the child node has reached the height limit.
4. A method of correcting errors in an air quality prediction system according to claim 1, wherein: in step S3, a weighted predictive control law replaces the one-step predictive control law, that is, the control signal actually applied to the system is obtained as a weighted average of the one-step control quantity at the current moment and the predictions of the current control quantity made at past moments, so that the control system has fault-tolerant control capability, oscillation and saturation of the control signal are reduced, and the generation of erroneous control signals is reduced.
5. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the prediction models in S4 comprise numerical prediction models and machine learning models; the weighted average method in S4 averages the output results of a plurality of models of the same type, assigning different weights to models with different performance; and the result in S4 can be calculated in different ways depending on which prediction models are selected.
6. A method of correcting errors in an air quality prediction system according to claim 1, wherein: the air quality forecasting system comprises a data acquisition module, a data processing module, a data application module and a database; the data acquisition module comprises data acquisition equipment, monitoring instruments and a transmission terminal; the data processing module comprises a server, a control terminal, a router and a switch; the data application module comprises a user mobile phone terminal, a user computer terminal and a back-end management terminal; the air quality forecasting system further comprises a data security protection module and a firewall, the data security protection module being connected to each module and also to the firewall.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410152249.8A CN117688501A (en) 2024-02-03 2024-02-03 Error correction method for air quality prediction system


Publications (1)

Publication Number Publication Date
CN117688501A 2024-03-12

Family

ID=90139455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410152249.8A Pending CN117688501A (en) 2024-02-03 2024-02-03 Error correction method for air quality prediction system

Country Status (1)

Country Link
CN (1) CN117688501A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105954814A (en) * 2016-04-21 2016-09-21 象辑知源(武汉)科技有限公司 Portable meteorological monitoring system
CN106709588A (en) * 2015-11-13 2017-05-24 日本电气株式会社 Prediction model construction method and equipment and real-time prediction method and equipment
CN111309782A (en) * 2020-02-10 2020-06-19 西安交通大学 Subspace-based outlier detection algorithm
CN113326654A (en) * 2021-05-20 2021-08-31 北京市燃气集团有限责任公司 Method and device for constructing gas load prediction model
CN116911574A (en) * 2023-09-12 2023-10-20 华侨大学 Three-level supply chain optimization method and device based on whale algorithm and random forest


Similar Documents

Publication Publication Date Title
CN106951984B (en) Dynamic analysis and prediction method and device for system health degree
CN109902283B (en) Information output method and device
CN106886481B (en) Static analysis and prediction method and device for system health degree
CN113570138B (en) Method and device for predicting residual service life of equipment of time convolution network
CN105677791A (en) Method and system used for analyzing operating data of wind generating set
CN116739829B (en) Big data-based power data analysis method, system and medium
CN116227745B (en) Big data-based research and analysis method and system for fishing vessels
CN116187621A (en) Carbon emission monitoring method and device
CN111045902A (en) Pressure testing method and device for server
CN115719283A (en) Intelligent accounting management system
CN116862081B (en) Operation and maintenance method and system for pollution treatment equipment
CN111669368A (en) End-to-end network sensing abnormity detection and analysis method, system, device and medium
CN114553671A (en) Diagnosis method for power communication network fault alarm
CN111127242A (en) Power system reliability dynamic real-time assessment method based on small sample data
CN117688501A (en) Error correction method for air quality prediction system
CN111275136A (en) Fault prediction system based on small sample and early warning method thereof
CN115729761B (en) Hard disk fault prediction method, system, equipment and medium
CN117221910A (en) Data processing method, device, equipment and medium for wireless network optimization
CN115208773B (en) Network hidden fault monitoring method and device
CN116626574B (en) Reliability test method, system and storage medium of signal tester
CN114286370B (en) Method and device for determining influence of base station alarm on user perception service
CN117131425B (en) Numerical control machine tool processing state monitoring method and system based on feedback data
CN117592870B (en) Comprehensive analysis system based on water environment monitoring information
CN116132300B (en) Link identification method based on gradient lifting decision tree feature combination
CN116170841A (en) Wireless network processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination