CN117688501A - Error correction method for air quality prediction system - Google Patents
Error correction method for air quality prediction system
- Publication number
- CN117688501A (application CN202410152249.8A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- data
- air quality
- node
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to the technical field of air quality prediction, and in particular discloses a method for correcting errors in an air quality prediction system, comprising the following steps: S1: performing outlier detection on the air quality data with an isolation forest algorithm, reducing the influence of abnormal data on the prediction result; S2: using a random forest algorithm to screen out an optimal subset of factors as the input variables of the prediction model, improving prediction accuracy; S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust. By analyzing the collected air quality data with the isolation forest and the random forest, the invention supplies the prediction model with input variables that are less affected by abnormal data, reduces the influence of outlying and redundant factors on the prediction result, changes the way the prediction model processes data, improves the accuracy of the model's predictions through multi-model fusion, and gives users a better experience.
Description
Technical Field
The invention belongs to the technical field of air quality prediction, and in particular relates to a method for correcting errors in an air quality prediction system.
Background
An air quality prediction service lets environmental management departments understand the future trend of air pollution more accurately, so that they can take targeted policy measures and protect public health and safety. Ambient air quality forecasting is an important technical means of responding promptly and appropriately to heavy-pollution weather, and it guides coordinated emission reduction for regional atmospheric pollution. Existing air quality forecasting methods fall mainly into numerical methods and statistical methods. However, numerical forecasting methods generally require accurate input data and expensive computing resources, while statistical forecasting methods are less accurate when pollutant concentrations vary non-linearly. Where an immediate and accurate prediction is required, using the existing air quality prediction models is very challenging.
Existing air quality forecasting systems have the drawback that the model is strongly affected by outlying data during forecasting, so the forecast results are prone to error. The forecasting accuracy of the system is therefore low, which degrades the user experience.
Disclosure of Invention
The object of the present invention is to provide a method for correcting errors in an air quality prediction system, so as to solve the problems set forth in the Background.
To achieve the above object, the present invention provides the following technical solution:
A method for correcting errors in an air quality prediction system, comprising:
S1: performing outlier detection on the air quality data with an isolation forest algorithm, reducing the influence of abnormal data on the prediction result;
S2: using a random forest algorithm to screen out an optimal subset of factors as the input variables of the prediction model, improving prediction accuracy;
S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;
S4: adopting multi-model fusion, in which the prediction results of different models are combined by a weighted average, to improve model accuracy.
Preferably, the random forest measures the importance of the influencing factors as follows:
Given $c$ influencing factors (variables) $X_1, X_2, \dots, X_c$, a Gini-index score statistic must be calculated for each of the $c$ influencing factors. The score statistic of variable $X_j$ is denoted $VIM_j^{(Gini)}$; it represents the average change in node-splitting impurity of the $j$-th variable over all trees of the random forest (RF). The Gini index is calculated as:
$$GI_m = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})$$
where $K$ is the number of categories in the bootstrap sample set and $\hat{p}_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$. When the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:
$$GI_m = 2\,\hat{p}_m\,(1 - \hat{p}_m)$$
where $\hat{p}_m$ is the estimated probability that a sample at node $m$ belongs to either one of the classes.
The importance of variable $X_j$ at node $m$, that is, the change in the Gini index before and after node $m$ branches, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$.
If variable $X_j$ appears $M$ times in the $i$-th tree, the importance of $X_j$ in the $i$-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m=1}^{M} VIM_{jm}^{(Gini)}$$
The Gini importance of variable $X_j$ in the RF is defined as:
$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
where $n$ is the number of classification trees in the RF.
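The importance measure described above can be sketched in a few lines of Python. This is only an illustration: it assumes the Gini change at a split is the parent's Gini index minus the Gini indices of the two child nodes, and the function names and the pollution-level labels are hypothetical. Library implementations usually also weight each child's impurity by its share of samples, which this sketch does not do.

```python
from collections import Counter

def gini_index(labels):
    """Gini index sum_k p_k * (1 - p_k) of the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def split_importance(parent, left, right):
    """Gini change of one split: parent Gini minus the two child Ginis."""
    return gini_index(parent) - gini_index(left) - gini_index(right)

def forest_importance(per_tree_splits):
    """Average over trees of the summed split importances of one variable.

    per_tree_splits holds one list per tree; each entry is a
    (parent, left, right) triple of label lists for a node at which
    the variable was used to split.
    """
    per_tree = [sum(split_importance(*s) for s in splits)
                for splits in per_tree_splits]
    return sum(per_tree) / len(per_tree)

# Hypothetical labels: one factor is used once in each of two trees.
tree_1 = [(["high", "high", "low", "low"], ["high", "high"], ["low", "low"])]
tree_2 = [(["high", "low", "low", "low"], ["high"], ["low", "low", "low"])]
importance = forest_importance([tree_1, tree_2])
print(importance)
```

Both example splits separate the classes perfectly, so each tree contributes the full Gini index of its parent node.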
Preferably, the idea of the isolation forest algorithm is that data which are quickly partitioned into leaf nodes are, with high probability, outliers. The steps of the algorithm are as follows:
1) Randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;
2) Randomly select a dimension and randomly generate a cut point $p$ within the current node's data (the cut point lies between the minimum and maximum values of the selected dimension in the current node's data);
3) Generate a hyperplane from this cut point, dividing the data space of the current node into 2 subspaces: data whose value in the selected dimension is smaller than $p$ are placed in the left child of the current node, and data whose value is greater than or equal to $p$ are placed in the right child;
4) Repeat steps 2) and 3) recursively in the child nodes, continually constructing new child nodes, until a child node contains only one data point (and cannot be cut further) or the child node has reached the height limit.
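The four steps above can be sketched as a small recursive tree builder. The code below is a minimal illustration rather than the patented implementation: the function names, the dictionary-based tree layout, the default height limit of 8, and the PM2.5-style readings are all assumptions made for the example.

```python
import random

def build_itree(points, depth=0, height_limit=8, rng=random):
    """Recursively build one isolation tree over a list of equal-length tuples."""
    # Step 4 stop rule: one point left, or the height limit is reached.
    if len(points) <= 1 or depth >= height_limit:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))            # step 2: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                   # identical values: no cut possible
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                      # cut point between min and max
    left = [p for p in points if p[dim] < cut]     # step 3: smaller values go left
    right = [p for p in points if p[dim] >= cut]   # the rest go right
    return {"dim": dim, "cut": cut,
            "left": build_itree(left, depth + 1, height_limit, rng),
            "right": build_itree(right, depth + 1, height_limit, rng)}

def path_length(tree, point, depth=0):
    """Number of cuts needed before `point` lands in a leaf."""
    if "cut" not in tree:
        return depth
    child = tree["left"] if point[tree["dim"]] < tree["cut"] else tree["right"]
    return path_length(child, point, depth + 1)

def mean_path_length(data, point, n_trees=100, sample_size=256, seed=0):
    """Average path length of `point` over a forest of isolation trees."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        # Step 1: draw a random subsample for the root node of each tree.
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(build_itree(sample, rng=rng), point)
    return total / n_trees

# Hypothetical PM2.5 readings (ug/m3) with a single sensor spike.
readings = [(30.0 + i % 10,) for i in range(40)] + [(500.0,)]
spike_depth = mean_path_length(readings, (500.0,))
normal_depth = mean_path_length(readings, (35.0,))
print(spike_depth, normal_depth)
```

The spike is usually cut off by the very first split, so its average path is shorter than that of a typical reading, which is the property the algorithm uses to flag outliers.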
Preferably, in step S3, a weighted predictive control law replaces the one-step predictive control law: the control signal actually applied to the system is a weighted average of the one-step control quantity computed at the current moment and the values predicted at past moments for the control quantity at the current moment. This gives the control system fault-tolerant control capability, reduces oscillation and saturation of the control signal, and reduces the generation of erroneous control signals.
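A minimal sketch of such a weighted control law follows. It assumes the controller produces, at every step, a plan of control moves for the current and future steps; the class name, the weights, and the example plans are hypothetical, and the source does not specify how the weights are chosen.

```python
from collections import deque

class WeightedControlLaw:
    """Blend the current one-step move with past predictions for this step."""

    def __init__(self, weights):
        # weights[0] applies to the move computed now; weights[i] applies to
        # the prediction for the current step that was made i steps ago.
        self.weights = list(weights)
        self.history = deque(maxlen=len(self.weights) - 1)  # newest plan first

    def apply(self, plan):
        """plan[k] is the control move predicted for k steps ahead of now."""
        candidates = [(self.weights[0], plan[0])]
        for age, old_plan in enumerate(self.history, start=1):
            if age < len(old_plan):
                candidates.append((self.weights[age], old_plan[age]))
        total = sum(w for w, _ in candidates)
        # Normalise by the weights actually used, so early steps still average.
        signal = sum(w * v for w, v in candidates) / total
        self.history.appendleft(list(plan))
        return signal

law = WeightedControlLaw([0.5, 0.3, 0.2])
u1 = law.apply([1.0, 1.0, 1.0])   # no history yet: the one-step move is used
u2 = law.apply([5.0, 1.0, 1.0])   # a suspicious jump in the one-step move
u3 = law.apply([1.0, 1.0, 1.0])
print(u1, u2, u3)
```

In this example the jump to 5.0 at the second step is damped to 3.5, because the earlier plan had predicted 1.0 for that step; a one-step law would have applied 5.0 directly.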
Preferably, the prediction models in S4 include numerical prediction models and machine learning models. The weighted average method in S4 takes a weighted average of the outputs of multiple models of the same type, assigning different weights to models that perform differently; depending on which prediction models are chosen, the result in S4 may be calculated in different ways (for example, the factor score method, a structural equation model, or the entropy method).
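The weighted-average fusion can be illustrated as follows. The source only says that models with different performance receive different weights; the inverse-error weighting rule, the function names, and the numbers below are assumptions chosen for the example.

```python
def fusion_weights(errors):
    """Weights proportional to 1/error (an assumed rule), summing to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def fuse(predictions, weights):
    """Weighted average of the individual model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical validation errors of three models and their forecasts.
errors = [5.0, 10.0, 20.0]
weights = fusion_weights(errors)
fused = fuse([40.0, 46.0, 52.0], weights)
print(weights, fused)
```

Here the most accurate model receives a weight of 4/7, so the fused forecast (about 43.4) sits closest to its prediction of 40.0.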
Preferably, the air quality forecasting system comprises a data acquisition module, a data processing module, a data application module, and a database. The data acquisition module comprises data acquisition equipment, a monitoring instrument, and a transmission end; the data processing module comprises a server, a control end, a router, and a switch; the data application module comprises a user mobile phone client, a user computer client, and a back-end management terminal. The air quality forecasting system further comprises a data security protection module and a firewall; the data security protection module is connected to each of the modules and also to the firewall.
Compared with the prior art, the invention has the following beneficial effects:
By analyzing the collected air quality data with the isolation forest and the random forest, the invention supplies the prediction model with input variables that are less affected by abnormal data, reduces the influence of outlying and redundant factors on the prediction result, changes the way the prediction model processes data, improves the accuracy of the model's predictions through multi-model fusion, and gives users a better experience.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the random forest algorithm of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example:
Referring to FIGS. 1 and 2, a method for correcting errors in an air quality prediction system comprises:
S1: performing outlier detection on the air quality data with an isolation forest algorithm, reducing the influence of abnormal data on the prediction result;
S2: using a random forest algorithm to screen out an optimal subset of factors as the input variables of the prediction model, improving prediction accuracy;
S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;
S4: adopting multi-model fusion, in which the prediction results of different models are combined by a weighted average, to improve model accuracy.
The random forest measures the importance of the influencing factors as follows:
Given $c$ influencing factors (variables) $X_1, X_2, \dots, X_c$, a Gini-index score statistic must be calculated for each of the $c$ influencing factors. The score statistic of variable $X_j$ is denoted $VIM_j^{(Gini)}$; it represents the average change in node-splitting impurity of the $j$-th variable over all trees of the random forest (RF). The Gini index is calculated as:
$$GI_m = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})$$
where $K$ is the number of categories in the bootstrap sample set and $\hat{p}_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$. When the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:
$$GI_m = 2\,\hat{p}_m\,(1 - \hat{p}_m)$$
where $\hat{p}_m$ is the estimated probability that a sample at node $m$ belongs to either one of the classes.
The importance of variable $X_j$ at node $m$, that is, the change in the Gini index before and after node $m$ branches, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$.
If variable $X_j$ appears $M$ times in the $i$-th tree, the importance of $X_j$ in the $i$-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m=1}^{M} VIM_{jm}^{(Gini)}$$
The Gini importance of variable $X_j$ in the RF is defined as:
$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
where $n$ is the number of classification trees in the RF.
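Once every factor has a Gini importance, step S2 needs a rule for turning the scores into a factor subset. The source does not state that rule, so the sketch below simply assumes one: keep the highest-scoring factors until they cover a chosen share of the total importance. The function name, the coverage threshold, and the factor names and scores are all hypothetical.

```python
def select_factors(importance, coverage=0.9):
    """Keep top-ranked factors until their share of total importance >= coverage."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    chosen, running = [], 0.0
    for name, score in ranked:
        chosen.append(name)
        running += score
        if running / total >= coverage:
            break
    return chosen

# Hypothetical importance scores for candidate input variables.
scores = {"PM10": 0.40, "wind_speed": 0.25, "humidity": 0.20,
          "NO2": 0.10, "pressure": 0.05}
subset = select_factors(scores, coverage=0.8)
print(subset)
```

With an 80% coverage target the three strongest factors are kept and the two weakest are dropped as redundant inputs.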
The idea of the isolation forest algorithm is that data which are quickly partitioned into leaf nodes are, with high probability, outliers. The steps of the algorithm are as follows:
1) Randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;
2) Randomly select a dimension and randomly generate a cut point $p$ within the current node's data (the cut point lies between the minimum and maximum values of the selected dimension in the current node's data);
3) Generate a hyperplane from this cut point, dividing the data space of the current node into 2 subspaces: data whose value in the selected dimension is smaller than $p$ are placed in the left child of the current node, and data whose value is greater than or equal to $p$ are placed in the right child;
4) Repeat steps 2) and 3) recursively in the child nodes, continually constructing new child nodes, until a child node contains only one data point (and cannot be cut further) or the child node has reached the height limit.
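The steps above yield a path length for each data point, but the source does not say how path lengths become an outlier decision. A commonly used choice for isolation forests, shown here as an assumption rather than as the patented method, is the score $s = 2^{-E[h]/c(\psi)}$, where $E[h]$ is the mean path length over the trees and $c(\psi)$ is the average path length of an unsuccessful binary-search-tree lookup over $\psi$ points. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """c(n) = 2*H(n-1) - 2*(n-1)/n, with H(i) approximated by ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_depth, sample_size):
    """Score in (0, 1]; values well above 0.5 suggest an outlier."""
    normaliser = average_path_length(sample_size)
    if normaliser == 0.0:
        raise ValueError("sample_size must be at least 2")
    return 2.0 ** (-mean_depth / normaliser)

typical = anomaly_score(average_path_length(256), 256)
isolated_early = anomaly_score(2.0, 256)
print(typical, isolated_early)
```

A point whose mean depth equals the normaliser scores exactly 0.5, while a point isolated after only two cuts scores noticeably higher, so a threshold on the score separates likely outliers from ordinary readings.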
In S3, a weighted predictive control law replaces the one-step predictive control law: the control signal actually applied to the system is a weighted average of the one-step control quantity computed at the current moment and the values predicted at past moments for the control quantity at the current moment. This gives the control system fault-tolerant control capability, reduces oscillation and saturation of the control signal, and reduces the generation of erroneous control signals.
The prediction models in S4 include numerical prediction models and machine learning models. The weighted average method in S4 takes a weighted average of the outputs of multiple models of the same type, assigning different weights to models that perform differently; depending on which prediction models are chosen, the result in S4 may be calculated in different ways (for example, the factor score method, a structural equation model, or the entropy method).
The air quality forecasting system comprises a data acquisition module, a data processing module, a data application module, and a database. The data acquisition module comprises data acquisition equipment, a monitoring instrument, and a transmission end; the data processing module comprises a server, a control end, a router, and a switch; the data application module comprises a user mobile phone client, a user computer client, and a back-end management terminal. The air quality forecasting system further comprises a data security protection module and a firewall; the data security protection module is connected to each of the modules and also to the firewall.
By analyzing the collected air quality data with the isolation forest and the random forest, the method supplies the prediction model with input variables that are less affected by abnormal data, reduces the influence of outlying and redundant factors on the prediction result, changes the way the prediction model processes data, improves the accuracy of the model's predictions through multi-model fusion, and gives users a better experience.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. A method for correcting errors in an air quality prediction system, comprising:
S1: performing outlier detection on the air quality data with an isolation forest algorithm, reducing the influence of abnormal data on the prediction result;
S2: using a random forest algorithm to screen out an optimal subset of factors as the input variables of the prediction model, improving prediction accuracy;
S3: replacing the one-step predictive control law of the predictive control system with a weighted predictive control law, making the system more robust;
S4: adopting multi-model fusion, in which the prediction results of different models are combined by a weighted average, to improve model accuracy.
2. The method for correcting errors in an air quality prediction system according to claim 1, wherein the random forest measures the importance of the influencing factors as follows:
given $c$ influencing factors (variables) $X_1, X_2, \dots, X_c$, a Gini-index score statistic must be calculated for each of the $c$ influencing factors; the score statistic of variable $X_j$ is denoted $VIM_j^{(Gini)}$ and represents the average change in node-splitting impurity of the $j$-th variable over all trees of the random forest (RF); the Gini index is calculated as:
$$GI_m = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})$$
where $K$ is the number of categories in the bootstrap sample set and $\hat{p}_{mk}$ is the estimated probability that a sample at node $m$ belongs to class $k$; when the samples are binary-class data ($K = 2$), the Gini index of node $m$ is:
$$GI_m = 2\,\hat{p}_m\,(1 - \hat{p}_m)$$
where $\hat{p}_m$ is the estimated probability that a sample at node $m$ belongs to either one of the classes;
the importance of variable $X_j$ at node $m$, that is, the change in the Gini index before and after node $m$ branches, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes split from node $m$;
if variable $X_j$ appears $M$ times in the $i$-th tree, the importance of $X_j$ in the $i$-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m=1}^{M} VIM_{jm}^{(Gini)}$$
the Gini importance of variable $X_j$ in the RF is defined as:
$$VIM_j^{(Gini)} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
where $n$ is the number of classification trees in the RF.
3. The method for correcting errors in an air quality prediction system according to claim 1, wherein the idea of the isolation forest algorithm is that data which are quickly partitioned into leaf nodes are, with high probability, outliers, and the steps of the algorithm are as follows:
1) randomly select $\psi$ sample points from the training data as a subsample and place them in the root node of the tree;
2) randomly select a dimension and randomly generate a cut point $p$ within the current node's data (the cut point lies between the minimum and maximum values of the selected dimension in the current node's data);
3) generate a hyperplane from this cut point, dividing the data space of the current node into 2 subspaces: data whose value in the selected dimension is smaller than $p$ are placed in the left child of the current node, and data whose value is greater than or equal to $p$ are placed in the right child;
4) repeat steps 2) and 3) recursively in the child nodes, continually constructing new child nodes, until a child node contains only one data point (and cannot be cut further) or the child node has reached the height limit.
4. The method for correcting errors in an air quality prediction system according to claim 1, wherein in step S3 a weighted predictive control law replaces the one-step predictive control law, that is, the control signal actually applied to the system is a weighted average of the one-step control quantity computed at the current moment and the values predicted at past moments for the control quantity at the current moment, so that the control system has fault-tolerant control capability, oscillation and saturation of the control signal are reduced, and the generation of erroneous control signals is reduced.
5. The method for correcting errors in an air quality prediction system according to claim 1, wherein the prediction models in S4 include numerical prediction models and machine learning models, the weighted average method in S4 takes a weighted average of the outputs of multiple models of the same type and assigns different weights to models that perform differently, and the result in S4 may be calculated in different ways depending on which prediction models are chosen.
6. The method for correcting errors in an air quality prediction system according to claim 1, wherein the air quality forecasting system comprises a data acquisition module, a data processing module, a data application module, and a database; the data acquisition module comprises data acquisition equipment, a monitoring instrument, and a transmission end; the data processing module comprises a server, a control end, a router, and a switch; the data application module comprises a user mobile phone client, a user computer client, and a back-end management terminal; and the air quality forecasting system further comprises a data security protection module and a firewall, the data security protection module being connected to each of the modules and also to the firewall.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410152249.8A CN117688501A (en) | 2024-02-03 | 2024-02-03 | Error correction method for air quality prediction system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410152249.8A CN117688501A (en) | 2024-02-03 | 2024-02-03 | Error correction method for air quality prediction system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117688501A (en) | 2024-03-12 |
Family
ID=90139455
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410152249.8A Pending CN117688501A (en) | 2024-02-03 | 2024-02-03 | Error correction method for air quality prediction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117688501A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105954814A (en) * | 2016-04-21 | 2016-09-21 | 象辑知源(武汉)科技有限公司 | Portable meteorological monitoring system |
CN106709588A (en) * | 2015-11-13 | 2017-05-24 | 日本电气株式会社 | Prediction model construction method and equipment and real-time prediction method and equipment |
CN111309782A (en) * | 2020-02-10 | 2020-06-19 | 西安交通大学 | Subspace-based outlier detection algorithm |
CN113326654A (en) * | 2021-05-20 | 2021-08-31 | 北京市燃气集团有限责任公司 | Method and device for constructing gas load prediction model |
CN116911574A (en) * | 2023-09-12 | 2023-10-20 | 华侨大学 | Three-level supply chain optimization method and device based on whale algorithm and random forest |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||