CN115471011A - Air quality prediction method based on rough set and structure risk minimization - Google Patents
- Publication number
- CN115471011A (application CN202211277183.2A)
- Authority
- CN
- China
- Prior art keywords
- air quality
- subset
- attribute
- condition
- decision
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- General Business, Economics & Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Physiology (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to an air quality prediction method based on rough set and structure risk minimization, comprising the following steps: acquiring environmental parameter sample data related to air quality; establishing an air quality evaluation system that grades the environmental parameter sample data acquired from a meteorological monitoring station, and building an air quality index decision table; using rough set theory and structural risk minimization theory to calculate, from the air quality index decision table, the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute; and using a genetic algorithm to obtain an optimal condition attribute subset, whose condition attributes serve as the condition attributes of a rough set classifier that predicts the air quality from the environmental parameter data of the target monitoring point.
Description
Technical Field
The invention relates to the field of air quality prediction, in particular to an air quality prediction method based on rough set and structure risk minimization.
Background
Environmental pollution has provoked a strong public response, because serious air pollution affects people's health and daily life. The main pollutants that make up the Air Quality Index (AQI) are PM2.5, PM10, SO2, NO2, CO, O3, TSP (total suspended particulate matter), DF (dustfall), etc., and the AQI may be associated with one or more of these pollutant factors.
In practice, the data of the main pollutants are the key inputs to the AQI. However, because errors occur when pollutant data are acquired, the data may be incomplete or redundant, which makes air quality analysis and prediction more difficult.
Rough set theory is a mathematical tool proposed by Z. Pawlak in 1982 for dealing with incomplete and uncertain knowledge. Rough sets can effectively analyze and process many kinds of incomplete information and discover implicit rules in it. Rough sets are currently widely used for air quality prediction, but rough set theory is built on the indiscernibility relation, i.e., an equivalence relation. Because the equivalence relation is strict and tolerates erroneous information poorly, rough sets generally generalize weakly when the data set contains a lot of noise, which makes the prediction accuracy unstable.
Disclosure of Invention
To solve the prior-art problem that errors in pollutant data acquisition can leave the pollutant data incomplete or redundant, making air quality analysis and prediction more difficult, the invention provides an air quality prediction method based on rough set and structure risk minimization, comprising the following steps:
S1: acquiring environmental parameter sample data related to air quality from a meteorological monitoring station;
S2: establishing an air quality evaluation system to perform grade evaluation on the environmental parameter sample data acquired from the meteorological monitoring station, obtaining the air quality index grade of each environmental parameter sample;
S3: according to the environmental parameter sample data, taking the environmental parameters related to air quality as condition attributes and the air quality index grade of each environmental parameter sample as the decision attribute to create an air quality index decision table;
S4: generating a finite number of condition attribute subsets from the condition attributes in the air quality index decision table, and calculating the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute using rough set theory and structural risk minimization theory;
S5: calculating an optimal condition attribute subset with a genetic algorithm from the empirical error of each condition attribute subset and its mutual information with the decision attribute;
S6: predicting the air quality by taking the condition attributes in the optimal condition attribute subset as the condition attributes of the rough set classifier and using the environmental parameter data of the target monitoring point, to obtain an air quality result.
The present invention has at least the following beneficial effects:
The invention combines rough set theory with the structural risk minimization criterion. Rough set theory supports quantitative analysis, so relationships in the data can be inferred and explained; adding the structural risk minimization criterion balances prediction error against model complexity and improves the stability and robustness of air quality prediction. Combining this with a genetic algorithm reduces the feature dimension without lowering classification accuracy, which speeds up air quality prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the structural risk minimization criterion of the present invention;
FIG. 3 is a flow chart of the genetic algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to FIG. 1, the present invention provides an air quality prediction method based on rough set and structure risk minimization, comprising:
S1: acquiring environmental parameter sample data related to air quality from a meteorological monitoring station;
The main pollutants forming the air quality index are eight items: PM2.5, PM10, SO2, NO2, CO, O3, TSP (total suspended particulate matter) and DF (dustfall). The environmental parameters collected by the invention are therefore these eight, and the environmental parameter sample data are the corresponding pollutant concentrations.
S2: establishing an air quality evaluation system to perform grade evaluation on environmental parameter sample data acquired from a meteorological monitoring station to obtain the air quality index grade of the environmental parameter sample;
The air quality evaluation system sets up an air quality index grade evaluation system with six grades, excellent, good, light pollution, moderate pollution, heavy pollution and severe pollution, according to the national standard GB 3095-2012. An air quality index of 0-50 is grade one; 51-100 is grade two; 101-150 is grade three; 151-200 is grade four; 201-300 is grade five; and above 300 is grade six, as shown in Table 1:
TABLE 1 Air quality evaluation table
Air quality index | Grade | Air quality category |
0-50 | Grade 1 | Excellent |
51-100 | Grade 2 | Good |
101-150 | Grade 3 | Light pollution |
151-200 | Grade 4 | Moderate pollution |
201-300 | Grade 5 | Heavy pollution |
>300 | Grade 6 | Severe pollution |
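Grading by Table 1 is a plain lookup on the AQI band. A minimal sketch, assuming integer AQI values as in the table (a non-integer such as 50.5, which falls between the listed bands, is assigned to the higher grade here); the names `aqi_grade` and `CATEGORY` are hypothetical:

```python
def aqi_grade(aqi: float) -> int:
    """Map an AQI value to grade 1-6 using the bands of Table 1."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # Inclusive upper bound of grades 1 to 5; anything above 300 is grade 6.
    upper_bounds = [50, 100, 150, 200, 300]
    for grade, upper in enumerate(upper_bounds, start=1):
        if aqi <= upper:
            return grade
    return 6


CATEGORY = {
    1: "Excellent",
    2: "Good",
    3: "Light pollution",
    4: "Moderate pollution",
    5: "Heavy pollution",
    6: "Severe pollution",
}

print(aqi_grade(75), CATEGORY[aqi_grade(75)])  # 2 Good
```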
Three intervals are defined from the grade concentration limits of each environmental parameter in the national standard GB 3095-2012 and coded as low, medium and high: low means the environmental parameter does not exceed the standard, medium means it exceeds the standard, and high means it seriously exceeds the standard. Let A1, A2, A3, A4, A5, A6, A7, A8 denote the environmental parameters PM2.5, PM10, SO2, NO2, CO, O3, TSP (total suspended particulate matter) and DF (dustfall) respectively; this gives the environmental parameter evaluation table.
Each environmental parameter sample is coded according to the environmental parameter evaluation table, and the air quality index grade corresponding to the sample is obtained from the air quality index evaluation table.
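The low/medium/high coding amounts to comparing a concentration against two thresholds. The patent derives these boundaries from the GB 3095-2012 limits but does not list them, so the `LIMITS` values below are hypothetical placeholders (only two of the eight parameters are shown), as is the choice that a value equal to a boundary stays in the lower interval:

```python
from typing import Dict, Tuple

# Hypothetical (standard limit, serious limit) pairs in µg/m³; NOT values from the patent.
LIMITS: Dict[str, Tuple[float, float]] = {
    "PM2.5": (75.0, 150.0),
    "PM10": (150.0, 300.0),
}


def discretize(param: str, concentration: float) -> str:
    """Code a pollutant concentration as Low / Medium / High."""
    standard, serious = LIMITS[param]
    if concentration <= standard:
        return "Low"  # does not exceed the standard
    if concentration <= serious:
        return "Medium"  # exceeds the standard
    return "High"  # seriously exceeds the standard


sample = {"PM2.5": 60.0, "PM10": 320.0}
print({p: discretize(p, c) for p, c in sample.items()})
# {'PM2.5': 'Low', 'PM10': 'High'}
```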
S3: according to the environmental parameter sample data, taking the environmental parameters related to air quality as condition attributes and the air quality index grade of each environmental parameter sample as the decision attribute to create an air quality index decision table, shown in Table 2:
TABLE 2 Air quality index decision table
Universe of discourse | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | D |
x1 | Low | Low | Low | Medium | Low | Low | Medium | Medium | Grade 2 |
x2 | Low | Low | High | Medium | Low | Low | High | Medium | Grade 3 |
x3 | Low | Low | Low | Medium | Low | Low | Low | Medium | Grade 1 |
x4 | Low | Medium | Low | Medium | Low | Medium | Low | Medium | Grade 2 |
x5 | Medium | Medium | Medium | High | Medium | Medium | Medium | High | Grade 2 |
x6 | High | Medium | High | High | High | High | Medium | High | Grade 4 |
x7 | Low | Medium | Medium | High | Low | Medium | Medium | High | Grade 3 |
x8 | High | Medium | High | High | High | Medium | High | High | Grade 5 |
… | … | … | … | … | … | … | … | … | … |
Here x1 denotes the first environmental parameter sample, x2 the second, and so on; D denotes the air quality index grade corresponding to each sample.
Rough set theory:
A rough set is a mathematical method for processing imprecise, uncertain and incomplete data; by analyzing and reasoning about the data, it can discover implicit knowledge and reveal underlying patterns. It describes an uncertain target set approximately through a pair of exact sets, the upper approximation and the lower approximation.
Let S be an information table, expressed as $S = (U, At = C \cup \{d\}, \{V_a \mid a \in At\}, \{I_a \mid a \in At\})$, where
U is a finite set of objects called the universe (domain); At is a finite, non-empty set of attributes; $V_a$ is the value range of attribute $a \in At$; and $I_a$ is an information function giving the value $I_a(x)$ of object x on attribute a.
For $B \subseteq At$, the indiscernibility relation $R_B = \{(x, y) \in U \times U \mid I_a(x) = I_a(y) \text{ for all } a \in B\}$ is clearly an equivalence relation. It partitions the universe U into $U/R_B = \{X_1, X_2, \ldots, X_m\}$, the set of equivalence classes formed by $R_B$. The equivalence class $[x]_B = \{y \mid (x, y) \in R_B\}$ is the basic knowledge granule of a rough set.
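The partition $U/R_B$ can be computed by grouping objects on their values for the attributes in B. This sketch encodes the eight listed rows of Table 2 with L/M/H for Low/Medium/High and uses zero-based column indices for A1 to A8; the names `TABLE`, `ind_partition` and `decision_partition` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Table 2 rows: (coded values of A1..A8, AQI grade D).
TABLE: Dict[str, Tuple[str, int]] = {
    "x1": ("LLLMLLMM", 2), "x2": ("LLHMLLHM", 3),
    "x3": ("LLLMLLLM", 1), "x4": ("LMLMLMLM", 2),
    "x5": ("MMMHMMMH", 2), "x6": ("HMHHHHMH", 4),
    "x7": ("LMMHLMMH", 3), "x8": ("HMHHHMHH", 5),
}


def ind_partition(attrs: Sequence[int]) -> List[List[str]]:
    """U/IND(B): objects share a class iff they agree on every attribute in attrs."""
    classes: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for obj, (values, _grade) in TABLE.items():
        classes[tuple(values[i] for i in attrs)].append(obj)
    return list(classes.values())


def decision_partition() -> List[List[str]]:
    """U/IND(D): objects grouped by their AQI grade."""
    classes: Dict[int, List[str]] = defaultdict(list)
    for obj, (_values, grade) in TABLE.items():
        classes[grade].append(obj)
    return list(classes.values())


print(ind_partition([0, 1, 2, 3]))  # B = {A1, A2, A3, A4}
# [['x1', 'x3'], ['x2'], ['x4'], ['x5'], ['x6', 'x8'], ['x7']]
print(decision_partition())
# [['x1', 'x4', 'x5'], ['x2', 'x7'], ['x3'], ['x6'], ['x8']]
```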
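For each subset $X \subseteq U$ and equivalence relation R, the lower and upper approximations of X are defined as:
$\underline{R}X = \{x \in U \mid [x]_R \subseteq X\}, \qquad \overline{R}X = \{x \in U \mid [x]_R \cap X \neq \emptyset\}$
The lower and upper approximations of X divide the universe U into three disjoint regions, the positive region $POS_R(X)$, the negative region $NEG_R(X)$ and the boundary region $BND_R(X)$, where
Positive region: $POS_R(X) = \underline{R}X$; negative region: $NEG_R(X) = U - \overline{R}X$; boundary region: $BND_R(X) = \overline{R}X - \underline{R}X$.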
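The approximations and the three regions follow directly from the equivalence classes. In this sketch the partition is the one the worked example gives for B = {A1, A2, A3, A4}, and the target X is the set of grade-2 objects in Table 2; the function names are hypothetical:

```python
from typing import Dict, List, Set

U: Set[str] = {f"x{i}" for i in range(1, 9)}
# U/IND(B) for B = {A1, A2, A3, A4}, taken from the worked example.
BLOCKS: List[Set[str]] = [{"x1", "x3"}, {"x2"}, {"x4"}, {"x5"}, {"x7"}, {"x6", "x8"}]
X: Set[str] = {"x1", "x4", "x5"}  # objects of grade 2


def lower_approx(blocks: List[Set[str]], target: Set[str]) -> Set[str]:
    """Union of the equivalence classes entirely contained in target."""
    return {x for block in blocks if block <= target for x in block}


def upper_approx(blocks: List[Set[str]], target: Set[str]) -> Set[str]:
    """Union of the equivalence classes that intersect target."""
    return {x for block in blocks if block & target for x in block}


def regions(universe: Set[str], blocks: List[Set[str]], target: Set[str]) -> Dict[str, Set[str]]:
    low, up = lower_approx(blocks, target), upper_approx(blocks, target)
    return {"POS": low, "NEG": universe - up, "BND": up - low}


print({k: sorted(v) for k, v in regions(U, BLOCKS, X).items()})
# {'POS': ['x4', 'x5'], 'NEG': ['x2', 'x6', 'x7', 'x8'], 'BND': ['x1', 'x3']}
```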
The approximation quality describes the dependency between attributes. If the values of attribute set Q are completely determined by P, Q is said to depend on P, denoted $P \Rightarrow Q$. For $P, Q \subseteq At$, the degree k ($0 \le k \le 1$) to which Q depends on P is expressed as: $k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}$, where $POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$.
S4: generating a finite number of condition attribute subsets from the condition attributes in the air quality index decision table, and calculating the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute using rough set theory and structural risk minimization theory;
S41: obtaining the decision information system of the air quality index decision table according to rough set theory:
Let $S = (U, C \cup D, V, f)$ be a decision information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set of objects, also called the universe, and $x_i$ denotes the i-th environmental parameter sample; $C = \{A_1, A_2, \ldots, A_m\}$ is a non-empty finite set of condition attributes, where $A_i$ denotes the concentration of a gas or aerosol pollutant such as PM2.5, PM10, SO2, NO2, CO, O3 or TSP; B is a subset of the condition attribute set C; D is the decision attribute, here the air quality index grade, divided into six grades according to the severity of air pollution; $V = \bigcup_{a \in C \cup D} V_a$ is the value range, where $V_a$ is the value range of attribute a; and f is the information function.
S42: calculating the empirical error of the condition attribute subset from the dependency of the decision attribute D on the condition attribute subset B in the decision information system:
$R_{emp}(B) = 1 - \gamma_B(D)$,
where |·| denotes the cardinality of a set, i.e., the number of its elements; $U/IND(B) = \{X_1, X_2, \ldots, X_n\}$ is the partition derived from the condition attribute subset B, $X_i$ being one of its equivalence classes; and $[x]_D$ is an equivalence class of the partition U/IND(D) derived from the decision attribute D.
For example, following Table 2, let $U = \{x_1, x_2, \ldots, x_8\}$ be the environmental sample data set, $C = \{A_1, A_2, \ldots, A_8\}$ the environmental parameter set (PM2.5, PM10, SO2, NO2, CO, O3, TSP, DF), and D the decision attribute (the air quality index grade of each sample). Taking the condition attributes A1, A2, A3, A4 as the partitioning attributes gives the condition attribute subset
B = {PM2.5, PM10, SO2, NO2}. The equivalence classes derived from B are $U/IND(B) = \{\{x_1, x_3\}, \{x_2\}, \{x_4\}, \{x_5\}, \{x_7\}, \{x_6, x_8\}\}$, and taking the decision attribute D as the partition gives $U/IND(D) = \{\{x_3\}, \{x_1, x_4, x_5\}, \{x_2, x_7\}, \{x_6\}, \{x_8\}\}$. The dependency of D on B in this case is $\gamma_B(D) = 0.625$, so
$R_{emp}(B) = 1 - \gamma_B(D) = 1 - 0.625 = 0.375$.
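The empirical error can be checked mechanically on Table 2. Because the patent's own formula for $\gamma_B(D)$ in S42 did not survive extraction, this sketch assumes the textbook positive-region definition $\gamma_B(D) = |POS_B(D)|/|U|$. Under that definition only the singleton classes {x2}, {x4}, {x5}, {x7} are consistent, giving 4/8 = 0.5 and $R_{emp}(B) = 0.5$ rather than the 0.625 and 0.375 stated in the example; either the patent uses a different dependency formula or the stated figure is in error, and this sketch does not settle which. The helper names are hypothetical:

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

# Table 2 rows: (coded values of A1..A8, AQI grade D).
TABLE: Dict[str, Tuple[str, int]] = {
    "x1": ("LLLMLLMM", 2), "x2": ("LLHMLLHM", 3),
    "x3": ("LLLMLLLM", 1), "x4": ("LMLMLMLM", 2),
    "x5": ("MMMHMMMH", 2), "x6": ("HMHHHHMH", 4),
    "x7": ("LMMHLMMH", 3), "x8": ("HMHHHMHH", 5),
}


def classes_by(key: Callable[[Tuple[str, int]], Hashable]) -> List[Set[str]]:
    """Group objects into equivalence classes by the given key."""
    groups: Dict[Hashable, Set[str]] = defaultdict(set)
    for obj, row in TABLE.items():
        groups[key(row)].add(obj)
    return list(groups.values())


def gamma(attrs: Sequence[int]) -> float:
    """Assumed textbook dependency degree |POS_B(D)| / |U|."""
    cond = classes_by(lambda row: tuple(row[0][i] for i in attrs))
    dec = classes_by(lambda row: row[1])
    # POS_B(D): union of the B-classes that lie inside a single decision class.
    pos = {obj for block in cond if any(block <= d for d in dec) for obj in block}
    return len(pos) / len(TABLE)


def empirical_error(attrs: Sequence[int]) -> float:
    return 1.0 - gamma(attrs)


print(gamma([0, 1, 2, 3]), empirical_error([0, 1, 2, 3]))  # 0.5 0.5
```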
S43: introducing a mutual information regularization function according to the structural risk minimization criterion to calculate the mutual information between the condition attribute subset and the decision attribute;
Referring to FIG. 2, structural risk minimization (SRM) is a strategy proposed to prevent overfitting, and it is equivalent to regularization. Structural risk adds to the empirical risk a regularizer, or penalty term, that represents the complexity of the model. The structural risk is defined as:
$R_{srm}(f) = R_{emp}(f) + \lambda J(f)$
where J(f) measures the complexity of model f and $\lambda \ge 0$ is a coefficient that balances empirical risk against model complexity. The SRM strategy takes the model with the smallest structural risk in the hypothesis space $\mathcal{F}$ as the optimal model: $f^* = \arg\min_{f \in \mathcal{F}} R_{srm}(f)$.
Minimizing the structural risk requires both the empirical risk and the model complexity to be small; such a model generalizes better.
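As a toy illustration of the SRM trade-off, the sketch below picks, among three hypothetical candidate models with made-up empirical risks and complexity scores J, the one minimizing $R_{emp} + \lambda J$; raising λ shifts the choice toward simpler models:

```python
from typing import List, Tuple

# Hypothetical candidates: (name, empirical risk R_emp, complexity J).
MODELS: List[Tuple[str, float, float]] = [
    ("simple", 0.30, 1.0),
    ("medium", 0.12, 3.0),
    ("complex", 0.05, 8.0),
]


def srm_select(models: List[Tuple[str, float, float]], lam: float) -> str:
    """Return the name of the model minimizing the structural risk R_emp + lam * J."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    name, _risk, _complexity = min(models, key=lambda m: m[1] + lam * m[2])
    return name


for lam in (0.0, 0.05, 1.0):
    print(lam, srm_select(MODELS, lam))
# 0.0 complex
# 0.05 medium
# 1.0 simple
```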
On this basis, a mutual information regularization term I(B;D) is introduced into the selected rough set model, expressed as
$I(B;D) = H(D) - H(D \mid B)$
where H(D) is the entropy of the decision attribute D, H(D | B) is the conditional information entropy of the decision attribute D given the condition attribute subset B,
and I(B;D) is the mutual information between the attribute subset B and the decision attribute D. In this way the quantitative analysis offered by rough set theory, which lets relationships in the data be inferred and explained, is combined with the structural risk minimization criterion to balance prediction error against complexity, improving the stability and robustness of air quality prediction.
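In rough set terms, H(D) and H(D | B) can be estimated from class frequencies: H(D | B) is the average, weighted by class size, of the entropy of D inside each equivalence class of U/IND(B). This sketch assumes base-2 logarithms (the patent does not state the base) and the Table 2 rows; the function names are hypothetical:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Table 2 rows x1..x8: (coded values of A1..A8, AQI grade D).
ROWS: List[Tuple[str, int]] = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy in bits of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(attrs: Sequence[int]) -> float:
    """H(D|B): class-size-weighted entropy of D within each class of U/IND(B)."""
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for values, grade in ROWS:
        classes[tuple(values[i] for i in attrs)].append(grade)
    return sum(len(g) / len(ROWS) * entropy(g) for g in classes.values())


def mutual_information(attrs: Sequence[int]) -> float:
    """I(B;D) = H(D) - H(D|B)."""
    h_d = entropy([grade for _, grade in ROWS])
    return h_d - conditional_entropy(attrs)


print(conditional_entropy([0, 1, 2, 3]), mutual_information([0, 1, 2, 3]))
```

For B = {A1, A2, A3, A4}, the two mixed classes {x1, x3} and {x6, x8} each contribute one bit with weight 2/8, so H(D | B) = 0.5 bit.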
Referring to FIG. 3, S5: calculating the optimal condition attribute subset with a genetic algorithm from the empirical error of each condition attribute subset and its mutual information with the decision attribute;
S51: encoding the condition attribute subsets and using the encoded subsets as the initial chromosome population of the genetic algorithm;
S52: calculating the expected error of each condition attribute subset from its empirical error and its mutual information with the decision attribute;
The expected error of a condition attribute subset is minimized over B:
$\min_B R_{reg}(B), \qquad R_{reg}(B) = R_{emp}(B) + \alpha I(B;D)$
where $R_{reg}(B)$ is the expected error of the attribute subset B, $R_{emp}(B)$ is its empirical error, I(B;D) is the mutual information between B and the decision attribute D, and $\alpha \ge 0$ is a hyperparameter.
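The expected error of one candidate subset combines the two quantities above. This sketch assumes a binary encoding of B as an 8-bit mask over A1 to A8 (S51 only says the subsets are encoded), the textbook positive-region dependency for $R_{emp}$, and base-2 entropy; `expected_error` is a hypothetical name:

```python
from collections import Counter, defaultdict
from math import log2
from typing import Dict, List, Sequence, Tuple

# Table 2 rows x1..x8: (coded values of A1..A8, AQI grade D).
ROWS: List[Tuple[str, int]] = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def expected_error(mask: Sequence[int], alpha: float) -> float:
    """R_reg(B) = R_emp(B) + alpha * I(B;D), B given by the bits set in mask."""
    if alpha < 0:
        raise ValueError("alpha must be >= 0")
    attrs = [i for i, bit in enumerate(mask) if bit]
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for values, grade in ROWS:
        classes[tuple(values[i] for i in attrs)].append(grade)
    n = len(ROWS)
    # |POS_B(D)|: objects in classes whose members all share one grade.
    consistent = sum(len(g) for g in classes.values() if len(set(g)) == 1)
    r_emp = 1.0 - consistent / n
    h_cond = sum(len(g) / n * entropy(g) for g in classes.values())
    mi = entropy([grade for _, grade in ROWS]) - h_cond
    return r_emp + alpha * mi


print(expected_error([1, 1, 1, 1, 0, 0, 0, 0], alpha=0.1))
```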
S53: applying the selection, crossover and mutation operators of the genetic algorithm to the initial chromosomes to obtain crossed and mutated chromosomes. The selection operator uses the roulette wheel method, the crossover operator uses single-point crossover, and the mutation operator uses basic bit mutation;
Selection is performed by the roulette wheel rule based on the expected error of each initial chromosome, as follows:
(1) Let the population size be M (the number of initial chromosomes), and calculate the fitness $f_i$ (i = 1, 2, …, M), i.e., the expected error, of each individual (initial chromosome) in the population;
(2) Calculate the probability that each individual (initial chromosome) is inherited into the next-generation population, $p_i = f_i / \sum_{j=1}^{M} f_j$;
(3) Calculate the cumulative probability $q_i = \sum_{j=1}^{i} p_j$ of each individual (initial chromosome) $x_i$ (i = 1, 2, …, M);
(4) Generate a uniformly distributed pseudo-random number r in the interval [0, 1];
(5) If $r \le q_1$, select individual 1; otherwise select the individual k for which $q_{k-1} < r \le q_k$ holds;
(6) Repeat steps (4) and (5) M times to select M individuals.
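Steps (1) to (6) map onto a cumulative-probability lookup. Taken literally, using the expected error itself as the fitness would give higher-error subsets larger wheel slices, so this sketch assumes each error is first converted to a fitness 1 / (error + ε); `roulette_select` and `eps` are hypothetical names, and `random.random()` draws r from [0, 1):

```python
import random
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence


def roulette_select(expected_errors: Sequence[float], rng: random.Random,
                    eps: float = 1e-9) -> List[int]:
    """Return M indices chosen with replacement by roulette wheel selection."""
    # Assumption: lower expected error -> larger slice of the wheel.
    fitness = [1.0 / (e + eps) for e in expected_errors]  # step (1)
    total = sum(fitness)
    probs = [f / total for f in fitness]  # step (2): p_i
    cumulative = list(accumulate(probs))  # step (3): q_i
    m = len(expected_errors)
    chosen = []
    for _ in range(m):  # step (6): repeat M times
        r = rng.random()  # step (4)
        k = bisect_left(cumulative, r)  # step (5): first k with r <= q_k
        chosen.append(min(k, m - 1))  # guard against float round-off in q_M
    return chosen


print(roulette_select([0.2, 0.5, 0.9, 0.4], random.Random(7)))
```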
For crossover, individuals are selected to take part with a given probability; a crossover point is chosen at random on each pair of chromosomes, and the substrings after the crossover point are exchanged to produce next-generation individuals.
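Single-point crossover can be sketched as below. Pairing consecutive individuals, the crossover probability `pc`, and passing an unpaired last individual through unchanged are assumptions of this sketch, not details given in the patent; the function names are hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def single_point_crossover(a: Sequence[int], b: Sequence[int],
                           rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length chromosomes after a random cut point."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have equal length >= 2")
    point = rng.randint(1, len(a) - 1)  # cut strictly inside the string
    return list(a[:point]) + list(b[point:]), list(b[:point]) + list(a[point:])


def crossover_population(pop: List[List[int]], pc: float,
                         rng: random.Random) -> List[List[int]]:
    """Pair consecutive individuals; each pair is crossed with probability pc."""
    children: List[List[int]] = []
    for i in range(0, len(pop) - 1, 2):
        a, b = pop[i], pop[i + 1]
        if rng.random() < pc:
            a, b = single_point_crossover(a, b, rng)
        children.extend([list(a), list(b)])
    if len(pop) % 2:  # odd population: last individual is copied unchanged
        children.append(list(pop[-1]))
    return children


print(single_point_crossover([1] * 8, [0] * 8, random.Random(0)))
```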
Basic bit mutation changes, with probability $P_m$, the gene values at one or more randomly designated loci of an individual's code string. The procedure is:
(1) For each locus of an individual (initial chromosome), designate it as a mutation point with probability $P_m$;
(2) Apply the mutation operation to each designated mutation point.
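For a binary encoding (an assumption; the patent does not state the code alphabet), the mutation at a designated locus is a bit flip. A minimal sketch with a hypothetical function name:

```python
import random
from typing import List, Sequence


def basic_bit_mutation(chromosome: Sequence[int], pm: float,
                       rng: random.Random) -> List[int]:
    """Flip each gene independently with probability pm."""
    mutated = []
    for gene in chromosome:
        if rng.random() < pm:  # step (1): this locus becomes a mutation point
            gene = 1 - gene  # step (2): flip 0 <-> 1
        mutated.append(gene)
    return mutated


print(basic_bit_mutation([1, 0, 1, 1, 0, 0, 1, 0], 0.2, random.Random(3)))
```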
S54: taking the crossed and mutated chromosomes as the initial chromosomes of the next iteration of the genetic algorithm, and repeating steps S52 to S54 until the preset number of iterations is reached; the condition attribute subset with the smallest expected error is taken as the optimal condition attribute subset. Combining the method with a genetic algorithm reduces the feature dimension without lowering classification accuracy and speeds up air quality prediction.
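Putting S51 to S54 together gives a small end-to-end search over the eight attributes of Table 2. This is a sketch under several assumptions: binary masks, the textbook dependency degree, base-2 entropy, the 1 / (error + ε) fitness conversion for the roulette wheel, consecutive-pair crossover, and best-so-far tracking to implement "the subset with the smallest expected error". All parameter values (α, population size, generations, `pc`, `pm`, seed) are hypothetical:

```python
import random
from bisect import bisect_left
from collections import Counter, defaultdict
from itertools import accumulate
from math import log2
from typing import List, Sequence, Tuple

# Table 2 rows x1..x8: (coded values of A1..A8, AQI grade D).
ROWS: List[Tuple[str, int]] = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def expected_error(mask: Sequence[int], alpha: float) -> float:
    """R_reg(B) = R_emp(B) + alpha * I(B;D) for the subset B encoded by mask."""
    attrs = [i for i, bit in enumerate(mask) if bit]
    classes = defaultdict(list)
    for values, grade in ROWS:
        classes[tuple(values[i] for i in attrs)].append(grade)
    n = len(ROWS)
    consistent = sum(len(g) for g in classes.values() if len(set(g)) == 1)
    h_cond = sum(len(g) / n * entropy(g) for g in classes.values())
    mi = entropy([grade for _, grade in ROWS]) - h_cond
    return (1.0 - consistent / n) + alpha * mi


def run_ga(alpha: float = 0.1, pop_size: int = 10, generations: int = 30,
           pc: float = 0.8, pm: float = 0.05, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_bits = len(ROWS[0][0])

    def err(chrom: Sequence[int]) -> float:
        return expected_error(chrom, alpha)

    # S51: random binary chromosomes; bit i = 1 means attribute A(i+1) is in B.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=err)
    for _ in range(generations):
        errors = [err(c) for c in pop]  # S52
        # S53 selection: roulette wheel on fitness 1 / (error + eps).
        cum = list(accumulate(1.0 / (e + 1e-9) for e in errors))
        pop = [list(pop[min(bisect_left(cum, rng.random() * cum[-1]), pop_size - 1)])
               for _ in range(pop_size)]
        # S53 crossover: single point on consecutive pairs with probability pc.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                p = rng.randint(1, n_bits - 1)
                a, b = pop[i], pop[i + 1]
                pop[i], pop[i + 1] = a[:p] + b[p:], b[:p] + a[p:]
        # S53 mutation: flip each bit with probability pm.
        pop = [[1 - g if rng.random() < pm else g for g in c] for c in pop]
        # S54: keep the subset with the smallest expected error seen so far.
        best = min([best] + pop, key=err)
    return best, err(best)


best_mask, best_error = run_ga()
print(best_mask, round(best_error, 4))
```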
S6: predicting the air quality by taking the condition attributes in the optimal condition attribute subset as the condition attributes of the rough set classifier and using the environmental parameter data of the target monitoring point, to obtain an air quality result.
For example, following Table 2, let $U = \{x_1, x_2, \ldots, x_8\}$ be the environmental sample data set, $C = \{A_1, A_2, \ldots, A_8\}$ the environmental parameter set (PM2.5, PM10, SO2, NO2, CO, O3, TSP, DF), and D the decision attribute (the air quality index grade of each sample). Assuming that the optimal attribute subset obtained by the above operations is A1A2A3, i.e., the optimal condition attribute subset is B = {PM2.5, PM10, SO2}, the equivalence classes derived from B are $U/IND(B) = \{\{x_1, x_3\}, \{x_2\}, \{x_4\}, \{x_5\}, \{x_7\}, \{x_6, x_8\}\}$, and the rule set derived from the decision table is $\gamma(B) = \{\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \gamma_6\}$. Given an environmental sample to be measured,
X = {low, high, medium}, the rule set derived from the decision table gives an air quality index grade of 3 for X with probability 1.
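A rough set classifier of this kind can be sketched as one rule per equivalence class of U/IND(B), mapping the class's condition values to the observed grade distribution. Estimating the rule probability as the fraction of class members with each grade, and returning `None` when no rule matches, are assumptions of this sketch; the sample ("L", "M", "M") is chosen to match the class {x7}, and the names are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Table 2 rows x1..x8: (coded values of A1..A8, AQI grade D).
ROWS: List[Tuple[str, int]] = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]

Rule = Dict[int, float]  # AQI grade -> probability


def build_rules(attrs: Sequence[int]) -> Dict[Tuple[str, ...], Rule]:
    """One rule per class of U/IND(B): condition values -> grade distribution."""
    grades_by_class: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for values, grade in ROWS:
        grades_by_class[tuple(values[i] for i in attrs)].append(grade)
    return {
        cond: {g: c / len(grades) for g, c in Counter(grades).items()}
        for cond, grades in grades_by_class.items()
    }


def predict(rules: Dict[Tuple[str, ...], Rule], sample: Sequence[str]) -> Optional[Rule]:
    """Grade distribution of the matching rule, or None if no rule matches."""
    return rules.get(tuple(sample))


rules = build_rules([0, 1, 2])  # B = {A1, A2, A3}
print(len(rules))  # 6
print(predict(rules, ("L", "M", "M")))  # {3: 1.0}
print(predict(rules, ("L", "L", "L")))  # {2: 0.5, 1: 0.5}
```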
Feeding the predicted air quality result back to the relevant enterprises in time helps them reduce pollutant emissions and improves the environmental quality of the target monitoring point.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (8)
1. An air quality prediction method based on rough set and structure risk minimization, comprising:
S1: acquiring environmental parameter sample data related to air quality from a meteorological monitoring station;
S2: establishing an air quality evaluation system to perform grade evaluation on the environmental parameter sample data acquired from the meteorological monitoring station, obtaining the air quality index grade of each environmental parameter sample;
S3: according to the environmental parameter sample data, taking the environmental parameters related to air quality as condition attributes and the air quality index grade of each environmental parameter sample as the decision attribute to create an air quality index decision table;
S4: generating a finite number of condition attribute subsets from the condition attributes in the air quality index decision table, and calculating the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute using rough set theory and structural risk minimization theory;
S5: calculating an optimal condition attribute subset with a genetic algorithm from the empirical error of each condition attribute subset and its mutual information with the decision attribute;
S6: taking the condition attributes in the optimal condition attribute subset as the condition attributes of the rough set classifier, and predicting the air quality using the environmental parameter data of the target monitoring point to obtain an air quality result.
2. The method of claim 1, wherein calculating the empirical error of the condition attribute subset and the mutual information between the condition attribute subset and the decision attribute using rough set theory and structural risk minimization theory comprises:
S41: obtaining a decision information system of the air quality index decision table according to rough set theory;
S42: calculating the empirical error of the condition attribute subset from the dependency of the decision attribute D on the condition attribute subset B in the decision information system;
S43: introducing a mutual information regularization function according to the structural risk minimization criterion to calculate the mutual information between the condition attribute subset and the decision attribute.
3. The method of claim 2, wherein the empirical error for the subset of condition attributes comprises:
$R_{emp}(B) = 1 - \gamma_B(D)$
where $|\cdot|$ denotes the cardinality of a set, i.e., the number of elements in the set; $U/IND(B) = \{X_1, X_2, \ldots, X_n\}$ denotes the partition induced by the condition attribute subset $B$, and $X_i$ denotes an equivalence class in that partition; $[x]_D$ denotes an equivalence class of the partition $U/IND(D)$ induced by the decision attribute $D$.
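The explicit formula for $\gamma_B(D)$ did not survive extraction. The sketch below assumes the standard Pawlak dependency degree, which uses the same symbols as the where-clause: $\gamma_B(D) = |POS_B(D)| / |U|$, where $POS_B(D)$ is the union of the classes $X_i \in U/IND(B)$ that are contained in some $[x]_D$. Function names and the toy table are hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/IND(attrs) as a list of sets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(table, cond_attrs, dec_attrs):
    """gamma_B(D) = |POS_B(D)| / |U| (standard rough set definition, assumed)."""
    dec_blocks = partition(table, dec_attrs)
    # A class X_i counts toward the positive region if it lies inside one [x]_D.
    pos = sum(len(x) for x in partition(table, cond_attrs)
              if any(x <= y for y in dec_blocks))
    return pos / len(table)

def empirical_error(table, cond_attrs, dec_attrs):
    """R_emp(B) = 1 - gamma_B(D)."""
    return 1.0 - dependency(table, cond_attrs, dec_attrs)

U = [
    {"PM2.5": "high", "NO2": "low",  "grade": "light"},
    {"PM2.5": "low",  "NO2": "low",  "grade": "good"},
    {"PM2.5": "high", "NO2": "high", "grade": "moderate"},
    {"PM2.5": "high", "NO2": "low",  "grade": "moderate"},
]
print(empirical_error(U, ["PM2.5"], ["grade"]))         # 0.75
print(empirical_error(U, ["PM2.5", "NO2"], ["grade"]))  # 0.5
```

Adding NO2 splits the inconsistent class {0, 2, 3}, so row 2 joins the positive region and the empirical error drops from 0.75 to 0.5.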
4. The method of claim 2, wherein the mutual information between the subset of condition attributes and the decision attributes comprises:
$I(B;D) = H(D) - H(D \mid B)$
where $H(D)$ is the entropy of the decision attribute $D$, $H(D \mid B)$ is the conditional entropy of the decision attribute $D$ given the condition attribute subset $B$, and $I(B;D)$ is the mutual information between the attribute subset $B$ and the decision attribute $D$.
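The mutual information can be estimated from empirical frequencies in the decision table. A minimal sketch, assuming base-2 logarithms and probabilities taken as relative class sizes over U (neither is fixed by the claim); names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(table, cond_attrs, dec_attr):
    """H(D|B) = sum over classes X of U/IND(B) of P(X) * H(D within X)."""
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[a] for a in cond_attrs)].append(row[dec_attr])
    n = len(table)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def mutual_information(table, cond_attrs, dec_attr):
    """I(B;D) = H(D) - H(D|B)."""
    h_d = entropy([row[dec_attr] for row in table])
    return h_d - conditional_entropy(table, cond_attrs, dec_attr)

U = [
    {"PM2.5": "high", "NO2": "low",  "grade": "light"},
    {"PM2.5": "low",  "NO2": "low",  "grade": "good"},
    {"PM2.5": "high", "NO2": "high", "grade": "moderate"},
    {"PM2.5": "high", "NO2": "low",  "grade": "moderate"},
]
print(mutual_information(U, ["PM2.5", "NO2"], "grade"))  # 1.0
```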
5. The method of claim 1, wherein the calculating an optimal condition attribute subset using a genetic algorithm based on empirical errors of the condition attribute subset and mutual information between the condition attribute subset and the decision attribute comprises:
s51: encoding the condition attribute subsets, and using the encoded condition attribute subsets as the initial chromosome population of the genetic algorithm;
s52: calculating an expected error of the condition attribute subset according to the empirical error of the condition attribute subset and mutual information of the condition attribute subset and the decision attribute;
s53: applying the selection, crossover, and mutation operators of the genetic algorithm to the initial chromosomes to obtain crossed-over and mutated chromosomes, wherein the selection operator uses roulette-wheel selection, the crossover operator uses single-point crossover, and the mutation operator uses basic bit mutation;
s54: using the crossed-over and mutated chromosomes as the initial chromosomes of the next iteration of the genetic algorithm, and repeating steps s52 to s54 until the preset number of iterations is reached; the condition attribute subset with the minimum expected error is taken as the optimal condition attribute subset.
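Steps s51 to s54 can be sketched as a small bit-mask genetic algorithm. Assumptions not stated in the claims: roulette-wheel weights are taken as $1/(\epsilon + \text{error})$ because the expected error is minimized and assumed non-negative, empty subsets are repaired by redrawing, the best mask found so far is kept (elitism), and the population size, rates, and seed are illustrative defaults. `genetic_search` and its parameters are hypothetical names.

```python
import random

def genetic_search(n_attrs, objective, pop_size=20, generations=50,
                   p_cross=0.8, p_mut=0.05, seed=0):
    """Minimize objective(mask) over non-empty 0/1 masks of length n_attrs.

    objective must return a non-negative value; pop_size should be even
    so that selected parents pair up for crossover.
    """
    rng = random.Random(seed)

    def random_mask():
        # Redraw until at least one attribute is selected (B must be non-empty).
        while True:
            mask = [rng.randint(0, 1) for _ in range(n_attrs)]
            if any(mask):
                return mask

    pop = [random_mask() for _ in range(pop_size)]  # s51: initial population
    best = min(pop, key=objective)
    for _ in range(generations):
        # s52/s53: roulette-wheel selection, smaller error -> larger slice.
        weights = [1.0 / (1e-9 + objective(m)) for m in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            a, b = a[:], b[:]
            if n_attrs > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n_attrs)  # single-point crossover
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        for child in children:
            for i in range(n_attrs):
                if rng.random() < p_mut:  # basic bit mutation: flip one gene
                    child[i] ^= 1
        # s54: children seed the next iteration; keep the best mask so far.
        pop = [c if any(c) else random_mask() for c in children]
        best = min(pop + [best], key=objective)
    return best
```

In use, `objective` would decode a mask into a condition attribute subset $B$ and return its expected error from claim 6.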
6. The method of claim 1, wherein the expected error of the condition attribute subset comprises:
$\min R_{reg}(B) = R_{emp}(B) + \alpha I(B;D)$;
where $\min R_{reg}(B)$ denotes the expected error of the attribute subset $B$ to be minimized, $R_{emp}(B)$ denotes the empirical error of the subset $B$, $I(B;D)$ denotes the mutual information between the attribute subset $B$ and the decision attribute $D$, and $\alpha$ is a hyperparameter with $\alpha \ge 0$.
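The objective above can be evaluated directly, and for a small attribute set an exhaustive search gives a reference answer against which a genetic search can be checked (eight attributes yield only $2^8 - 1 = 255$ non-empty subsets). A minimal sketch following the formula as written; `r_emp_fn` and `mi_fn` are hypothetical callables that would wrap the empirical error and mutual information computations.

```python
from itertools import combinations

def regularized_risk(r_emp, mi, alpha):
    """R_reg(B) = R_emp(B) + alpha * I(B; D), with alpha >= 0."""
    if alpha < 0:
        raise ValueError("alpha must be >= 0")
    return r_emp + alpha * mi

def best_subset_exhaustive(attrs, r_emp_fn, mi_fn, alpha):
    """Enumerate every non-empty subset B and return (B, R_reg(B)) at the minimum."""
    best, best_val = None, float("inf")
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            val = regularized_risk(r_emp_fn(subset), mi_fn(subset), alpha)
            if val < best_val:
                best, best_val = subset, val
    return best, best_val
```

Setting $\alpha = 0$ reduces the criterion to the empirical error alone; larger $\alpha$ gives the regularization term more weight.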
7. The method of claim 1, wherein the environmental parameters associated with air quality comprise PM2.5, PM10, SO2, NO2, CO, O3, TSP, and DF.
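With these eight parameters, each condition attribute subset can be encoded as an 8-bit chromosome for step s51. A minimal sketch, assuming one bit per parameter in the fixed order below (the claims do not specify an ordering); `PARAMS`, `encode`, and `decode` are hypothetical names.

```python
PARAMS = ["PM2.5", "PM10", "SO2", "NO2", "CO", "O3", "TSP", "DF"]  # assumed order

def encode(subset, params=PARAMS):
    """Bit i is 1 when params[i] belongs to the condition attribute subset."""
    chosen = set(subset)
    return [1 if p in chosen else 0 for p in params]

def decode(bits, params=PARAMS):
    """Inverse of encode: list the parameters whose bit is set."""
    return [p for p, b in zip(params, bits) if b]

print(encode(["PM2.5", "NO2", "O3"]))  # [1, 0, 0, 1, 0, 1, 0, 0]
```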
8. The air quality prediction method based on rough set and structural risk minimization according to claim 1, characterized in that the air quality index grade evaluation system is established according to national standard GB 3095-2012 with six air quality index grades: excellent, good, light pollution, moderate pollution, heavy pollution, and severe pollution.
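The decision attribute requires mapping an AQI value to one of the six grades. The claim does not list numeric ranges, so the sketch assumes the bands commonly used with the Chinese AQI under the HJ 633-2012 technical regulation that accompanies GB 3095-2012: 0 to 50, 51 to 100, 101 to 150, 151 to 200, 201 to 300, and above 300. `aqi_grade` is a hypothetical helper.

```python
import bisect

# Inclusive upper bound of each grade except the last (assumed AQI bands).
UPPER = [50, 100, 150, 200, 300]
GRADES = ["excellent", "good", "light pollution", "moderate pollution",
          "heavy pollution", "severe pollution"]

def aqi_grade(aqi):
    """Map a non-negative AQI value to its grade label."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # bisect_left puts a value equal to a bound into the lower grade.
    return GRADES[bisect.bisect_left(UPPER, aqi)]

print(aqi_grade(120))  # light pollution
```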
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211277183.2A | 2022-10-19 | 2022-10-19 | Air quality prediction method based on rough set and structure risk minimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115471011A | 2022-12-13 |
Family ID: 84337605
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211277183.2A (Pending) | Air quality prediction method based on rough set and structure risk minimization | 2022-10-19 | 2022-10-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||