CN115471011A - Air quality prediction method based on rough set and structure risk minimization - Google Patents


Publication number
CN115471011A
Authority
CN
China
Prior art keywords
air quality
subset
attribute
condition
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211277183.2A
Other languages
Chinese (zh)
Inventor
张晓霞
张蓬浩
王国胤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202211277183.2A
Publication of CN115471011A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physiology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an air quality prediction method based on rough sets and structural risk minimization, comprising the following steps: acquiring environmental parameter sample data related to air quality; establishing an air quality evaluation system that grades the environmental parameter sample data acquired from a meteorological monitoring station, and building an air quality index decision table from the graded data; calculating, from the air quality index decision table, the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute by using rough set theory and structural risk minimization theory; and obtaining an optimal condition attribute subset with a genetic algorithm, using the condition attributes in the optimal subset as the condition attributes of a rough set classifier, and predicting the air quality from the environmental parameter data of a target monitoring point.

Description

Air quality prediction method based on rough set and structure risk minimization
Technical Field
The invention relates to the field of air quality prediction, and in particular to an air quality prediction method based on rough sets and structural risk minimization.
Background
Environmental pollution has drawn a strong public response because serious air pollution affects people's health and daily life. The main pollutants that make up the Air Quality Index (AQI) include PM2.5, PM10, SO2, NO2, CO, O3, TSP (total suspended particulate matter) and DF (dustfall), and the AQI may be associated with one or more of these pollutant factors.
In practice, the data on the main pollutants are the key inputs for forming the AQI. However, because of errors in pollutant data acquisition, the data may be incomplete or redundant, which makes air quality analysis and prediction more difficult.
Rough set theory is a mathematical tool proposed by Z. Pawlak in 1982 for dealing with incomplete and uncertain knowledge. Rough sets can effectively analyze and process various kinds of incomplete information and discover the implicit rules in it. At present, air quality prediction is mostly carried out with rough sets. However, rough set theory is built on the indiscernibility relation, which is an equivalence relation. Because an equivalence relation is a strict requirement with low tolerance for erroneous information, rough sets generally generalize poorly when the data set contains a large amount of noise, so the prediction accuracy is unstable.
Disclosure of Invention
The invention aims to solve the problem in the prior art that, because of errors in pollutant data acquisition, the pollutant data may be incomplete or redundant, which makes air quality analysis and prediction more difficult. To this end, the invention provides an air quality prediction method based on rough sets and structural risk minimization, comprising the following steps:
S1: acquiring environmental parameter sample data related to air quality from a meteorological monitoring station;
S2: establishing an air quality evaluation system to grade the environmental parameter sample data acquired from the meteorological monitoring station, obtaining the air quality index grade of each environmental parameter sample;
S3: according to the environmental parameter sample data, taking the environmental parameters related to air quality as condition attributes and the air quality index grade of each environmental parameter sample as the decision attribute, and creating an air quality index decision table;
S4: generating a finite number of condition attribute subsets from the condition attributes in the air quality index decision table, and calculating the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute by using rough set theory and structural risk minimization theory;
S5: obtaining an optimal condition attribute subset with a genetic algorithm according to the empirical error of each condition attribute subset and its mutual information with the decision attribute;
S6: using the condition attributes in the optimal condition attribute subset as the condition attributes of a rough set classifier, and predicting the air quality from the environmental parameter data of a target monitoring point to obtain an air quality result.
The present invention has at least the following advantageous effects:
The invention combines rough set theory with the structural risk minimization criterion. Rough set theory supports quantitative analysis, so the relationships among the data can be inferred and explained; adding the structural risk minimization criterion balances the prediction error against complexity and improves the stability and robustness of air quality prediction. Combining a genetic algorithm reduces the feature dimension without reducing classification accuracy and increases the speed of air quality prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the structural risk minimization criteria of the present invention;
FIG. 3 is a flow chart of the genetic algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to FIG. 1, the present invention provides an air quality prediction method based on rough sets and structural risk minimization, comprising:
S1: acquiring environmental parameter sample data related to air quality from a meteorological monitoring station;
The main pollutants forming the air quality index comprise eight items: PM2.5, PM10, SO2, NO2, CO, O3, TSP (total suspended particulate matter) and DF (dustfall). The environmental parameters collected by the invention are therefore mainly PM2.5, PM10, SO2, NO2, CO, O3, TSP and DF, and the environmental parameter sample data are the pollutant concentrations corresponding to these eight parameters.
S2: establishing an air quality evaluation system to perform grade evaluation on environmental parameter sample data acquired from a meteorological monitoring station to obtain the air quality index grade of the environmental parameter sample;
According to the national standard GB 3095-2012, the air quality evaluation system is established on six air quality index grades: excellent, good, light pollution, moderate pollution, heavy pollution and severe pollution. An air quality index of 0-50 is grade one; 51-100 is grade two; 101-150 is grade three; 151-200 is grade four; 201-300 is grade five; and greater than 300 is grade six, as shown in Table 1:
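TABLE 1 Air quality evaluation table
Air quality index   Grade   Category
0-50                One     Excellent
51-100              Two     Good
101-150             Three   Light pollution
151-200             Four    Moderate pollution
201-300             Five    Heavy pollution
>300                Six     Severe pollution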
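The grade boundaries stated above can be expressed as a small lookup. The following is a minimal sketch, not part of the patent text; the function name `aqi_grade` is an assumption introduced for illustration.

```python
def aqi_grade(aqi: float) -> int:
    """Map an air quality index value to a grade from 1 (best) to 6 (worst)."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # Upper bounds of grades one to five; any value above 300 is grade six.
    upper_bounds = [50, 100, 150, 200, 300]
    for grade, bound in enumerate(upper_bounds, start=1):
        if aqi <= bound:
            return grade
    return 6


print([aqi_grade(v) for v in (35, 100, 151, 250, 320)])  # [1, 2, 4, 5, 6]
```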
For each environmental parameter, three intervals are defined according to the graded concentration limits in the national standard GB 3095-2012 and coded as low, medium and high: low means the parameter does not exceed the standard, medium means it exceeds the standard, and high means it seriously exceeds the standard. Let $A_1, A_2, \ldots, A_8$ denote the environmental parameters PM2.5, PM10, SO2, NO2, CO, O3, TSP (total suspended particulate matter) and DF (dustfall), respectively; this yields the environmental parameter evaluation table.
Each item of environmental parameter sample data is evaluated according to the environmental parameter evaluation table, and the corresponding air quality index grade is obtained from the air quality evaluation table.
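The three-interval coding can be sketched as follows. This is an illustration only: the function name `discretize`, the `LIMITS` dictionary and, in particular, the numeric limits are placeholders assumed for the example; the actual limits should be taken from GB 3095-2012.

```python
def discretize(concentration: float, standard_limit: float, severe_limit: float) -> str:
    """Code a pollutant concentration as 'low', 'medium' or 'high'."""
    if concentration <= standard_limit:
        return "low"      # does not exceed the standard
    if concentration <= severe_limit:
        return "medium"   # exceeds the standard
    return "high"         # seriously exceeds the standard


# Placeholder limits for illustration only (assumed, not quoted from the standard).
LIMITS = {"PM2.5": (40.0, 80.0), "PM10": (60.0, 120.0)}

sample = {"PM2.5": 42.0, "PM10": 30.0}
coded = {name: discretize(value, *LIMITS[name]) for name, value in sample.items()}
print(coded)  # {'PM2.5': 'medium', 'PM10': 'low'}
```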
S3: according to the environmental parameter sample data, taking the environmental parameters related to air quality as condition attributes and the air quality index grade of each environmental parameter sample as the decision attribute, and creating an air quality index decision table, as shown in Table 2:
TABLE 2 Air quality index decision table
Universe  A1      A2      A3      A4      A5      A6      A7      A8      D
x1        Low     Low     Low     Medium  Low     Low     Medium  Medium  Grade 2
x2        Low     Low     High    Medium  Low     Low     High    Medium  Grade 3
x3        Low     Low     Low     Medium  Low     Low     Low     Medium  Grade 1
x4        Low     Medium  Low     Medium  Low     Medium  Low     Medium  Grade 2
x5        Medium  Medium  Medium  High    Medium  Medium  Medium  High    Grade 2
x6        High    Medium  High    High    High    High    Medium  High    Grade 4
x7        Low     Medium  Medium  High    Low     Medium  Medium  High    Grade 3
x8        High    Medium  High    High    High    Medium  High    High    Grade 5
Here x1 denotes the first environmental parameter sample, x2 the second, and so on; D denotes the air quality index grade corresponding to the environmental parameter sample.
Rough set theory:
Rough set theory is a mathematical method for processing imprecise, uncertain and incomplete data; by analyzing and reasoning over the data it can discover implicit knowledge and reveal underlying regularities. It gives an approximate description of an uncertain target set through a pair of exact sets, the upper approximation and the lower approximation.
Let S be an information table, expressed as $S = (U, At = C \cup \{d\}, \{V_a \mid a \in At\}, \{I_a \mid a \in At\})$.
Here U is a finite set of objects called the universe of discourse; At is a finite, non-empty attribute set; $V_a$ is the value range of attribute $a \in At$; and $I_a$ is an information function giving the value of object x on attribute a.
The indiscernibility relation on the universe of discourse U with respect to an attribute subset $B \subseteq At$ is:
$$R_B = \{(x, y) \in U \times U \mid \forall a \in B,\ I_a(x) = I_a(y)\}$$
The indiscernibility relation is an equivalence relation, and it partitions the universe U into $U/R_B = \{X_1, X_2, \ldots, X_m\}$, the set of equivalence classes formed by $R_B$. Each equivalence class $[x]_B = \{y \mid (x, y) \in R_B\}$ is a basic knowledge granule in rough set theory.
For each subset $X \subseteq U$ and equivalence relation R, the upper and lower approximations of X are defined as follows:
$$\overline{R}X = \{x \in U \mid [x]_R \cap X \neq \emptyset\}$$
$$\underline{R}X = \{x \in U \mid [x]_R \subseteq X\}$$
The upper and lower approximations of X divide the universe of discourse U into three disjoint regions, the positive region $POS_R(X)$, the negative region $NEG_R(X)$ and the boundary region $BND_R(X)$, where
Positive region: $POS_R(X) = \underline{R}X$;
Negative region: $NEG_R(X) = U - \overline{R}X$;
Boundary region: $BND_R(X) = \overline{R}X - \underline{R}X$.
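To make these definitions concrete, the sketch below computes the partition, the approximations and the three regions on the data of Table 2, assuming the standard rough set definitions. The row encoding (one letter per attribute, L/M/H for low/medium/high) and the names `TABLE`, `partition` and `approximations` are assumptions made for this illustration.

```python
from collections import defaultdict

# Table 2, one letter per condition attribute A1..A8, followed by the grade D.
TABLE = {
    "x1": ("LLLMLLMM", 2), "x2": ("LLHMLLHM", 3),
    "x3": ("LLLMLLLM", 1), "x4": ("LMLMLMLM", 2),
    "x5": ("MMMHMMMH", 2), "x6": ("HMHHHHMH", 4),
    "x7": ("LMMHLMMH", 3), "x8": ("HMHHHMHH", 5),
}


def partition(attrs):
    """Equivalence classes of the indiscernibility relation on attribute indices `attrs`."""
    blocks = defaultdict(set)
    for name, (values, _grade) in TABLE.items():
        blocks[tuple(values[i] for i in attrs)].add(name)
    return list(blocks.values())


def approximations(attrs, target):
    """Return (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(attrs):
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper


B = [0, 1, 2, 3]  # A1..A4
X = {name for name, (_values, grade) in TABLE.items() if grade == 2}
lower, upper = approximations(B, X)
positive, negative, boundary = lower, set(TABLE) - upper, upper - lower
print(sorted(positive), sorted(negative), sorted(boundary))
# ['x4', 'x5'] ['x2', 'x6', 'x7', 'x8'] ['x1', 'x3']
```

Here X is the set of grade-2 samples {x1, x4, x5}; x1 falls in the boundary region because it is indiscernible from x3, which has grade 1.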
The approximation quality describes the dependency between attributes. If the value of attribute Q is completely determined by P, then Q is said to depend on P, denoted $P \Rightarrow Q$.
Let $P, Q \subseteq At$; the degree k ($0 \le k \le 1$) to which Q depends on P is expressed as:
$$k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}, \qquad POS_P(Q) = \bigcup_{X \in U/R_Q} \underline{R_P}X$$
S4: generating a finite number of condition attribute subsets from the condition attributes in the air quality index decision table, and calculating the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute by using rough set theory and structural risk minimization theory;
S41: obtaining the decision information system of the air quality index decision table according to rough set theory:
Let $S = (U, C \cup D, V, f)$ be a decision information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set of objects, also called the universe of discourse, and $x_i$ denotes the i-th environmental parameter sample. $C = \{A_1, A_2, \ldots, A_m\}$ is a non-empty finite set of condition attributes, where $A_i$ denotes the concentration of a gas or aerosol such as PM2.5, PM10, SO2, NO2, CO, O3 or TSP, and B is a subset of the condition attribute set C. D is the decision attribute, here the air quality index grade, which is divided into six grades according to the severity of air pollution.
$V = \bigcup_{a \in C \cup D} V_a$ is the value range, where $V_a$ denotes the value range of attribute a, and f is an information function.
S42: calculating the empirical error of the condition attribute subset according to the dependency of the decision attribute D on the condition attribute subset B in the decision information system:
$$R_{emp}(B) = 1 - \gamma_B(D)$$
$$\gamma_B(D) = \frac{\left|\bigcup \{X_i \in U/IND(B) \mid X_i \subseteq [x]_D\}\right|}{|U|}$$
where $|\cdot|$ denotes the cardinality of a set, i.e. the number of elements in the set; $U/IND(B) = \{X_1, X_2, \ldots, X_n\}$ is the partition derived from the condition attribute subset B, with $X_i$ one of its equivalence classes; and $[x]_D$ is an equivalence class of the partition U/IND(D) derived from the decision attribute D.
For example, according to Table 2, let $U = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$ be the set of environmental sample data, let $C = \{A_1, A_2, A_3, A_4, A_5, A_6, A_7, A_8\}$ denote the set of environmental parameters PM2.5, PM10, SO2, NO2, CO, O3, TSP and DF, and let D denote the decision attribute (the air quality index grade corresponding to the environmental parameter sample data). When the condition attributes $A_1 A_2 A_3 A_4$ are used for the partition, the condition attribute subset is B = {PM2.5, PM10, SO2, NO2}, and the set of equivalence classes derived from B is $U/IND(B) = \{\{x_1, x_3\}, \{x_2\}, \{x_4\}, \{x_5\}, \{x_7\}, \{x_6, x_8\}\}$. When the decision attribute D is used for the partition, $U/IND(D) = \{\{x_3\}, \{x_1, x_4, x_5\}, \{x_2, x_7\}, \{x_6\}, \{x_8\}\}$. The dependency of the decision attribute D on the condition attribute subset B is then stated as $\gamma_B(D) = 0.625$, and thus $R_{emp}(B) = 1 - \gamma_B(D) = 1 - 0.625 = 0.375$.
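The dependency and the empirical error can be sketched in code. This is a minimal illustration that assumes the standard positive-region definition of dependency, since the formula image is not reproduced in this text; the names `ROWS`, `partition`, `dependency` and `empirical_error` are hypothetical. Under that assumption the sketch returns a dependency of 0.5 for B = {A1, A2, A3, A4} with the partitions listed above, which differs from the 0.625 quoted in the text; the quoted figure may rest on a different formula.

```python
from collections import defaultdict

# Table 2: one letter per condition attribute A1..A8 (L/M/H), then the grade D.
ROWS = {
    "x1": ("LLLMLLMM", 2), "x2": ("LLHMLLHM", 3),
    "x3": ("LLLMLLLM", 1), "x4": ("LMLMLMLM", 2),
    "x5": ("MMMHMMMH", 2), "x6": ("HMHHHHMH", 4),
    "x7": ("LMMHLMMH", 3), "x8": ("HMHHHMHH", 5),
}


def partition(key_of):
    """Group object names by the key that `key_of` extracts from each row."""
    blocks = defaultdict(set)
    for name, row in ROWS.items():
        blocks[key_of(row)].add(name)
    return list(blocks.values())


def dependency(cond_attrs):
    """Share of objects whose condition class lies inside one decision class."""
    cond_blocks = partition(lambda row: tuple(row[0][i] for i in cond_attrs))
    decision_blocks = partition(lambda row: row[1])
    positive = set()
    for block in cond_blocks:
        if any(block <= d for d in decision_blocks):
            positive |= block
    return len(positive) / len(ROWS)


def empirical_error(cond_attrs):
    """R_emp(B) = 1 - gamma_B(D)."""
    return 1.0 - dependency(cond_attrs)


print(dependency([0, 1, 2, 3]))        # 0.5
print(empirical_error(list(range(8))))  # 0.0
```

With all eight attributes every sample is discernible from the others, so the empirical error drops to zero.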
S43: introducing a mutual information regularization term according to the structural risk minimization criterion, and calculating the mutual information between the condition attribute subset and the decision attribute.
Referring to FIG. 2, structural risk minimization (SRM) is a strategy proposed to prevent overfitting, and it is equivalent to regularization. The structural risk adds to the empirical risk a regularizer, or penalty term, that represents the complexity of the model. The structural risk is defined as:
$$R_{srm}(f) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$$
where the first term is the empirical risk, i.e. the average loss L over the N training samples, J(f) is the complexity of the model, and $\lambda \ge 0$ is a coefficient that balances the empirical risk against the model complexity. The structural risk minimization strategy takes the model with the smallest structural risk as the optimal model:
$$\min_{f \in \mathcal{F}} \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$$
A small structural risk requires both the empirical risk and the model complexity to be small, in which case the model generalizes better.
On this basis, a mutual information regularization term I(B;D) is introduced into the rough set model, expressed as
$$I(B;D) = H(D) - H(D \mid B)$$
where H(D) is the information entropy of the decision attribute D, $H(D \mid B)$ is the conditional entropy of the decision attribute D given the condition attribute subset B, and I(B;D) is the mutual information between the attribute subset B and the decision attribute D. Rough set theory supports quantitative analysis, so the relationships among the data can be inferred and explained; adding the structural risk minimization criterion balances the prediction error against complexity and improves the stability and robustness of air quality prediction.
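The mutual information term can be estimated from relative frequencies in the decision table. The sketch below is an illustration under that assumption, using the data of Table 2; the names `ROWS`, `entropy` and `mutual_information` and the base-2 logarithm are choices made for the example.

```python
from collections import Counter
from math import log2

# Table 2 rows x1..x8: condition attributes A1..A8 (L/M/H) and grade D.
ROWS = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mutual_information(cond_attrs):
    """I(B;D) = H(D) - H(D|B) for the attribute indices in `cond_attrs`."""
    decisions = [grade for _values, grade in ROWS]
    keys = [tuple(values[i] for i in cond_attrs) for values, _grade in ROWS]
    n = len(ROWS)
    h_d_given_b = 0.0
    for key, count in Counter(keys).items():
        # Entropy of D inside one equivalence class of B, weighted by class size.
        block = [d for k, d in zip(keys, decisions) if k == key]
        h_d_given_b += count / n * entropy(block)
    return entropy(decisions) - h_d_given_b


print(round(mutual_information([0, 1, 2, 3]), 4))
```

For B = {A1, A2, A3, A4}, only the classes {x1, x3} and {x6, x8} mix two grades, so H(D|B) is 0.5 bits and I(B;D) equals H(D) minus 0.5.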
Referring to FIG. 3, S5: obtaining an optimal condition attribute subset with a genetic algorithm according to the empirical error of each condition attribute subset and its mutual information with the decision attribute;
S51: encoding the condition attribute subsets, and taking the encoded condition attribute subsets as the initial chromosome population of the genetic algorithm;
S52: calculating the expected error of each condition attribute subset from its empirical error and its mutual information with the decision attribute.
The expected error of a condition attribute subset is:
$$\min R_{reg}(B) = R_{emp}(B) + \alpha I(B;D)$$
where $\min R_{reg}(B)$ denotes the expected error of the attribute subset B, $R_{emp}(B)$ denotes the empirical error of the subset B, I(B;D) denotes the mutual information between the attribute subset B and the decision attribute D, and α is a hyperparameter with $\alpha \ge 0$.
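Steps S51 and S52 can be sketched together by encoding a condition attribute subset as a bit string (bit i set means attribute A(i+1) is selected) and evaluating the expected error on Table 2. This is an illustration, not the patent's implementation: the bit-string encoding, the names `ROWS`, `entropy` and `expected_error`, the positive-region form of the dependency and the value α = 0.1 are all assumptions.

```python
from collections import Counter, defaultdict
from math import log2

# Table 2 rows x1..x8: condition attributes A1..A8 (L/M/H) and grade D.
ROWS = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def expected_error(chromosome: str, alpha: float) -> float:
    """R_reg(B) = R_emp(B) + alpha * I(B;D) for the subset coded by `chromosome`."""
    attrs = [i for i, bit in enumerate(chromosome) if bit == "1"]
    blocks = defaultdict(list)  # equivalence class key -> grades of its members
    for values, grade in ROWS:
        blocks[tuple(values[i] for i in attrs)].append(grade)
    n = len(ROWS)
    # Positive region: classes whose members all share one grade.
    positive = sum(len(g) for g in blocks.values() if len(set(g)) == 1)
    r_emp = 1.0 - positive / n
    h_d = entropy([grade for _values, grade in ROWS])
    h_d_given_b = sum(len(g) / n * entropy(g) for g in blocks.values())
    return r_emp + alpha * (h_d - h_d_given_b)


ALPHA = 0.1  # hypothetical hyperparameter value
for chromosome in ("11110000", "11100000", "11111111"):
    print(chromosome, round(expected_error(chromosome, ALPHA), 4))
```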
S53: processing the initial chromosomes with the selection, crossover and mutation operators of the genetic algorithm to obtain the chromosomes after crossover and mutation, wherein the selection operator uses roulette wheel selection, the crossover operator uses single-point crossover, and the mutation operator uses basic bit mutation.
Selection follows the roulette wheel rule based on the expected error of each initial chromosome, as follows:
(1) Let the population size (number of initial chromosomes) be M, and calculate the fitness $f_i$ ($i = 1, 2, \ldots, M$), i.e. the expected error, of each individual (initial chromosome) in the population;
(2) Calculate the probability that each individual (initial chromosome) $x_i$ is inherited into the next-generation population:
$$P(x_i) = \frac{f_i}{\sum_{j=1}^{M} f_j}$$
(3) Calculate the cumulative probability $q_i$ of each individual (initial chromosome) $x_i$ ($i = 1, 2, \ldots, M$):
$$q_i = \sum_{j=1}^{i} P(x_j)$$
(4) Generate a uniformly distributed pseudo-random number r in the interval [0, 1];
(5) If $r < q_1$, select individual 1; otherwise select the individual k for which $q_{k-1} < r \le q_k$ holds;
(6) Repeat steps (4) and (5) M times.
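The roulette wheel procedure in steps (1) to (6) can be sketched as below. The function name `roulette_select` and the passing of a `random.Random` instance are assumptions for the example. The sketch takes the fitness values as given; because the expected error is minimized, an implementation would presumably transform it (for example by inverting it) before using it as a selection weight, which the text does not specify.

```python
import random


def roulette_select(fitness, rng):
    """Return M selected indices, each drawn with probability f_i / sum(f)."""
    total = sum(fitness)
    # Steps (2) and (3): selection probabilities and their cumulative sums q_i.
    cumulative = []
    running = 0.0
    for f in fitness:
        running += f / total
        cumulative.append(running)
    selected = []
    # Step (6): repeat steps (4) and (5) M times.
    for _ in range(len(fitness)):
        r = rng.random()  # step (4)
        for index, q in enumerate(cumulative):  # step (5): first k with r <= q_k
            if r <= q:
                selected.append(index)
                break
        else:
            # Guard against floating-point rounding in the last cumulative sum.
            selected.append(len(fitness) - 1)
    return selected


picks = roulette_select([1.0, 1.0, 2.0], random.Random(0))
print(picks)
```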
In the crossover operation, individuals are selected to take part in crossover with a given probability; a crossover point is chosen at random on the two paired chromosomes, and the substrings after the crossover point are exchanged to produce the next-generation individuals.
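Single-point crossover on bit-string chromosomes can be sketched as follows; the function name `single_point_crossover`, the bit-string representation and the `crossover_rate` parameter name are assumptions for the example.

```python
import random


def single_point_crossover(parent_a, parent_b, crossover_rate, rng):
    """Exchange the substrings after a random cut point with the given probability."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if rng.random() >= crossover_rate:
        return parent_a, parent_b  # this pair does not take part in crossover
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


print(single_point_crossover("11110000", "00001111", 1.0, random.Random(1)))
```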
The basic bit mutation operation changes, with probability $P_m$, the gene value at one or several randomly designated loci of an individual's code string. The procedure is as follows:
(1) For each locus of an individual (initial chromosome), designate it as a mutation point with probability $P_m$;
(2) Apply the mutation operation to each designated mutation point.
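For a binary encoding, the mutation operation amounts to flipping the designated bits. A minimal sketch, assuming bit-string chromosomes; the function name `bit_mutation` is hypothetical.

```python
import random


def bit_mutation(chromosome, mutation_rate, rng):
    """Flip each bit independently with probability `mutation_rate` (P_m)."""
    flipped = {"0": "1", "1": "0"}
    return "".join(
        flipped[bit] if rng.random() < mutation_rate else bit
        for bit in chromosome
    )


print(bit_mutation("11110000", 0.1, random.Random(0)))
```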
S54: taking the chromosomes after crossover and mutation as the initial chromosomes of the next iteration of the genetic algorithm, and repeating steps S52-S54 until the preset number of iterations is reached; the condition attribute subset with the smallest expected error is taken as the optimal condition attribute subset. Combining the genetic algorithm reduces the feature dimension without reducing classification accuracy and increases the speed of air quality prediction.
S6: using the condition attributes in the optimal condition attribute subset as the condition attributes of a rough set classifier, and predicting the air quality from the environmental parameter data of a target monitoring point to obtain an air quality result.
For example, according to Table 2, let $U = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$ be the set of environmental sample data, let $C = \{A_1, A_2, A_3, A_4, A_5, A_6, A_7, A_8\}$ denote the set of environmental parameters PM2.5, PM10, SO2, NO2, CO, O3, TSP and DF, and let D denote the decision attribute (the air quality index grade corresponding to the environmental parameter sample data). Suppose the optimal attribute subset obtained by the above operations is $A_1 A_2 A_3$, so that the optimal condition attribute subset is B = {PM2.5, PM10, SO2}. The set of equivalence classes derived from B is then $U/IND(B) = \{\{x_1, x_3\}, \{x_2\}, \{x_4\}, \{x_5\}, \{x_7\}, \{x_6, x_8\}\}$, and the rule set derived from the decision table is $\gamma(B) = \{\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \gamma_6\}$. Given an environmental sample to be predicted, X = {low, high, medium}, the rule set derived from the decision table assigns X an air quality index grade of 3 with probability 1.
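One way to read the rule set is as a lookup from each equivalence class of B to the distribution of grades observed in that class. The sketch below illustrates this reading on Table 2 with B = {A1, A2, A3}; the names `ROWS`, `build_rules` and `predict` are assumptions, and the query samples are chosen from attribute combinations that occur in the table (they are not the sample given in the text).

```python
from collections import Counter, defaultdict

# Table 2 rows x1..x8: condition attributes A1..A8 (L/M/H) and grade D.
ROWS = [
    ("LLLMLLMM", 2), ("LLHMLLHM", 3), ("LLLMLLLM", 1), ("LMLMLMLM", 2),
    ("MMMHMMMH", 2), ("HMHHHHMH", 4), ("LMMHLMMH", 3), ("HMHHHMHH", 5),
]


def build_rules(attrs):
    """Map each attribute-value combination on `attrs` to its grade counts."""
    rules = defaultdict(Counter)
    for values, grade in ROWS:
        rules[tuple(values[i] for i in attrs)][grade] += 1
    return rules


def predict(rules, sample):
    """Return {grade: probability} for `sample`, or {} if no rule matches."""
    counts = rules.get(tuple(sample))
    if counts is None:
        return {}
    total = sum(counts.values())
    return {grade: count / total for grade, count in counts.items()}


rules = build_rules([0, 1, 2])           # B = {A1, A2, A3}
print(len(rules))                        # 6 rules, one per equivalence class
print(predict(rules, ("L", "L", "H")))   # {3: 1.0}
print(predict(rules, ("L", "L", "L")))   # {2: 0.5, 1: 0.5}
```

A sample matching the class of x2 is assigned grade 3 with probability 1, while a sample matching the class {x1, x3} is split evenly between grades 2 and 1.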
The predicted air quality result is fed back to the relevant enterprises in a timely manner so that pollutant emissions are reduced and the environmental quality at the target monitoring point is improved.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. An air quality prediction method based on rough sets and structural risk minimization, comprising:
S1: acquiring environmental parameter sample data related to air quality from a meteorological monitoring station;
S2: establishing an air quality evaluation system to grade the environmental parameter sample data acquired from the meteorological monitoring station, obtaining the air quality index grade of each environmental parameter sample;
S3: according to the environmental parameter sample data, taking the environmental parameters related to air quality as condition attributes and the air quality index grade of each environmental parameter sample as the decision attribute, and creating an air quality index decision table;
S4: generating a finite number of condition attribute subsets from the condition attributes in the air quality index decision table, and calculating the empirical error of each condition attribute subset and the mutual information between the condition attribute subset and the decision attribute by using rough set theory and structural risk minimization theory;
S5: obtaining an optimal condition attribute subset with a genetic algorithm according to the empirical error of each condition attribute subset and its mutual information with the decision attribute;
S6: using the condition attributes in the optimal condition attribute subset as the condition attributes of a rough set classifier, and predicting the air quality from the environmental parameter data of a target monitoring point to obtain an air quality result.
2. The method of claim 1, wherein calculating the empirical error of the condition attribute subset and the mutual information between the condition attribute subset and the decision attribute by using rough set theory and structural risk minimization theory comprises:
S41: obtaining the decision information system of the air quality index decision table according to rough set theory;
S42: calculating the empirical error of the condition attribute subset according to the dependency of the decision attribute D on the condition attribute subset B in the decision information system;
S43: introducing a mutual information regularization term according to the structural risk minimization criterion, and calculating the mutual information between the condition attribute subset and the decision attribute.
3. The method of claim 2, wherein the empirical error of the condition attribute subset comprises:
$$R_{emp}(B) = 1 - \gamma_B(D)$$
$$\gamma_B(D) = \frac{\left|\bigcup \{X_i \in U/IND(B) \mid X_i \subseteq [x]_D\}\right|}{|U|}$$
wherein $|\cdot|$ denotes the cardinality of a set, i.e. the number of elements in the set; $U/IND(B) = \{X_1, X_2, \ldots, X_n\}$ denotes the partition derived from the condition attribute subset B; $X_i$ denotes an equivalence class in that partition; and $[x]_D$ denotes an equivalence class of the partition U/IND(D) derived from the decision attribute D.
4. The method of claim 2, wherein the mutual information between the subset of condition attributes and the decision attributes comprises:
$$I(B;D) = H(D) - H(D \mid B)$$
where $H(D)$ is the information entropy of the decision attribute D, $H(D \mid B)$ is the conditional entropy of the decision attribute D given the condition attribute subset B, and $I(B;D)$ denotes the mutual information between the attribute subset B and the decision attribute D.
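The quantities in claim 4 can be sketched as follows. This is a minimal illustration that estimates probabilities by relative frequencies in the decision table and uses base-2 logarithms; the claim does not fix the logarithm base or the estimator, so both are assumptions, as are the toy table and the function names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels, using relative frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(rows, cond_attrs, dec_attr):
    """H(D|B): entropy of D inside each B-equivalence class, weighted by class size."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        groups.setdefault(key, []).append(row[dec_attr])
    n = len(rows)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def mutual_information(rows, cond_attrs, dec_attr):
    """I(B;D) = H(D) - H(D|B)."""
    h_d = entropy([row[dec_attr] for row in rows])
    return h_d - conditional_entropy(rows, cond_attrs, dec_attr)


# Hypothetical table: attribute "a" determines the decision, "b" is unrelated to it.
ROWS = [
    {"a": 0, "b": 0, "d": "x"},
    {"a": 0, "b": 1, "d": "x"},
    {"a": 1, "b": 0, "d": "y"},
    {"a": 1, "b": 1, "d": "y"},
]

if __name__ == "__main__":
    print(mutual_information(ROWS, ["a"], "d"))
    print(mutual_information(ROWS, ["b"], "d"))
```

An attribute that fully determines the decision carries the whole entropy of D as mutual information, while an attribute that is statistically independent of D carries none.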
5. The method of claim 1, wherein obtaining the optimal condition attribute subset using a genetic algorithm according to the empirical error of the condition attribute subset and the mutual information between the condition attribute subset and the decision attribute comprises:
S51: encoding the condition attribute subsets, and taking the encoded condition attribute subsets as the initial chromosome population of the genetic algorithm;
S52: calculating the expected error of each condition attribute subset according to its empirical error and its mutual information with the decision attribute;
S53: processing the chromosomes using the selection, crossover and mutation operators of the genetic algorithm to obtain chromosomes after crossover and mutation, wherein the selection operator uses the roulette wheel method, the crossover operator uses single-point crossover, and the mutation operator uses basic bit mutation;
S54: taking the chromosomes after crossover and mutation as the initial chromosomes of the next iteration of the genetic algorithm, and repeating steps S52 to S54 until a preset number of iterations is reached; and taking the condition attribute subset with the minimum expected error as the optimal condition attribute subset.
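Steps S51 to S54 can be sketched as the loop below. It is a minimal illustration under several assumptions that the claim does not state: each chromosome is a 0/1 mask over the condition attributes, the roulette wheel uses the fitness 1 / (1 + cost) so that a lower expected error gives a larger selection probability, and the best chromosome seen so far is remembered across generations. The cost function used here is a hypothetical stand-in (Hamming distance to an arbitrary mask), not the patent's expected error, and all names and default parameters are illustrative.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Roulette wheel: pick one chromosome with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def single_point_crossover(a, b, rng):
    """Swap the tails of two parents at one random cut point (needs length >= 2)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_mutation(chrom, rate, rng):
    """Basic bit mutation: flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(cost, n_attrs, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for the 0/1 attribute mask with the lowest non-negative `cost`."""
    rng = random.Random(seed)
    # S51: random bit-string encoding of attribute subsets.
    population = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        # S52: evaluate every chromosome (assumed fitness transform, see lead-in).
        fitnesses = [1.0 / (1.0 + cost(c)) for c in population]
        # S53: selection, crossover, mutation.
        children = []
        while len(children) < pop_size:
            p1 = roulette_select(population, fitnesses, rng)
            p2 = roulette_select(population, fitnesses, rng)
            c1, c2 = single_point_crossover(p1, p2, rng)
            children.append(bit_mutation(c1, p_mut, rng))
            children.append(bit_mutation(c2, p_mut, rng))
        # S54: the offspring become the next generation.
        population = children[:pop_size]
        best = min(population + [best], key=cost)
    return best


# Hypothetical stand-in cost: distance to an arbitrary "ideal" mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def hamming_cost(chrom):
    return sum(g != t for g, t in zip(chrom, TARGET))


if __name__ == "__main__":
    mask = run_ga(hamming_cost, n_attrs=len(TARGET))
    print(mask, hamming_cost(mask))
```

In an actual application the `cost` argument would be replaced by a function that decodes the mask into an attribute subset and returns that subset's expected error.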
6. The method of claim 1, wherein the expected error for the subset of conditional attributes comprises:
$$\min R_{reg}(B) = R_{emp}(B) + \alpha I(B;D);$$
where $R_{reg}(B)$ denotes the expected error of the attribute subset B, $R_{emp}(B)$ denotes the empirical error of the subset B, $I(B;D)$ denotes the mutual information between the attribute subset B and the decision attribute D, and $\alpha$ is a hyperparameter with $\alpha \geq 0$.
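To make the role of the hyperparameter α concrete, the short sketch below evaluates the expression of claim 6 for two candidate subsets and picks the one with the smaller value. The empirical errors and mutual information values are made-up numbers chosen only for illustration, and the function and variable names are hypothetical.

```python
def expected_error(r_emp, mutual_info, alpha):
    """R_reg(B) = R_emp(B) + alpha * I(B;D), with alpha >= 0 as required by claim 6."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return r_emp + alpha * mutual_info


def best_subset(candidates, alpha):
    """Return the candidate subset whose expected error is smallest."""
    return min(candidates, key=lambda b: expected_error(*candidates[b], alpha))


# Made-up (R_emp, I(B;D)) pairs for two hypothetical attribute subsets.
CANDIDATES = {
    ("PM2.5",): (0.50, 0.46),
    ("PM2.5", "NO2"): (0.33, 0.58),
}

if __name__ == "__main__":
    for alpha in (0.0, 2.0):
        print(alpha, best_subset(CANDIDATES, alpha))
```

With α = 0 the choice is driven by the empirical error alone, so the two-attribute subset wins; with α = 2 the regularization term dominates and the single-attribute subset is selected instead.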
7. The method of claim 1, wherein the environmental parameters associated with air quality comprise: PM2.5, PM10, SO2, NO2, CO, O3, TSP and DF.
8. The air quality prediction method based on rough set and structural risk minimization according to claim 1, wherein the air quality evaluation system is an air quality index grade evaluation system established according to national standard GB 3095-2012, with six air quality index grades: excellent, good, light pollution, moderate pollution, heavy pollution and severe pollution.
CN202211277183.2A 2022-10-19 2022-10-19 Air quality prediction method based on rough set and structure risk minimization Pending CN115471011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211277183.2A CN115471011A (en) 2022-10-19 2022-10-19 Air quality prediction method based on rough set and structure risk minimization

Publications (1)

Publication Number Publication Date
CN115471011A (en) 2022-12-13

Family

ID=84337605

Country Status (1)

Country Link
CN (1) CN115471011A (en)

Similar Documents

Publication Publication Date Title
AU2018101946A4 (en) Geographical multivariate flow data spatio-temporal autocorrelation analysis method based on cellular automaton
CN112070125A (en) Prediction method of unbalanced data set based on isolated forest learning
CN111178611B (en) Method for predicting daily electric quantity
Xiang et al. A clustering-based surrogate-assisted multiobjective evolutionary algorithm for shelter location problem under uncertainty of road networks
CN116108758A (en) Landslide susceptibility evaluation method
CN111639878A (en) Landslide risk prediction method and system based on knowledge graph construction
CN108564110B (en) Air quality prediction method based on clustering algorithm
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN114822709A (en) Method and device for analyzing multi-granularity accurate cause of atmospheric pollution
Hobson et al. Wi-Fi Based Occupancy Clustering and Motif Identification: A Case Study.
CN112801344A (en) Coastal zone ecosystem health prediction method based on DPSIR model, electronic equipment and computer readable medium
CN115471011A (en) Air quality prediction method based on rough set and structure risk minimization
CN114091961A (en) Power enterprise supplier evaluation method based on semi-supervised SVM
CN112733903A (en) Air quality monitoring and alarming method, system, device and medium based on SVM-RF-DT combination
CN113657441A (en) Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening
Li et al. Identifying urban form typologies in seoul with mixture model based clustering
CN113240225B (en) Power transmission and transformation project cost risk grading method based on fuzzy worst index
CN112766537A (en) Short-term electric load prediction method
Zhang et al. MOPNAR-II: an improved multi-objective evolutionary algorithm for mining positive and negative association rules
CN117709908B (en) Intelligent auditing method and system for distribution rationality of power grid engineering personnel, materials and machines
CN117408742B (en) User screening method and system
CN110543983A (en) cost sensitive active learning method for gas well section type prediction
CN114610966A (en) Method for calculating and comparing economic efficiency and energy consumption of enterprise
CN117575386A (en) Sponge city toughness evaluation method
CN117455551A (en) Industry electricity consumption prediction method based on industry relation complex network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination