CN115453064A - Fine particle air pollution cause analysis method and system - Google Patents

Fine particle air pollution cause analysis method and system

Info

Publication number
CN115453064A
Authority
CN
China
Prior art keywords
data
concentration
characteristic variable
fine particulate
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211157306.9A
Other languages
Chinese (zh)
Other versions
CN115453064B (en
Inventor
汪先锋
张庆竹
王国强
贾曼
李田帅
李磊
牟江山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202211157306.9A priority Critical patent/CN115453064B/en
Publication of CN115453064A publication Critical patent/CN115453064A/en
Application granted granted Critical
Publication of CN115453064B publication Critical patent/CN115453064B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/0004 Gaseous mixtures, e.g. polluted air
    • G01N33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N33/0062 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/20 Air quality improvement or preservation, e.g. vehicle emission control or emission reduction by using catalytic converters

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Food Science & Technology (AREA)
  • Combustion & Propulsion (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Sampling And Sample Adjustment (AREA)

Abstract

The invention belongs to the technical field of air pollution cause analysis, and relates to a fine particulate matter air pollution cause analysis method and system, wherein the obtained sampling point monitoring data is subjected to data preprocessing, and the monitoring data comprises fine particulate matter concentration and characteristic variable data; processing the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particles; preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of the fine particulate matters; carrying out partial dependence analysis on each characteristic variable, and determining a control interval of the characteristic variable on the concentration of the fine particles; extracting a data sample with the concentration of fine particulate matters exceeding a set value, dividing the data sample into a plurality of pollution stages, processing the data sample by using the machine learning model, and quantitatively calculating a specific contribution value of each characteristic variable of each pollution stage; the invention can realize the analysis of the pollution cause and is beneficial to configuring a corresponding treatment scheme.

Description

Fine particle air pollution cause analysis method and system
Technical Field
The invention belongs to the technical field of air pollution cause analysis, and relates to a method and a system for analyzing air pollution causes of fine particles.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Long-term exposure to polluted air can cause cardiovascular, respiratory and other diseases, so every country attaches great importance to air pollution control. Fine particulate matter, also known as PM2.5, refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less, and is an important indicator of environmental pollution. Accurately analyzing and quantifying the contributions of the driving factors that influence PM2.5 formation is therefore necessary and meaningful for precise prevention and control of air pollution.
To the inventors' knowledge, traditional chemical transport models, represented by the Goddard Earth Observing System chemical transport model (GEOS-Chem) and the Weather Research and Forecasting / Community Multiscale Air Quality model (WRF-CMAQ), are often used to study air pollution. GEOS-Chem can be used to analyze the sources and processes behind the spatial variation of PM2.5 composition, and WRF-CMAQ can calculate the influence of meteorological conditions, anthropogenic emissions and heterogeneous chemistry on PM2.5. However, traditional chemical transport models are subject to large deviations because of uncertainties in emission inventories and in physical and chemical parameters.
Disclosure of Invention
The invention aims to solve the problems and provides a method and a system for analyzing the cause of the air pollution of fine particles.
According to some embodiments, the invention adopts the following technical scheme:
a fine particle air pollution cause analysis method comprises the following steps:
carrying out data preprocessing on the acquired sampling point monitoring data, wherein the monitoring data comprises fine particle concentration and characteristic variable data;
processing the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particles;
preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of the fine particulate matters;
performing partial dependence analysis on each characteristic variable to determine a control interval of the characteristic variable on the concentration of the fine particles;
and extracting a data sample with the concentration of fine particulate matters exceeding a set value, dividing the data sample into a plurality of pollution stages, processing the data sample by using the machine learning model, and quantitatively calculating the specific contribution value of each characteristic variable of each pollution stage.
As an alternative embodiment, the monitoring data includes gaseous pollutant data, meteorological data, ion data, elemental data, and carbon data.
As an alternative embodiment, the machine learning model is a random forest model. The training process comprises randomly assigning one part of the preprocessed data as the training set of the random forest model and using the remaining part as the test set; tuning the two most important parameters of the random forest model, n_estimators and max_depth, by plotting learning curves; and determining step by step, from the learning curves, the number of decision trees and the tree depth at which model performance is best.
As an alternative embodiment, the method further comprises evaluating the trained machine learning model, the specific process comprising evaluating the accuracy of the random forest model on the test set using the coefficient of determination, the mean absolute error and the root mean square error.
As an alternative embodiment, the specific process of preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of fine particulate matter is as follows: according to the permutation importance algorithm, the data corresponding to each feature are shuffled, and training and prediction are then carried out with the model on the shuffled data; this is repeated several times. If the feature weight decreases after the data set is shuffled, the larger the decrease, the more important the feature; if the weight is essentially unchanged, the feature has no influence on the concentration of fine particulate matter.
As an alternative embodiment, the specific process of performing partial dependence analysis on each characteristic variable and determining its control interval for the concentration of fine particulate matter comprises: varying the value of each designated factor within a set range, averaging the corresponding model-predicted pollutant concentrations, and determining the response, or the joint response, of the prediction to one or more features so as to evaluate the sensitivity of the result to the characteristic variables.
As an alternative embodiment, the specific process of quantitatively calculating the specific contribution value of each characteristic variable at each pollution stage is to calculate the specific contribution of each feature to the concentration of fine particulate matter in each data sample using the Shapley additive explanations (SHAP) algorithm.
Further, a feature matrix composed of other feature variables is placed in a machine learning model to calculate a specific contribution value of each feature to the concentration of the fine particulate matters in each data sample, after the operation is repeated for multiple times, all the specific contribution values are derived, each air pollution stage is ranked according to the average absolute value of the specific contribution values, the first N feature variables which have large contribution to the concentration of the fine particulate matters are screened out, a time sequence of each feature specific contribution value in each data sample of each air pollution stage is drawn, and therefore the contribution of each feature to the concentration of the fine particulate matters in each time node is judged.
N is a positive integer.
A fine particulate air pollution cause analysis system comprising:
the preprocessing module is configured to perform data preprocessing on the acquired sampling point monitoring data, and the monitoring data comprise fine particle concentration and characteristic variable data;
the model processing module is configured to process the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particulate matters;
a preliminary qualitative analysis module configured to preliminarily qualitatively evaluate an influence of each of the characteristic variables on the concentration of the fine particulate matter;
the partial dependence analysis module is configured to perform partial dependence analysis on each characteristic variable and determine a control interval of the characteristic variable on the concentration of the fine particulate matters;
and the quantitative analysis module is configured to extract a data sample with the concentration of the fine particulate matters exceeding a set value, divide the data sample into a plurality of pollution stages, process the data sample by using the machine learning model, and quantitatively calculate the specific contribution value of each characteristic variable of each pollution stage.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method.
Compared with the prior art, the invention has the following beneficial effects:
Based on data from an atmospheric super monitoring station, the invention uses machine learning to mine in depth the various data factors that affect air pollution, constructs the linear or nonlinear relation between the characteristic variables and PM2.5 concentration, and on this basis analyzes the model results with sufficient interpretability.
Through qualitative analysis, the invention can make a preliminary judgment of the influence of the characteristic factors on air pollution, and can calculate the joint effect of two features on PM2.5 so as to identify the control interval of each feature for PM2.5 concentration, which enables pollutants to be treated precisely.
The invention can also quantitatively calculate the specific contribution of the characteristic factors to pollution, providing decision-making and management departments with a more detailed, data-driven approach to air pollution cause analysis.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic view of the quantitative analysis process of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; it should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
A method for analyzing cause of air pollution of fine particles, as shown in fig. 1, comprising the steps of:
step 1, carrying out data processing on the obtained sample point (Zibo) online monitoring data in autumn and winter;
step 2, performing time series analysis on the processed data set;
step 3, dividing the data set into a training set and a testing set, distinguishing features and labels, putting the training set into a random forest model for training and adjusting parameters, and testing whether the trained model meets requirements or not by using the testing set;
step 4, evaluating the model precision and determining that the model precision meets the requirements;
step 5, performing permutation importance, partial dependence and Shapley additive explanations (SHAP) analyses on the results obtained by the model that meets the requirements.
Specifically, in this embodiment, the online monitoring data in step 1 include: gaseous pollutant data: PM2.5, SO2, NO2, CO, O3; meteorological data: temperature, relative humidity, atmospheric pressure, wind speed, wind direction; carbon data: OC and EC; ion data: [formula image BDA0003859330150000061], Cl⁻, K⁺, Mg²⁺, Ca²⁺, [formula image BDA0003859330150000062], F⁻, Na⁺; element data: Al, Si, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn.
The data time resolution was 1 hour.
Of course, in other embodiments, the offline data may be used or the data type may be changed according to specific environments and requirements, which are not described herein again.
In some embodiments, in step 2, the gaseous pollutant data and meteorological data are plotted at half-month intervals, and the ion, carbon and element data are tabulated as monthly average concentrations for side-by-side comparison. The time series analysis shows how air quality compares between autumn and winter.
Through this comparison, the PM2.5 concentration threshold to be studied can be determined more appropriately. Of course, in some embodiments, step 2 may be omitted.
In some embodiments, in step 3, 70% of the data are randomly assigned as the training set of the random forest model and the remaining 30% as the test set. The two most important parameters of the random forest model, n_estimators and max_depth, are tuned by plotting learning curves, from which the number of decision trees and the tree depth giving the best model performance are determined step by step.
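By way of a non-limiting illustration, the 70/30 random split and a learning-curve style parameter sweep might be sketched as follows. The function names `split_70_30` and `learning_curve`, the candidate grid and the `toy_score` function are assumptions made for this sketch; in practice the scoring function would train a random forest with the candidate value and return its test-set score.

```python
import random

def split_70_30(samples, seed=0):
    """Randomly assign 70% of the samples to training and 30% to testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def learning_curve(candidates, score_fn):
    """Score every candidate parameter value; return the best value and the curve."""
    curve = [(c, score_fn(c)) for c in candidates]
    best = max(curve, key=lambda pair: pair[1])[0]
    return best, curve

train, test = split_70_30(list(range(100)))

# Hypothetical stand-in for "train a forest with n_estimators trees and
# return its test score"; it simply peaks at 601 for demonstration.
def toy_score(n_estimators):
    return 1.0 - abs(n_estimators - 601) / 1000.0

best_n, curve = learning_curve(range(1, 1002, 100), toy_score)
```

The same sweep can be repeated for max_depth once n_estimators has been fixed, which corresponds to determining the two parameters step by step.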
In some embodiments, in step 3, after parameter tuning is complete, all variables contained in the gaseous pollutant, meteorological, carbon, ion and element data within one hour are taken as features and the PM2.5 concentration as the label, so that the random forest model is used to analyze the data relation between all characteristic variables and PM2.5 within the same hour.
And testing whether the trained model meets the requirements or not by using the test set.
If so, go to step 4.
In some embodiments, in step 4, the accuracy of the random forest model on the test set is evaluated using the coefficient of determination (R²), the mean absolute error (MAE) and the root mean square error (RMSE), calculated respectively as:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
where N is the total number of data samples, i denotes the i-th data sample, $y_i$ is the observed PM2.5 concentration of the i-th data sample, $\hat{y}_i$ is the predicted PM2.5 concentration of the i-th data sample, and $\bar{y}$ is the mean observed PM2.5 concentration.
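As a minimal sketch of the three accuracy metrics, assuming observed and predicted concentrations are held in plain Python lists, the calculation might look as follows. The function name `r2_mae_rmse` and the sample values are illustrative only and are not data from the embodiment.

```python
import math

def r2_mae_rmse(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)              # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse

# Illustrative values only (hypothetical, in micrograms per cubic metre).
observed = [35.0, 80.0, 120.0, 260.0]
predicted = [40.0, 75.0, 130.0, 250.0]
r2, mae, rmse = r2_mae_rmse(observed, predicted)
```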
If the accuracy of the random forest model on the test set meets the requirement, proceed to step 5.
In some embodiments, in step 5, permutation importance is an evaluation algorithm for assessing the degree to which each characteristic variable influences the model prediction. The calculation formula is:
$$i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}$$
where $\tilde{D}_{k,j}$ is the shuffled data set constructed by randomly rearranging feature j in the k-th of K repetitions, $i_j$ is the weight of feature j, j indexes the features, K is the number of repetitions, s is the performance score of the random forest model on the test data set D, and $s_{k,j}$ is the performance score of the model on the data set $\tilde{D}_{k,j}$.
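A minimal sketch of permutation importance follows, under the assumption that the fitted model is available as a prediction callable and is not retrained after shuffling. The names `permutation_importance`, `r2_score` and `toy_predict` are hypothetical helpers written for this sketch, and the toy model stands in for the random forest.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the performance score s."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """For each feature j, average the score drop s - s_kj over n_repeats shuffles."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model (hypothetical): the prediction depends on feature 0 only.
def toy_predict(row):
    return 3.0 * row[0]

X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [toy_predict(row) for row in X]
imp = permutation_importance(toy_predict, X, y)
```

In this toy case shuffling feature 1 leaves the score unchanged, so its importance is zero, while shuffling feature 0 lowers the score.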
In some embodiments, in step 5, variable sensitivity analysis can be implemented with the partial dependence algorithm (partial dependence plot, PDP). The values of the designated factors are each varied within a set range, and the corresponding model-predicted pollutant concentrations are averaged. The partial dependence algorithm yields the response of the prediction to one feature, or the joint response to two features, and thereby evaluates the sensitivity of the result to the characteristic variables. The algorithm formula is:
$$\hat{f}_{S}(X_S) = E_{X_C}\left[\hat{f}(X_S, X_C)\right] = \int \hat{f}(X_S, X_C)\, dP(X_C)$$
where $X_S$ is the set of one or two features under study, $X_C$ is the set of the other features, and $\hat{f}$ is the random forest model.
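The averaging over the other features can be illustrated with a short one-feature sketch, in which the expectation over X_C is approximated by the mean over the data samples. The names `partial_dependence` and `toy_predict` are assumptions for illustration, and the toy model stands in for the trained random forest.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = value  # fix X_S, keep the observed X_C
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve

# Toy model (hypothetical) standing in for the random forest.
def toy_predict(row):
    return 2.0 * row[0] + row[1]

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
pd_curve = partial_dependence(toy_predict, X, feature=0, grid=[0.0, 5.0, 10.0])
```

A two-feature version would loop over a grid of value pairs in the same way, giving the joint response surface.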
In some embodiments, in step 5, as shown in FIG. 2, the Shapley additive explanations (SHAP) algorithm fairly distributes the payoff of cooperation among the participants (that is, the characteristic variables) according to the contribution each makes (its influence on PM2.5 concentration), the contribution being the average marginal effect of each feature on the result. The calculation formula is:
$$f(x_i) = \phi_0(f, x) + \sum_{j=1}^{N} \phi_j(f, x_i)$$
where $x_i$ is a sample with N features, $f(x_i)$ is the predicted value for that sample (that is, the predicted PM2.5), $\phi_0(f, x)$ is the expected value (base value) of the random forest model output over the data set, and $\phi_j(f, x_i)$ is the Shapley value measuring the influence of feature j on the prediction for sample $x_i$.
$\phi_j(f, x_i)$ is the Shapley value of each feature in each sample, a weighted average over all possible subsets of the other variables. The specific algorithm is:
$$\phi_j(f, x) = \sum_{S \subseteq \{x_1, \ldots, x_n\} \setminus \{x_j\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ f_x(S \cup \{x_j\}) - f_x(S) \right]$$
where $\phi_j(f, x)$ is the Shapley value of feature j, S is a subset of the features, $x_1, x_2, \ldots, x_n$ are the individual features, |S| is the number of non-zero entries in the subset S, and $f_x(S)$ is the predicted value for the subset S.
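For a small number of features the Shapley formula can be evaluated exactly by enumerating subsets, as in the following sketch. It assumes, as a simplification, that features outside the subset S are replaced by fixed background values; SHAP implementations for tree models estimate f_x(S) differently. The names `shapley_values`, `toy_predict` and `background` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values of each feature of sample x (brute-force enumeration)."""
    n = len(x)

    def value(subset):
        # f_x(S): features in S take the sample's values, the rest the background's.
        row = [x[j] if j in subset else background[j] for j in range(n)]
        return predict(row)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (value(set(S) | {j}) - value(set(S)))
        phi.append(total)
    return phi

# Toy model (hypothetical) with an interaction between features 0 and 2.
def toy_predict(row):
    return 2.0 * row[0] + 3.0 * row[1] + row[0] * row[2]

x = [1.0, 2.0, 4.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_predict, x, background)
base = toy_predict(background)
```

The base value plus the sum of the Shapley values reproduces the prediction for the sample, which is the additive property stated in the preceding formula; the interaction term is shared equally between the two features involved.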
It should be noted that the above values can be determined according to specific prediction requirements; in other embodiments they are not limited to the exemplary ranges given above and may be adjusted as required.
Similarly, the monitoring data may be increased or decreased in different embodiments, and is not limited to the ranges given in the above embodiments, and may include the concentration of the fine particulate matter and the characteristic variable data to be studied.
As an exemplary embodiment:
Step 1, acquiring online measurement data from the Zibo super monitoring station for September to December 2021, including gaseous pollutant data: PM2.5, SO2, NO2, CO, O3; meteorological data: temperature, relative humidity, atmospheric pressure, wind speed, wind direction; carbon data: OC and EC; ion data: [formula image BDA0003859330150000102], Cl⁻, K⁺, Mg²⁺, Ca²⁺, [formula image BDA0003859330150000103], F⁻, Na⁺; element data: Al, Si, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn. The time resolution of all data is 1 h.
Step 2, preprocessing the data. Specifically: abnormal abrupt values are deleted directly; missing wind direction data are filled with the most frequently occurring value, and all other missing data are filled with the corresponding mean value.
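The filling rule can be sketched as follows, assuming each hourly record is a dictionary in which missing entries are `None`. The function name `fill_missing`, the key `wind_direction` and the sample records are hypothetical, and the deletion of abnormal values is not shown because the text gives no criterion for it.

```python
from collections import Counter
from statistics import mean

def fill_missing(rows, wind_key="wind_direction"):
    """Fill None entries: mode for wind direction, column mean for everything else."""
    keys = list(rows[0].keys())
    fill = {}
    for k in keys:
        present = [r[k] for r in rows if r[k] is not None]
        if k == wind_key:
            fill[k] = Counter(present).most_common(1)[0][0]  # most frequent value
        else:
            fill[k] = mean(present)
    return [{k: (fill[k] if r[k] is None else r[k]) for k in keys} for r in rows]

# Hypothetical hourly records with gaps.
rows = [
    {"SO2": 10.0, "wind_direction": "NE"},
    {"SO2": None, "wind_direction": "NE"},
    {"SO2": 20.0, "wind_direction": None},
    {"SO2": 30.0, "wind_direction": "SW"},
]
filled = fill_missing(rows)
```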
Step 3, drawing time series charts. The time series of the gaseous pollutant data and the meteorological data are plotted; to better compare the trends of different species, CO and SO2 form one group, O3 and NO2 form one group, temperature and relative humidity form one group, and PM2.5 and wind direction each form a separate group. Because the carbon, ion and element data comprise many species, their monthly averages are presented in a table. The time series charts and the monthly averages show that the species indicators peak in December, in winter, which is the period of severe air pollution.
Step 4, grading the PM2.5 concentration according to the air quality index to distinguish the clean stage from the pollution stages. Specifically, PM2.5 < 75 μg/m³ is regarded as clean, 75 μg/m³ ≤ PM2.5 ≤ 250 μg/m³ as polluted, and PM2.5 > 250 μg/m³ as severely polluted.
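The grading rule above amounts to a simple threshold function; a sketch with a hypothetical function name `grade_pm25` and English stage labels chosen for illustration is:

```python
def grade_pm25(concentration):
    """Grade an hourly PM2.5 concentration (in micrograms per cubic metre)."""
    if concentration < 75:
        return "clean"
    if concentration <= 250:
        return "polluted"
    return "severely polluted"

grades = [grade_pm25(c) for c in (40.0, 75.0, 250.0, 300.0)]
```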
Step 5, preliminarily analyzing the average concentration of each species, where the secondary inorganic aerosols [formula image BDA0003859330150000111], [formula image BDA0003859330150000112] and [formula image BDA0003859330150000113] account for the largest share of the PM2.5 mass concentration, 58%. According to the air quality index grading, the data of each species in the clean stage, the pollution stage and the severe pollution stage are compared.
Step 6, training a machine learning model of the response relation between PM2.5 concentration and the characteristic variables, with the following specific steps:
6.1 The processed data set is randomly divided into a training set and a test set at a ratio of 7:3; the training set is used to train the random forest model and the test set is used to check the accuracy of the model. Specifically, the parameters at which the model performs well are determined from the learning curve: the number of decision trees is 601 and the maximum tree depth is 20. During tuning, changes in the model's coefficient of determination are used as the reference, and the parameters are adjusted continuously to obtain the final optimal model, which is then saved.
6.2 The accuracy of the random forest model is evaluated using the coefficient of determination (R²), the mean absolute error (MAE) and the root mean square error (RMSE). The model performs well, with R² = 0.93, MAE = 5.42 and RMSE = 9.16.
Step 7, using the permutation importance algorithm to make a preliminary qualitative evaluation of the influence of each characteristic variable on PM2.5 concentration, with the following specific steps:
7.1 According to the permutation importance formula for the random forest model,
$$i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}$$
the data corresponding to each feature are shuffled, and training and prediction are then carried out with the model on the shuffled data.
7.2 The above step is repeated k times. If the feature weight decreases after the data set is shuffled, the larger the decrease, the more important the feature; if the weight is essentially unchanged, the feature has no influence on PM2.5. Among the features, [formula image BDA0003859330150000122] has the largest weight, 0.64 before the decrease and 0.28 after it, indicating that [formula image BDA0003859330150000123] has the greatest influence on PM2.5.
Step 8, performing partial dependence analysis on the secondary inorganic aerosols [formula image BDA0003859330150000124] and [formula image BDA0003859330150000125]. In the order given by [formula image BDA0003859330150000126], the secondary inorganic aerosols are divided into three groups, and the synergistic control effect of each combination on PM2.5 is discussed in turn; taking the [formula image BDA0003859330150000127] concentration as the reference, the control intervals of the three ions for PM2.5 concentration are determined.
Step 9, using the Shapley additive explanations (SHAP) formula with the random forest model to calculate the specific contribution of each feature to PM2.5 in each data sample. The specific steps are as follows:
9.1 Data samples with PM2.5 > 75 μg/m³ are extracted and divided into 10 pollution stages according to the time interval between data samples, the time interval within a stage not exceeding 7 days. For example, if air pollution (PM2.5 > 75 μg/m³) occurs at 20:00 on October 1, at 12:00 on October 4 and at 14:00 on October 12, the first two data samples are assigned to the same air pollution stage and the third to another air pollution stage, and so on.
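One possible reading of this grouping rule, assumed here, is that a new stage starts whenever the gap between consecutive polluted samples exceeds 7 days. The sketch below applies that reading to the three timestamps of the example; the function name `split_into_stages` is hypothetical.

```python
from datetime import datetime, timedelta

def split_into_stages(timestamps, max_gap=timedelta(days=7)):
    """Group polluted-hour timestamps; a gap longer than max_gap starts a new stage."""
    stages = []
    for t in sorted(timestamps):
        if stages and t - stages[-1][-1] <= max_gap:
            stages[-1].append(t)   # close enough to the previous sample
        else:
            stages.append([t])     # start a new pollution stage
    return stages

polluted_hours = [
    datetime(2021, 10, 1, 20),
    datetime(2021, 10, 4, 12),
    datetime(2021, 10, 12, 14),
]
stages = split_into_stages(polluted_hours)
```

With these inputs the first two samples fall in one stage and the third in another, matching the example in the text.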
9.2 The trained random forest model and the air pollution data samples are loaded. PM2.5 is set as the label, and the feature matrix composed of the other characteristic variables is put into the random forest model to calculate the specific contribution of each feature to PM2.5 in each data sample.
9.3 Following the above procedure, the specific contribution (Shapley value) of each feature to PM2.5 in each data sample is calculated for the 8 air pollution stages.
9.4 All Shapley values are exported, the features of each air pollution stage are ranked by mean absolute Shapley value, and the top 5 characteristic variables contributing most to PM2.5 are screened out. The time series of each feature's Shapley value in each data sample of each air pollution stage is plotted to determine the contribution of each feature to PM2.5 at each time node, providing decision-making and management departments with hourly data so that air pollution can be treated more precisely.
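The ranking by mean absolute Shapley value can be sketched as follows, assuming the Shapley values of one pollution stage are stored as one list per data sample. The function name `top_features`, the feature names and the numbers are placeholders for illustration, not results of the embodiment.

```python
def top_features(shap_rows, names, n=5):
    """Rank features by mean absolute Shapley value and keep the first n."""
    mean_abs = {}
    for j, name in enumerate(names):
        mean_abs[name] = sum(abs(row[j]) for row in shap_rows) / len(shap_rows)
    return sorted(mean_abs, key=mean_abs.get, reverse=True)[:n]

# Placeholder Shapley values: two data samples, three features.
names = ["feature_A", "feature_B", "feature_C"]
shap_rows = [
    [12.0, -3.0, 5.0],
    [-8.0, 1.0, 6.0],
]
ranked = top_features(shap_rows, names, n=2)
```

Taking the absolute value before averaging keeps features whose contributions alternate in sign from cancelling out in the ranking.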
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications and variations that can be made, without inventive effort, on the basis of the technical solution of the present invention still fall within its scope of protection.

Claims (10)

1. A fine particle air pollution cause analysis method is characterized by comprising the following steps:
carrying out data preprocessing on the obtained sampling point monitoring data, wherein the monitoring data comprises fine particulate matter concentration and characteristic variable data;
processing the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particles;
preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of the fine particulate matters;
performing partial dependence analysis on each characteristic variable to determine a control interval of the characteristic variable on the concentration of the fine particles;
and extracting a data sample with the concentration of fine particulate matters exceeding a set value, dividing the data sample into a plurality of pollution stages, processing the data sample by using the machine learning model, and quantitatively calculating the specific contribution value of each characteristic variable of each pollution stage.
2. The fine particulate air pollution cause analysis method as claimed in claim 1, wherein the monitoring data comprise gaseous pollutant data, meteorological data, ion data, elemental data, and carbon data.
3. The fine particulate air pollution cause analysis method as claimed in claim 1, wherein the machine learning model is a random forest model, and the training process comprises: randomly dividing the preprocessed data into a training set and a test set for the random forest model; tuning n_estimators and max_depth, the most important parameters of the random forest model, by plotting learning curves; and determining step by step, from the learning curves, the number of decision trees and the depth of the decision trees at which the model performs best.
4. The fine particulate air pollution cause analysis method as claimed in claim 1 or 3, further comprising evaluating the trained machine learning model, wherein the specific process comprises evaluating the accuracy of the results on the test set of the random forest model using the coefficient of determination, the mean absolute error, and the root mean square error, respectively.
5. The fine particulate air pollution cause analysis method according to claim 1, wherein the specific process of preliminarily and qualitatively evaluating the influence of each characteristic variable on the fine particulate concentration is: according to a permutation importance algorithm, the data corresponding to each feature are shuffled, and the model then makes predictions on the shuffled data; this is repeated a plurality of times; the feature weight is given by the decrease in model performance after the data are shuffled, wherein the larger the decrease, the more important the feature, and a basically unchanged result indicates that the feature has no influence on the fine particulate concentration.
6. The fine particulate air pollution cause analysis method as claimed in claim 1, wherein the specific process of performing partial dependence analysis on each characteristic variable and determining the control interval of the characteristic variable with respect to the fine particulate concentration comprises: varying the value of a designated factor within a set range while averaging the corresponding change in the pollutant concentration predicted by the model, thereby evaluating the sensitivity of the result to the characteristic variable and determining the response, or the synergistic response of a plurality of features, in the prediction result.
7. The fine particulate air pollution cause analysis method as claimed in claim 1, wherein the specific process of quantitatively calculating the specific contribution value of each characteristic variable in each pollution stage is: calculating the specific contribution value of each feature to the fine particulate concentration in each data sample using the Shapley additive explanations (SHAP) algorithm.
8. The fine particulate air pollution cause analysis method as claimed in claim 7, wherein a feature matrix composed of the other characteristic variables is input into the machine learning model to calculate the specific contribution value of each feature to the fine particulate concentration in each data sample; after repeating this a plurality of times, all specific contribution values are exported, the characteristic variables in each air pollution stage are ranked by the mean absolute value of their specific contribution values, the top N characteristic variables contributing most to the fine particulate concentration are screened out, and the time series of the specific contribution value of each feature over the data samples of each air pollution stage is plotted, so that the contribution of each feature to the fine particulate concentration at each time node is determined.
9. A fine particle air pollution cause analysis system is characterized by comprising:
the preprocessing module is configured to perform data preprocessing on the acquired sampling point monitoring data, and the monitoring data comprise fine particle concentration and characteristic variable data;
the model processing module is configured to process the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particulate matters;
a preliminary qualitative analysis module configured to preliminarily qualitatively evaluate an influence of each characteristic variable on the concentration of the fine particulate matter;
the partial dependence analysis module is configured to perform partial dependence analysis on each characteristic variable and determine a control interval of the characteristic variable on the concentration of the fine particulate matters;
and the quantitative analysis module is configured to extract a data sample with the concentration of the fine particulate matters exceeding a set value, divide the data sample into a plurality of pollution stages, process the data sample by using the machine learning model, and quantitatively calculate a specific contribution value of each characteristic variable in each pollution stage.
10. A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method of any one of claims 1-8.
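The model evaluation of claim 4, the permutation importance of claim 5, and the partial dependence analysis of claim 6 can be sketched in a model-agnostic way as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `predict` is a hypothetical stand-in for the trained random forest, the feature names `NO2`, `RH`, and `noise` are invented for the example, and importance is measured here as the mean increase in root mean square error after shuffling, which is one common choice among several.

```python
import random
from math import sqrt
from statistics import mean


def predict(row):
    """Hypothetical stand-in for a trained model; ignores the 'noise' feature."""
    return 2.0 * row["NO2"] + 0.5 * row["RH"]


def r2_mae_rmse(y_true, y_pred):
    """Coefficient of determination, mean absolute error, root mean square error."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mae = mean(abs(t - p) for t, p in zip(y_true, y_pred))
    rmse = sqrt(ss_res / len(y_true))
    return 1.0 - ss_res / ss_tot, mae, rmse


def permutation_importance(rows, y_true, feature, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling one feature's column."""
    rng = random.Random(seed)
    base_rmse = r2_mae_rmse(y_true, [predict(r) for r in rows])[2]
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild the samples with only this feature's values permuted.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        rmse = r2_mae_rmse(y_true, [predict(r) for r in shuffled])[2]
        increases.append(rmse - base_rmse)
    return mean(increases)


def partial_dependence(rows, feature, grid):
    """Average prediction over all samples with `feature` fixed to each grid value."""
    return [mean(predict(dict(r, **{feature: v})) for r in rows) for v in grid]


# Hypothetical samples; the targets are generated by the stand-in model itself.
rows = [
    {"NO2": float(i), "RH": float(10 * (i % 3)), "noise": float(i * 7 % 5)}
    for i in range(1, 13)
]
y = [predict(r) for r in rows]

r2, mae, rmse = r2_mae_rmse(y, [predict(r) for r in rows])
importance_noise = permutation_importance(rows, y, "noise")
importance_no2 = permutation_importance(rows, y, "NO2")
pd_no2 = partial_dependence(rows, "NO2", [0.0, 10.0])
```

In this toy setup the unused `noise` feature has zero importance while `NO2` has positive importance, mirroring the claim's rule that a basically unchanged score indicates no influence. The partial dependence curve is what would be inspected to read off a control interval for a characteristic variable.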
CN202211157306.9A 2022-09-22 2022-09-22 Fine particulate matter air pollution cause analysis method and system Active CN115453064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211157306.9A CN115453064B (en) 2022-09-22 2022-09-22 Fine particulate matter air pollution cause analysis method and system

Publications (2)

Publication Number Publication Date
CN115453064A true CN115453064A (en) 2022-12-09
CN115453064B CN115453064B (en) 2023-09-05

Family

ID=84306945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211157306.9A Active CN115453064B (en) 2022-09-22 2022-09-22 Fine particulate matter air pollution cause analysis method and system

Country Status (1)

Country Link
CN (1) CN115453064B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239613A (en) * 2017-06-05 2017-10-10 南开大学 Intelligent source class identification method based on online data and factor analysis model
US20200200648A1 (en) * 2018-02-12 2020-06-25 Dalian University Of Technology Method for Fault Diagnosis of an Aero-engine Rolling Bearing Based on Random Forest of Power Spectrum Entropy
CN110379463A (en) * 2019-06-05 2019-10-25 山东大学 Marine algae genetic analysis and concentration prediction method and system based on machine learning
CN110378520A (en) * 2019-06-26 2019-10-25 浙江传媒学院 PM2.5 concentration prediction and early-warning method
WO2021051609A1 (en) * 2019-09-20 2021-03-25 平安科技(深圳)有限公司 Method and apparatus for predicting fine particulate matter pollution level, and computer device
CN110610279A (en) * 2019-09-27 2019-12-24 复旦大学 Method for identifying pollution source of atmospheric fine particulate matters and application thereof
CN111611296A (en) * 2020-05-20 2020-09-01 中科三清科技有限公司 PM2.5 pollution cause analysis method and device, electronic equipment and storage medium
US20210396729A1 (en) * 2020-06-23 2021-12-23 Dataa Development Co., Ltd. Small area real-time air pollution assessment system and method
CN112687350A (en) * 2020-12-25 2021-04-20 中科三清科技有限公司 Source analysis method of air fine particulate matter, electronic device, and storage medium
CN112613675A (en) * 2020-12-29 2021-04-06 南开大学 Machine learning model for analyzing the contributions and effects of pollution sources and meteorological factors on PM2.5 pollution of different degrees
CN113987912A (en) * 2021-09-18 2022-01-28 陇东学院 Pollutant on-line monitoring system based on geographic information
CN114611399A (en) * 2022-03-17 2022-06-10 北京工业大学 Long-time-series PM2.5 concentration prediction method based on NGBoost algorithm
CN114936957A (en) * 2022-05-23 2022-08-23 福州大学 Urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ZHONGCHENG ZHANG ET AL.: "Machine learning combined with the PMF model reveal the synergistic effects of sources and meteorological factors on PM2.5 pollution", pages 3 - 7 *
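康俊锋; 黄烈星; 张春艳; 曾昭亮; 姚申君: "Hourly PM2.5 prediction and comparative analysis under multiple machine learning models", 中国环境科学, no. 05, pages 1895 - 1901 *
杭琦; 杨敬辉; 黄国荣: "Application of the random forest algorithm in air quality evaluation", 上海第二工业大学学报, no. 02, pages 129 - 132 *
王雨晨: "Research on PM2.5 mass concentration prediction in Shanghai based on random forest", pages 13 *
齐甜方; 蒋洪迅; 石晓文: "Research and empirical analysis of PM2.5 concentration prediction in Shenyang based on multi-source data", no. 05 *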

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578948A (en) * 2023-07-12 2023-08-11 宁德时代新能源科技股份有限公司 Data correlation identification method, device, electronic equipment and medium
CN117314023A (en) * 2023-11-29 2023-12-29 智瑞碳(天津)科技有限公司 Atmospheric pollution data analysis method, system and computer storage medium
CN117314023B (en) * 2023-11-29 2024-02-20 智瑞碳(天津)科技有限公司 Atmospheric pollution data analysis method, system and computer storage medium

Also Published As

Publication number Publication date
CN115453064B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN115453064A (en) Fine particle air pollution cause analysis method and system
CN107944213B (en) PMF online source analysis method, PMF online source analysis system, terminal device and computer readable storage medium
CN109087277B (en) Method for measuring PM2.5 of fine air particles
CN107239613A (en) Intelligent source class identification method based on online data and factor analysis model
CN112613675A (en) Machine learning model for analyzing the contributions and effects of pollution sources and meteorological factors on PM2.5 pollution of different degrees
CN111222216A (en) Pollutant source analysis method
CN115526298A (en) High-robustness comprehensive prediction method for concentration of atmospheric pollutants
CN114757413A (en) Bad data identification method based on time sequence series analysis coupling neural network prediction
Nair et al. Using machine learning to derive cloud condensation nuclei number concentrations from commonly available measurements
Fletcher et al. Quantifying uncertainty from aerosol and atmospheric parameters and their impact on climate sensitivity
CN115629159A (en) Ozone and precursor tracing method and device based on multi-source data
CN115034303A (en) Directional detection method and system for harmful substances in food
Jamalani et al. Monthly analysis of PM10 in ambient air of Klang Valley, Malaysia
CN110706004A (en) Farmland heavy metal pollutant tracing method based on hierarchical clustering
KR20210054805A (en) Analysis method for Characteristic of Organic Particulate Matters
CN116187861A (en) Isotope-based water quality traceability monitoring method and related device
CN114117893A (en) Method for analyzing atmospheric dust-fall pollution source and evaluating dust-fall marginal effect of pollution source
CN115810409A (en) VOCs pollutant analysis method and device, electronic equipment and storage medium
CN115526410A (en) Method for predicting atmospheric pollutant data based on multi-parameter spatial filtering prediction model
CN115064218A (en) Method and device for constructing pathogenic microorganism data identification platform
CN117538492B (en) On-line detection method and system for pollutants in building space
CN117171597B (en) Method, system and medium for analyzing polluted site based on microorganisms
Pedersen et al. The 1993 QUASIMEME laboratory-performance study: Trace metals in sediments and standard solutions
CN113990407B (en) Analytic method for analyzing content and source of polychlorinated naphthalene and homologues thereof
CN117172990B (en) Method and system for predicting migration of antibiotic pollution in groundwater environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant