CN115453064B

CN115453064B - Fine particulate matter air pollution cause analysis method and system

Info

Publication number: CN115453064B
Application number: CN202211157306.9A
Authority: CN
Inventors: 汪先锋; 张庆竹; 王国强; 贾曼; 李田帅; 李磊; 牟江山
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2023-09-05
Anticipated expiration: 2042-09-22
Also published as: CN115453064A

Abstract

The invention belongs to the technical field of air pollution cause analysis and relates to a method and a system for analyzing the causes of fine particulate matter air pollution. Data preprocessing is performed on monitoring data obtained at a sampling point, the monitoring data comprising fine particulate matter concentration and characteristic variable data; the preprocessed data are processed with a trained machine learning model to obtain the data relationship between the characteristic variables and the fine particulate matter concentration; the influence of each characteristic variable on the fine particulate matter concentration is preliminarily and qualitatively evaluated; partial dependence analysis is performed on each characteristic variable to determine the interval in which it controls the fine particulate matter concentration; data samples in which the fine particulate matter concentration exceeds a set value are extracted, divided into a plurality of pollution stages, and processed with the machine learning model to quantitatively calculate the specific contribution value of each characteristic variable in each pollution stage. The invention enables pollution cause analysis and helps in configuring corresponding treatment schemes.

Description

Fine particulate matter air pollution cause analysis method and system

Technical Field

The invention belongs to the technical field of air pollution cause analysis, and relates to a method and a system for analyzing air pollution cause of fine particles.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Long-term exposure to polluted air can cause cardiovascular, respiratory, and other diseases. For this reason, countries around the world attach great importance to controlling atmospheric pollution. Fine particulate matter refers to particulate matter in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less, also known as PM2.5, and is an important indicator of environmental pollution. Accurately analyzing and quantifying the contributions of the driving factors that influence PM2.5 formation is necessary and significant for the precise prevention and control of air pollution.

To the best of the inventors' knowledge, conventional chemical transport models, represented by the Goddard Earth Observing System chemical transport model (GEOS-Chem) and the Weather Research and Forecasting and Community Multiscale Air Quality model (WRF-CMAQ), are often used to study air pollution. GEOS-Chem can be used to analyze the sources of PM2.5 and the processes behind the spatial variation of its composition, whereas WRF-CMAQ can calculate the influence of meteorological conditions, anthropogenic emissions, and heterogeneous chemistry on PM2.5. However, conventional chemical transport models exhibit large deviations due to uncertainties in emission inventories and in physical and chemical parameters.

Disclosure of Invention

In order to solve the above problems, the invention provides a method and a system for analyzing the causes of fine particulate matter air pollution. The method takes a machine learning algorithm as its framework, opens up the black-box nature of the machine learning model, and uses several algorithms, such as the permutation importance algorithm, the partial dependence algorithm, and the SHapley Additive exPlanations (SHAP) algorithm, to explain the contributions of the various driving factors behind air pollution. It thereby realizes pollution cause analysis and helps in configuring corresponding treatment schemes.

According to some embodiments, the present invention employs the following technical solutions:

a method for analyzing the cause of air pollution of fine particles comprises the following steps:

performing data preprocessing on the obtained sampling point monitoring data, wherein the monitoring data comprise fine particulate matter concentration and characteristic variable data;

processing the preprocessed data by using a trained machine learning model to obtain a data relationship between the characteristic variable and the concentration of the fine particles;

preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of fine particles;

performing partial dependence analysis on each characteristic variable to determine a control interval of the characteristic variable on the concentration of the fine particles;

and extracting a data sample with the concentration of the fine particles exceeding a set value, dividing the data sample into a plurality of pollution stages, processing the data sample by using the machine learning model, and quantitatively calculating the specific contribution value of each characteristic variable of each pollution stage.

In alternative embodiments, the monitoring data includes gaseous pollutant data, meteorological data, ion data, elemental data, and carbon data.

In an alternative embodiment, the machine learning model is a random forest model. The training process includes randomly dividing the preprocessed data so that one part serves as the training set of the random forest model and the other part as the test set. A parameter tuning method based on plotting learning curves is used to tune n_estimators and max_depth, the two most important parameters of the random forest model, and the number of decision trees and the tree depth corresponding to the best model performance are determined step by step from the learning curves.

As an alternative implementation manner, the method further comprises evaluating the trained machine learning model, the specific process comprising evaluating the accuracy of the random forest model on the test set using the coefficient of determination, the mean absolute error, and the root mean square error, respectively.

As an alternative embodiment, the specific process of preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of fine particulate matter is as follows: according to the permutation importance algorithm, the machine learning model shuffles the data corresponding to each feature, and training and prediction are then carried out on the shuffled data; these steps are repeated a plurality of times. If the feature weight decreases after the dataset is shuffled, the larger the decrease, the more important the feature; if it is essentially unchanged, the feature has essentially no influence on the concentration of fine particles.

As an alternative implementation manner, the specific process of performing partial dependence analysis on each characteristic variable to determine the interval in which the characteristic variable controls the concentration of fine particulate matter comprises varying the value of a designated factor within a set range, averaging the corresponding model-predicted pollutant concentrations, and determining the response, or joint response, of the prediction result to one or more features, so as to evaluate the sensitivity of the result to the characteristic variables.

As an alternative embodiment, the specific process of quantitatively calculating the specific contribution value of each characteristic variable in each pollution stage is to calculate the specific contribution value of each feature to the concentration of fine particulate matter in each data sample using the SHapley Additive exPlanations (SHAP) algorithm.

Further, the feature matrix formed by the other characteristic variables is put into the machine learning model to calculate the specific contribution value of each feature to the concentration of fine particles in each data sample. After this is repeated a plurality of times, all specific contribution values are exported; within each air pollution stage the features are ranked by the mean absolute value of their specific contribution values, and the top N characteristic variables contributing most to the concentration of fine particles are selected. The time series of the specific contribution value of each feature in each data sample of each air pollution stage is plotted, so as to judge the contribution of each feature to the concentration of fine particles at each time node.

N is a positive integer.

A fine particulate matter air pollution cause analysis system comprising:

the preprocessing module is configured to perform data preprocessing on the obtained sampling point monitoring data, wherein the monitoring data comprises fine particulate matter concentration and characteristic variable data;

the model processing module is configured to process the preprocessed data by utilizing a trained machine learning model to obtain a data relationship between the characteristic variable and the concentration of the fine particles;

the preliminary qualitative analysis module is configured to preliminarily and qualitatively evaluate the influence of each characteristic variable on the concentration of the fine particles;

the partial dependence analysis module is configured to perform partial dependence analysis on each characteristic variable and determine a control interval of the characteristic variable on the concentration of the fine particles;

and the quantitative analysis module is configured to extract a data sample with the concentration of the fine particles exceeding a set value, divide the data sample into a plurality of pollution stages, process the data sample by using the machine learning model and quantitatively calculate the specific contribution value of each characteristic variable of each pollution stage.

A terminal device comprising a processor and a computer readable storage medium, the processor configured to implement instructions; the computer readable storage medium is for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps in the method.

Compared with the prior art, the invention has the beneficial effects that:

Based on atmospheric super monitoring station data, the invention uses a machine learning method to mine in depth the various data factors that influence air pollution, establishes the linear or nonlinear relationship between the characteristic variables and PM2.5 concentration, and on this basis makes the model results fully interpretable.

The invention can preliminarily judge the influence of the characteristic factors on air pollution through qualitative analysis, and can calculate the joint response of PM2.5 to two features, so as to identify the interval in which each feature controls PM2.5 concentration, thereby enabling the precise treatment of pollutants.

The invention can also quantitatively calculate the specific contribution of each characteristic factor to pollution, and provides decision-making and management departments with a more detailed, data-driven framework for analyzing the causes of air pollution.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a schematic flow chart of the present invention.

FIG. 2 is a schematic diagram of the quantitative analysis flow chart of the present invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

The method for analyzing the cause of air pollution of the fine particles comprises the following steps as shown in fig. 1:

step 1, carrying out data processing on the acquired sample point (the carrier) on-line monitoring data in autumn and winter;

step 2, performing time sequence analysis on the processed data set;

step 3, dividing the data set into a training set and a testing set, distinguishing characteristics and labels, putting the training set into a random forest model for training and parameter adjustment, and testing whether the trained model meets the requirements by using the testing set;

step 4, evaluating the model precision and determining that the model precision meets the requirement;

step 5, performing permutation importance, partial dependence, and SHapley Additive exPlanations (SHAP) analyses on the results obtained by the model that meets the requirements.

Specifically, in this embodiment, the online monitoring data in step 1 includes: gaseous pollutant data: PM2.5, SO₂, NO₂, CO, O₃; meteorological data: temperature, relative humidity, atmospheric pressure, wind speed, wind direction; carbon data: OC, EC; ion data: Cl⁻, K⁺, Mg²⁺, Ca²⁺, […], F⁻, Na⁺; element data: Al, Si, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn.

The data time resolution was 1 hour.

Of course, in other embodiments, offline data may be adopted or the types of data may be changed according to specific environments and requirements, which will not be described herein.

In some embodiments, in step 2, the gaseous pollutant data and the meteorological data are plotted in charts at half-month intervals, and the ion data, the carbon data, and the element data are presented in a table as monthly average concentrations, so as to allow side-by-side comparison. Analysis of the time series reveals how air quality compares between autumn and winter.

This comparison helps determine a more appropriate PM2.5 concentration threshold for the subsequent analysis. Of course, in some embodiments, step 2 may be omitted.
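The monthly averaging used for the carbon, ion, and element tables can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `monthly_means`, the record layout of (timestamp, value) pairs, and the sample numbers are all hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict
from datetime import datetime


def monthly_means(records):
    """records: iterable of (timestamp, value) pairs with hourly timestamps.

    Returns {(year, month): mean value}, skipping missing values (None).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in records:
        if value is None:
            continue
        key = (ts.year, ts.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical hourly OC readings (illustrative numbers, not measured data).
oc = [
    (datetime(2021, 9, 1, 0), 4.0),
    (datetime(2021, 9, 1, 1), 6.0),
    (datetime(2021, 12, 1, 0), 10.0),
    (datetime(2021, 12, 1, 1), None),
    (datetime(2021, 12, 1, 2), 14.0),
]
oc_by_month = monthly_means(oc)
```

Applying the same function to each species in turn would give one table column per species and one row per month.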

In some embodiments, in step 3, 70% of the data is randomly split off as the training set of the random forest model and the remaining 30% is used as the test set. A parameter tuning method based on plotting learning curves is used to tune n_estimators and max_depth, the two most important parameters of the random forest model. The number of decision trees and the tree depth corresponding to the best model performance are determined step by step from the learning curves.
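The random 70/30 split and the learning-curve style of tuning can be sketched as follows. This is only an outline under stated assumptions: `split_70_30`, `learning_curve_search`, and `toy_score` are hypothetical names, and `toy_score` stands in for a real scorer that would train a random forest with the given parameter value and return its validation score.

```python
import random


def split_70_30(rows, seed=0):
    """Randomly split rows into a 70% training set and a 30% test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def learning_curve_search(score_fn, candidates):
    """Score every candidate value and return (best_value, curve).

    score_fn is a hypothetical callable: it should train a model with the
    given parameter value and return a validation score (higher is better).
    """
    curve = [(value, score_fn(value)) for value in candidates]
    best_value = max(curve, key=lambda pair: pair[1])[0]
    return best_value, curve


train, test = split_70_30(range(100), seed=42)


# Stand-in scorer for illustration only: it peaks at 600 trees.
def toy_score(n_estimators):
    return 1.0 - abs(n_estimators - 600) / 1000.0


best_n, curve = learning_curve_search(toy_score, range(100, 1001, 100))
```

In this sketch the parameters would be tuned one at a time: first n_estimators with max_depth held fixed, then max_depth with the chosen n_estimators, each time reading the best value off the plotted curve.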

In some embodiments, in step 3, after model tuning is completed, all variables contained in the gaseous pollutant data, meteorological data, carbon data, ion data, and element data within one hour are taken as features and the PM2.5 concentration is taken as the label, so that the random forest model analyzes the data relationship between all characteristic variables and PM2.5 in the current hour.

And testing whether the trained model meets the requirements or not by using the test set.

If so, go to step 4.

In some embodiments, in step 4, the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE) are used to evaluate the accuracy of the random forest model on the test set. The calculation formulas are respectively as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where $N$ represents the total number of data samples, $i$ represents the i-th data sample, $y_i$ is the observed PM2.5 concentration of the i-th data sample, $\hat{y}_i$ represents the predicted PM2.5 concentration of the i-th data sample, and $\bar{y}$ represents the average of the observed PM2.5 concentrations.
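The three evaluation metrics named above can be computed as in the following minimal sketch. The function name `regression_metrics` and the sample concentrations are illustrative assumptions, not values from the embodiment.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(y - p) for y, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Illustrative PM2.5 concentrations (not measured data).
observed = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 125.0, 145.0]
r2, mae, rmse = regression_metrics(observed, predicted)
```

A value of R² close to 1 together with small MAE and RMSE would indicate that the test-set predictions track the observations closely.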

If the accuracy of the random forest model on the test set meets the requirement, proceed to step 5.

In some embodiments, in step 5, permutation importance is an algorithm for evaluating the degree to which each characteristic variable influences the model prediction result. The calculation formula is as follows:

$$i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}$$

where $\tilde{D}_{k,j}$ represents the shuffled dataset constructed by rearranging feature $j$ in the k-th repetition, $i_j$ is the weight of feature $j$, $j$ indexes the features, $k$ indexes the repetitions ($K$ in total), $s$ is the performance score of the random forest model on the test dataset $D$, and $s_{k,j}$ is the performance score of the model on dataset $\tilde{D}_{k,j}$.
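The permutation importance calculation can be illustrated with the sketch below, which computes the baseline score minus the mean score over shuffled copies of one feature column. Several things here are assumptions for illustration: R² is used as the performance score, the model is only re-evaluated on shuffled data rather than retrained, and `toy_model` is a hypothetical stand-in for the trained random forest.

```python
import random


def r2_score(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return, for every feature j, the baseline score s minus the mean of
    the scores obtained after shuffling column j (n_repeats shuffles)."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        shuffled_scores = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j shuffled.
            X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            shuffled_scores.append(r2_score(y, [predict(r) for r in X_shuffled]))
        importances.append(baseline - sum(shuffled_scores) / n_repeats)
    return importances


# Toy stand-in for a trained model: it depends on feature 0 only.
def toy_model(row):
    return 3.0 * row[0]


X = [[float(i), float((7 * i) % 10)] for i in range(10)]
y = [toy_model(row) for row in X]
imp = permutation_importance(toy_model, X, y)
```

In this toy setting, shuffling the unused second feature leaves the score unchanged, while shuffling the first feature lowers it, so only the first feature receives a positive weight.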

In some embodiments, in step 5, the partial dependence algorithm (partial dependence plot, PDP) implements variable sensitivity analysis. The value of a designated factor is varied within a set range, and the corresponding model-predicted pollutant concentrations are averaged. The partial dependence algorithm yields the response, or joint response, of the prediction result to one or two features, so as to evaluate the sensitivity of the result to the characteristic variables. The algorithm formula is as follows:

$$\hat{f}_{S}(X_S) = E_{X_C}\left[ \hat{f}(X_S, X_C) \right] = \int \hat{f}(X_S, X_C)\, dP(X_C)$$

where $X_S$ represents the set of one or two features under study, $X_C$ is the set of the other features, and $\hat{f}$ represents the random forest model.
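A one-feature partial dependence curve can be sketched as follows: the studied feature is fixed at each grid value in every row, and the model predictions are averaged over the dataset. The function name `partial_dependence`, the grid, and `toy_model` (a hypothetical stand-in for the trained random forest) are assumptions made for illustration.

```python
def partial_dependence(predict, X, feature, grid):
    """For each grid value v, set column `feature` to v in every row of X
    and average the model predictions over the dataset."""
    curve = []
    for v in grid:
        preds = [predict(row[:feature] + [v] + row[feature + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve


# Toy stand-in for the trained model (hypothetical coefficients).
def toy_model(row):
    return 2.0 * row[0] + 0.5 * row[1]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
pd_curve = partial_dependence(toy_model, X, feature=0, grid=[0.0, 5.0, 10.0])
```

The two-feature case follows the same pattern, fixing two columns at each point of a two-dimensional grid; the range of grid values over which the averaged prediction changes most could then be read as the control interval.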

In some embodiments, in step 5, as shown in FIG. 2, the SHapley Additive exPlanations (SHAP) algorithm fairly distributes the payoff of a cooperation according to the contribution made by each participant (i.e., the effect of each characteristic variable on PM2.5 concentration), the share of each feature being the average of its marginal effects on the result. The calculation formula is as follows:

$$f(x_i) = \phi_0(f, x) + \sum_{j=1}^{N} \phi_j(f, x_i)$$

where $x_i$ represents each sample with $N$ features, $f(x_i)$ represents the predicted value for each sample with $N$ features (i.e., the predicted PM2.5 value), $\phi_0(f, x)$ represents the expected value of the random forest model output on the dataset (the base value), and $\phi_j(f, x_i)$ is the Shapley value of the effect of feature $j$ on the prediction for sample $x_i$.

$\phi_j(f, x_i)$, the Shapley value of each feature in each sample, is a weighted average over all possible subsets of the variables. The specific algorithm is as follows:

$$\phi_j(f, x) = \sum_{S \subseteq \{x_1, \ldots, x_n\} \setminus \{x_j\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ f_x(S \cup \{x_j\}) - f_x(S) \right]$$

where $\phi_j(f, x)$ represents the Shapley value of feature $j$, $S$ is a subset of the features, $x_1, x_2, \ldots, x_n$ represent the individual features, $|S|$ is the number of non-zero entries in subset $S$, and $f_x(S)$ represents the predicted value for subset $S$.
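The Shapley weighting can be made concrete with the exact, brute-force sketch below. It is practical only for a handful of features and is not how tree-based SHAP implementations compute the values. As an assumption of this sketch, the prediction for a subset S is estimated by fixing the features in S to the sample's values and averaging the model over a small background dataset for the remaining features; `toy_model` and the data are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x; returns (base_value, phi)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's values; the others are
        # averaged over the background rows (assumption of this sketch).
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {j}) - value(s))
        phi.append(contrib)
    base = value(set())  # expected model output on the background data
    return base, phi


def toy_model(row):
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0], [4.0, 2.0, 1.0]]
x = [3.0, 2.0, 5.0]
base, phi = shapley_values(toy_model, x, background)
```

The base value plus the sum of the Shapley values reproduces the model prediction for the sample, which is the additive decomposition described above.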

It should be noted that the above values may be determined according to specific prediction requirements, and in different embodiments, may be adjusted according to requirements, and are not limited to the above exemplary numerical ranges.

Similarly, the monitoring data may be expanded or reduced in different embodiments and is not limited to the scope given in the above embodiments, provided that it includes the concentration of fine particulate matter and the characteristic variable data to be studied.

As an exemplary embodiment:

Step 1, acquiring online measurement data of the Zibo super monitoring station from September to December 2021, all at a time resolution of 1 h, including gaseous pollutant data: PM2.5, SO₂, NO₂, CO, O₃; meteorological data: temperature, relative humidity, atmospheric pressure, wind speed, wind direction; carbon data: OC, EC; ion data: Cl⁻, K⁺, Mg²⁺, Ca²⁺, […], F⁻, Na⁺; element data: Al, Si, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn.

Step 2, preprocessing the data. The specific steps are as follows: abrupt outliers are deleted directly; missing wind direction data are filled with the most frequently occurring value, and all other missing data are filled with the corresponding mean value.
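The two filling rules can be sketched as follows. The function name `fill_missing` and the sample values are illustrative assumptions; outlier deletion is left out because the criterion for an abrupt outlier is not specified here.

```python
from collections import Counter


def fill_missing(values, strategy):
    """Fill None entries with the mean ("mean") or the most frequent value
    ("mode") of the observed entries."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    elif strategy == "mode":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError("strategy must be 'mean' or 'mode'")
    return [fill if v is None else v for v in values]


# Illustrative values only, not measured data.
temperature = fill_missing([10.0, None, 14.0], "mean")
wind_direction = fill_missing([90, 180, None, 180], "mode")
```

Wind direction is an angular quantity, so an arithmetic mean could produce a direction that never occurred; filling with the most frequent value avoids that.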

Step 3, plotting time series charts. The time series of the gaseous pollutant data and meteorological data are plotted as curves; to better compare the trends of different species, CO and SO₂ form one group, O₃ and NO₂ form one group, temperature and relative humidity form one group, and PM2.5 and wind direction are each plotted individually. Because the carbon, ion, and element data comprise many species, their monthly averages are presented in a table. Inspection of the time series charts and the monthly averages shows that every species indicator peaks in December, in winter, which is the period of severe air pollution.

Step 4, classifying the PM2.5 concentration according to the air quality index to distinguish the clean and polluted stages. Specifically, PM2.5 < 75 μg/m³ is considered clean, 75 ≤ PM2.5 ≤ 250 μg/m³ is considered polluted, and PM2.5 > 250 μg/m³ is considered severely polluted.
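The three-level classification can be written as a small function. The name `classify_pm25` and the level labels are illustrative choices; the thresholds are those given in step 4.

```python
def classify_pm25(concentration):
    """Map an hourly PM2.5 concentration (in ug/m^3) to a pollution level."""
    if concentration < 75:
        return "clean"
    if concentration <= 250:
        return "polluted"
    return "severely polluted"


# Boundary values are included to show which side each threshold falls on.
levels = [classify_pm25(c) for c in (40, 75, 250, 251)]
```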

Step 5, preliminarily analyzing the average concentration of each species. The secondary inorganic aerosols ([…] and […]) account for 58% of the total PM2.5 mass concentration, the highest proportion. The data for each species in the clean, polluted, and severely polluted stages are compared according to the air quality index classification.

Step 6, training, based on machine learning, a model of the response relationship between the PM2.5 concentration and the characteristic variables. The specific steps are as follows:

6.1 The processed dataset is randomly divided into a training set and a test set at a ratio of 7:3; the training set is used to train the random forest model, and the test set is used to check model accuracy. The parameters corresponding to the best model performance are determined through learning curves: the number of decision trees is 601 and the maximum tree depth is 20. During tuning, the parameters are adjusted continuously according to the change in the model's coefficient of determination to obtain the final optimal model, which is then saved.

6.2 The accuracy of the random forest model is evaluated by the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE). The model is found to perform well, with R² = 0.93, MAE = 5.42, and RMSE = 9.16.

Step 7, using the permutation importance algorithm to preliminarily and qualitatively evaluate the influence of each characteristic variable on PM2.5 concentration. The specific steps are as follows:

7.1 Following the permutation importance formula, the random forest model shuffles the data corresponding to each feature, and training and prediction are then carried out on the shuffled data.

7.2 The above step is repeated k times. If the feature weight decreases after the dataset is shuffled, the larger the decrease, the more important the feature; if it is essentially unchanged, the feature has essentially no effect on PM2.5. Among the features, […] has the largest weight, 0.64 before the decrease and 0.28 after it, indicating that […] has the greatest effect on PM2.5.

Step 8, performing partial dependence analysis on the secondary inorganic aerosols […] and […]. The secondary inorganic aerosols are combined in sequence into three pairs, and the response of PM2.5 to each pair is discussed in turn, with the concentration of […] as the reference, to determine the intervals in which the three ions control PM2.5 concentration.

Step 9, based on the random forest model, using the SHapley Additive exPlanations (SHAP) formula to calculate the specific contribution value of each feature to PM2.5 in each data sample. The specific steps are as follows:

9.1 Data samples with PM2.5 > 75 μg/m³ are extracted and divided into 10 pollution stages according to the time interval between data samples, which is at most 7 days within one stage. For example, if air pollution (PM2.5 > 75 μg/m³) occurs at 20:00 on October 1, at 12:00 on October 4, and at 14:00 on October 12, the first two data samples are assigned to the same air pollution stage and the last data sample to another air pollution stage, and so on.
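The stage-splitting rule in 9.1 can be sketched as follows: a new stage starts whenever the gap to the previous polluted sample exceeds 7 days. The function name `split_into_stages` is hypothetical, and the year 2021 is assumed for the example timestamps.

```python
from datetime import datetime, timedelta


def split_into_stages(timestamps, max_gap=timedelta(days=7)):
    """Group pollution timestamps into stages; a new stage starts whenever
    the gap to the previous sample exceeds max_gap."""
    stages = []
    for ts in sorted(timestamps):
        if stages and ts - stages[-1][-1] <= max_gap:
            stages[-1].append(ts)
        else:
            stages.append([ts])
    return stages


# The three example hours from 9.1 (year assumed to be 2021).
polluted_hours = [
    datetime(2021, 10, 1, 20),
    datetime(2021, 10, 4, 12),
    datetime(2021, 10, 12, 14),
]
stages = split_into_stages(polluted_hours)
```

The first two timestamps are less than 7 days apart and fall into one stage; the third follows the second by more than 8 days and opens a new stage.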

9.2 The trained random forest model is loaded and the air pollution data samples are imported. PM2.5 is set as the label, the feature matrix formed by the other characteristic variables is put into the random forest model, and the specific contribution value of each feature to PM2.5 in each data sample is calculated.

9.3 Following the above steps, the specific Shapley contribution value of each feature to PM2.5 in each data sample is calculated for the 8 air pollution stages.

9.4 All Shapley values are exported; within each air pollution stage the features are ranked by mean absolute Shapley value, and the 5 characteristic variables contributing most to PM2.5 are selected. The time series of each feature's Shapley value in every data sample of each air pollution stage is plotted to judge the contribution of each feature to PM2.5 at each time node, providing decision-making and management departments with hour-by-hour information so that air pollution can be treated more precisely.
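The ranking in 9.4 can be sketched as follows, assuming the Shapley values of one pollution stage are available as one list per hourly sample. The function name `top_features_by_mean_abs`, the feature names, and the numbers are illustrative assumptions, not results from the embodiment.

```python
def top_features_by_mean_abs(shap_rows, names, top_n=5):
    """shap_rows: one list of Shapley values per data sample, in the same
    feature order as names. Returns the top_n (name, mean |value|) pairs."""
    n_samples = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(len(names))
    ]
    ranked = sorted(zip(names, mean_abs), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical Shapley values for three hourly samples of one stage.
names = ["CO", "relative humidity", "wind speed"]
shap_rows = [[4.0, -1.0, 0.5], [-6.0, 2.0, 0.5], [5.0, -3.0, -0.5]]
top2 = top_features_by_mean_abs(shap_rows, names, top_n=2)
```

Taking the absolute value before averaging keeps features whose contributions alternate in sign from cancelling out in the ranking; the signed hourly values are what the time series plots would show.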

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims

1. The method for analyzing the cause of air pollution of the fine particles is characterized by comprising the following steps:

performing data preprocessing on the obtained sampling point monitoring data, wherein the monitoring data comprise fine particulate matter concentration and characteristic variable data; the monitoring data includes gaseous pollutant data, meteorological data, ion data, elemental data, and carbon data; the gaseous pollutant data and the meteorological data are plotted in charts at half-month intervals, and the ion data, the carbon data, and the element data are presented in a table as monthly average concentrations so as to allow side-by-side comparison; analysis of the time series reveals how air quality compares between autumn and winter, and the comparison allows the PM2.5 concentration threshold to be studied in the subsequent analysis to be determined;

processing the preprocessed data by using a trained machine learning model to obtain a data relationship between the characteristic variable and the concentration of the fine particles; evaluating the model precision and determining that the model precision meets the requirement; performing preliminary qualitative assessment on results obtained by the model meeting the requirements;

extracting a data sample with the concentration of fine particles exceeding a set value, dividing the data sample into a plurality of pollution stages, processing the data sample by using the machine learning model, and quantitatively calculating a specific contribution value of each characteristic variable of each pollution stage;

in the machine learning model training process, the training set is put into a random forest model for training and parameter tuning, the test set is used to test whether the trained model meets the requirements, and the model accuracy is evaluated and determined to meet the requirements; specifically:

the machine learning model is a random forest model, and the training process comprises randomly dividing the preprocessed data so that one part serves as the training set of the random forest model and the other part as the test set of the model, using a parameter tuning method based on plotting learning curves to tune n_estimators and max_depth, the two most important parameters of the random forest model, and determining step by step from the learning curves the number of decision trees and the tree depth corresponding to the best model performance;

the test set is used to check whether the trained model meets the requirements, and once it does, the accuracy of the random forest model on the test set is evaluated using the coefficient of determination ($R^2$), the mean absolute error ($MAE$), and the root mean square error ($RMSE$); the calculation formulas are, respectively:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where $N$ represents the total number of data samples, $i$ denotes the $i$-th data sample, $y_i$ is the observed PM2.5 concentration of the $i$-th data sample, $\hat{y}_i$ represents the predicted PM2.5 concentration of the $i$-th data sample, and $\bar{y}$ represents the mean of the observed PM2.5 concentrations;
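The three accuracy metrics named above can be computed directly from their standard definitions, as in the sketch below. The function name `regression_metrics` and the sample values are illustrative assumptions.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(y - p) for y, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Made-up PM2.5 concentrations, for illustration only.
observed = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 110.0, 160.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(r2, mae, rmse)
```

A coefficient of determination close to 1 together with small MAE and RMSE indicates that the model reproduces the test-set observations well.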

the specific process of preliminarily and qualitatively evaluating the influence of each characteristic variable on the fine particulate matter concentration comprises: according to the permutation importance algorithm, the data corresponding to each feature are shuffled in turn, and the machine learning model then makes predictions on the shuffled data; this is repeated several times; if the model's performance score drops after a feature's data are shuffled, then the larger the drop, the more important the feature, and if the score is essentially unchanged, the feature has essentially no influence on the fine particulate matter concentration;

the permutation importance is calculated as follows:

$$i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}$$

where $\tilde{D}_{k,j}$ represents the shuffled dataset constructed in the $k$-th repetition by permuting feature $j$, $i_j$ is the weight of feature $j$, $j$ indexes the features, $K$ is the number of repetitions and $k$ indexes them, $s$ is the performance score of the random forest model on the test dataset $D$, and $s_{k,j}$ represents the performance score of the model on the dataset $\tilde{D}_{k,j}$;
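The permutation importance calculation can be sketched in plain Python as follows. A fixed toy function stands in for the trained random forest, and the coefficient of determination is used as the performance score; both choices, along with the helper names, are assumptions made for illustration.

```python
import random


def r2_score(observed, predicted):
    """Coefficient of determination, used here as the performance score s."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return one importance per feature: baseline score minus the mean
    score over `n_repeats` shuffles of that feature's column."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        shuffled_scores = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
            shuffled_scores.append(r2_score(y, [predict(row) for row in X_perm]))
        importances.append(baseline - sum(shuffled_scores) / n_repeats)
    return importances


def toy_predict(row):
    # Stand-in for the trained model: depends on feature 0, ignores feature 1.
    return 3.0 * row[0]


data_rng = random.Random(1)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(50)]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
print(importances)
```

Shuffling the ignored feature leaves the score unchanged, so its importance is zero, while shuffling the feature the model relies on lowers the score and yields a positive importance.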

the specific process of determining the control interval of a characteristic variable for the fine particulate matter concentration comprises: varying the value of a designated factor within a set range, averaging the corresponding change in the pollutant concentration predicted by the model, and determining the response, or joint response, of the prediction to one or more features, so as to evaluate the sensitivity of the result to the characteristic variable;

the partial dependence analysis on each characteristic variable is carried out according to the following formula:

$$\hat{f}_S(x_S) = E_{X_C}\left[ \hat{f}(x_S, X_C) \right] = \int \hat{f}(x_S, X_C) \, dP(X_C)$$

where $x_S$ represents the set of one or two features to be studied, $X_C$ is the set of the other features, and $\hat{f}$ represents the random forest model;
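In practice the partial dependence is usually estimated by fixing the studied feature at each grid value for every sample and averaging the model predictions. The sketch below shows this for a single feature; the toy prediction function stands in for the trained random forest, and the function name, grid, and data are illustrative assumptions.

```python
def partial_dependence(predict, X, feature, grid):
    """For each grid value, set `feature` to that value in every row of X
    and return the average prediction."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value  # override only the studied feature
            total += predict(modified)
        curve.append(total / len(X))
    return curve


def toy_predict(row):
    # Stand-in for the trained model (assumption).
    return 2.0 * row[0] + row[1]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
grid = [0.0, 1.0, 2.0]
curve = partial_dependence(toy_predict, X, feature=0, grid=grid)
print(curve)
```

The range of grid values over which the averaged prediction stays below a target concentration is one way to read a control interval off the curve.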

the specific process of quantitatively calculating the specific contribution value of each characteristic variable in each pollution stage is to calculate, using the SHapley Additive exPlanations (SHAP) algorithm, the specific contribution value of each feature to the fine particulate matter concentration in each data sample;

the calculation formula is as follows:

$$f(x_i) = \phi_0 + \sum_{j=1}^{N} \phi_j(x_i)$$

where $x_i$ represents a sample having $N$ features, $f(x_i)$ represents the predicted value for that sample, $\phi_0$ represents the expected value of the random forest model output over the dataset, and $\phi_j(x_i)$ is the Shapley value of the influence of feature $j$ on the prediction for sample $x_i$;

the Shapley value of each feature in each sample is a weighted average over all possible combinations of feature subsets; the specific algorithm is as follows:

$$\phi_j = \sum_{S \subseteq \{x_1, \ldots, x_N\} \setminus \{x_j\}} \frac{|S|! \, (N - |S| - 1)!}{N!} \left[ f(S \cup \{x_j\}) - f(S) \right]$$

where $\phi_j$ represents the Shapley value of feature $j$, $S$ is a subset of the features, $x_j$ represents an individual feature, $|S|$ is the number of non-zero entries in subset $S$, and $f(S)$ represents the predicted value for subset $S$;
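The Shapley value calculation can be illustrated by brute-force enumeration of feature subsets, which is feasible only for a handful of features. The sketch below assumes that the value of a subset is the mean prediction obtained when the features in the subset take the sample's values and the remaining features are taken from a background dataset; this is one common choice and is not necessarily the one used by the SHAP tree algorithm. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(predict, x, background, subset):
    """Value of a feature subset: mean prediction when features in `subset`
    take the values of x and the rest come from each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def shapley_values(predict, x, background):
    """Return (phi_0, [phi_1..phi_N]) by exact enumeration of all subsets."""
    n = len(x)
    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        value = 0.0
        for size in range(n):
            # Shapley weight |S|! (N - |S| - 1)! / N!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_j = coalition_value(predict, x, background, set(subset) | {j})
                without_j = coalition_value(predict, x, background, set(subset))
                value += weight * (with_j - without_j)
        phi.append(value)
    phi_0 = coalition_value(predict, x, background, set())
    return phi_0, phi


def toy_predict(row):
    # Stand-in for the trained model (assumption).
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 1.0, 1.0], [2.0, 3.0, 1.0], [4.0, 2.0, 4.0]]
x = [3.0, 2.0, 5.0]
phi_0, phi = shapley_values(toy_predict, x, background)
print(phi_0, phi)
```

The base value plus the sum of the per-feature Shapley values reproduces the model prediction for the sample, which is the additivity property that lets each value be read as a contribution to the predicted concentration.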

the feature matrix formed by the other characteristic variables is fed into the machine learning model to calculate the specific contribution value of each feature to the fine particulate matter concentration in each data sample; after this is repeated several times, all the specific contribution values are exported; within each air pollution stage, the features are ranked by the mean absolute value of their specific contribution values, and the top N characteristic variables contributing most to the fine particulate matter concentration are selected; a time series of the specific contribution value of each feature in each data sample of each air pollution stage is then plotted, so as to judge the contribution of each feature to the fine particulate matter concentration at each time node.
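The per-stage ranking step can be sketched as follows, assuming the specific contribution values have already been computed and each sample carries a pollution stage label. The feature names, stage labels, and contribution values are made up for illustration, and `top_features_by_stage` is a hypothetical helper.

```python
def top_features_by_stage(contributions, stage_labels, feature_names, top_n):
    """Rank features within each stage by mean absolute contribution and
    return the names of the top_n features per stage."""
    by_stage = {}
    for row, stage in zip(contributions, stage_labels):
        by_stage.setdefault(stage, []).append(row)
    ranking = {}
    for stage, rows in by_stage.items():
        mean_abs = [
            sum(abs(row[j]) for row in rows) / len(rows)
            for j in range(len(feature_names))
        ]
        order = sorted(range(len(feature_names)), key=lambda j: mean_abs[j], reverse=True)
        ranking[stage] = [feature_names[j] for j in order[:top_n]]
    return ranking


# Illustrative inputs only: one row of contribution values per data sample.
feature_names = ["NO2", "RH", "sulfate"]
contributions = [
    [5.0, -1.0, 2.0],
    [7.0, -2.0, 3.0],
    [-1.0, 8.0, 3.0],
    [0.5, 6.0, -4.0],
]
stage_labels = [1, 1, 2, 2]
ranking = top_features_by_stage(contributions, stage_labels, feature_names, top_n=2)
print(ranking)
```

Taking the absolute value before averaging keeps features whose contributions alternate in sign from cancelling out in the ranking.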

2. A fine particulate air pollution cause analysis system employing a fine particulate air pollution cause analysis method as defined in claim 1, comprising:

the model processing module is configured to process the preprocessed data by utilizing a trained machine learning model to obtain a data relationship between the characteristic variable and the concentration of the fine particles; evaluating the model precision and determining that the model precision meets the requirement; performing preliminary qualitative assessment on results obtained by the model meeting the requirements;

3. A terminal device, comprising a processor and a computer-readable storage medium, wherein the processor is configured to execute instructions, and the computer-readable storage medium is configured to store a plurality of instructions adapted to be loaded by the processor to perform the steps of the method as claimed in claim 1.