CN115453064A

CN115453064A - Fine particle air pollution cause analysis method and system

Info

Publication number: CN115453064A
Application number: CN202211157306.9A
Authority: CN
Inventors: 汪先锋; 张庆竹; 王国强; 贾曼; 李田帅; 李磊; 牟江山
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2022-12-09
Anticipated expiration: 2042-09-22
Also published as: CN115453064B

Abstract

The invention belongs to the technical field of air pollution cause analysis and relates to a method and system for analyzing the causes of fine particulate matter air pollution. Monitoring data acquired at a sampling point, comprising fine particulate matter concentration and characteristic variable data, are preprocessed. The preprocessed data are processed with a trained machine learning model to obtain the data relationship between the characteristic variables and the fine particulate matter concentration. The influence of each characteristic variable on the fine particulate matter concentration is evaluated preliminarily and qualitatively. Partial dependence analysis is performed on each characteristic variable to determine its control interval with respect to the fine particulate matter concentration. Data samples in which the fine particulate matter concentration exceeds a set value are extracted, divided into several pollution stages and processed with the machine learning model, and the specific contribution value of each characteristic variable in each pollution stage is calculated quantitatively. The invention enables analysis of pollution causes and facilitates the configuration of a corresponding treatment scheme.

Description

Fine particle air pollution cause analysis method and system

Technical Field

The invention belongs to the technical field of air pollution cause analysis, and relates to a method and a system for analyzing air pollution causes of fine particles.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Long-term exposure to a polluted air environment can cause diseases of the cardiovascular system, the respiratory system and so on. Air pollution control is therefore a priority in all countries. Fine particulate matter, also known as PM2.5, refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less and is an important indicator of environmental pollution. Accurately analyzing and quantifying the contributions of the driving factors that influence PM2.5 formation is necessary and meaningful for the precise prevention and control of air pollution.

To the knowledge of the inventors, traditional chemical transport models, represented by the Goddard Earth Observing System chemical transport model (GEOS-Chem) and the Weather Research and Forecasting model coupled with the Community Multiscale Air Quality model (WRF-CMAQ), are often used to study air pollution. GEOS-Chem can be used to analyze the sources and processes behind spatial variations in PM2.5 composition, and WRF-CMAQ can calculate the influence of meteorological conditions, anthropogenic emissions and heterogeneous chemistry on PM2.5. However, traditional chemical transport models are subject to large deviations because of uncertainties in emission inventories and in physical and chemical parameters.

Disclosure of Invention

The invention aims to solve the problems and provides a method and a system for analyzing the cause of the air pollution of fine particles.

According to some embodiments, the invention adopts the following technical scheme:

a fine particle air pollution cause analysis method comprises the following steps:

carrying out data preprocessing on the acquired sampling point monitoring data, wherein the monitoring data comprises fine particle concentration and characteristic variable data;

processing the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particles;

preliminarily and qualitatively evaluating the influence of each characteristic variable on the concentration of the fine particulate matters;

performing partial dependence analysis on each characteristic variable to determine a control interval of the characteristic variable on the concentration of the fine particles;

and extracting a data sample with the concentration of fine particulate matters exceeding a set value, dividing the data sample into a plurality of pollution stages, processing the data sample by using the machine learning model, and quantitatively calculating the specific contribution value of each characteristic variable of each pollution stage.

As an alternative embodiment, the monitoring data includes gaseous pollutant data, meteorological data, ion data, elemental data, and carbon data.

As an alternative implementation, the machine learning model is a random forest model. The training process includes randomly assigning one part of the preprocessed data as the training set of the random forest model and using the other part as the test set; tuning n_estimators and max_depth, the most important parameters of the random forest model, by plotting learning curves; and determining step by step, from the learning curves, the number of decision trees and the tree depth at which model performance is best.

As an alternative embodiment, the method further comprises evaluating the trained machine learning model. The specific process comprises evaluating the accuracy of the random forest model's results on the test set using the coefficient of determination, the mean absolute error and the root mean square error, respectively.

As an alternative embodiment, the specific process of preliminarily and qualitatively evaluating the influence of each characteristic variable on the fine particulate matter concentration is as follows: according to the permutation importance algorithm, the data corresponding to each feature are shuffled, and the machine learning model then performs prediction on the shuffled data; these steps are repeated several times. If the feature weight decreases after the data set is shuffled, the larger the decrease, the more important the feature; if the weight is essentially unchanged, the feature has no influence on the fine particulate matter concentration.

As an alternative embodiment, the specific process of performing partial dependence analysis on each characteristic variable and determining its control interval with respect to the fine particulate matter concentration comprises: varying the values of the designated factors within set ranges, averaging the corresponding changes in the pollutant concentration predicted by the model, and determining the response, or synergistic response, of the prediction result to one or more features, so as to evaluate the sensitivity of the result to the characteristic variables.

As an alternative embodiment, the specific process of quantitatively calculating the specific contribution value of each characteristic variable in each pollution stage is to calculate the specific contribution value of each feature to the fine particulate matter concentration in each data sample using the Shapley additive explanations (SHAP) algorithm.

Further, a feature matrix composed of the other characteristic variables is fed into the machine learning model to calculate the specific contribution value of each feature to the fine particulate matter concentration in each data sample. After this operation has been repeated several times, all specific contribution values are exported; for each air pollution stage, the features are ranked by the mean absolute value of their specific contribution values and the top N characteristic variables contributing most to the fine particulate matter concentration are selected; and a time series of the specific contribution value of each feature in each data sample of each air pollution stage is plotted, so as to determine the contribution of each feature to the fine particulate matter concentration at each time node.

N is a positive integer.

A fine particulate air pollution cause analysis system comprising:

the preprocessing module is configured to perform data preprocessing on the acquired sampling point monitoring data, and the monitoring data comprise fine particle concentration and characteristic variable data;

the model processing module is configured to process the preprocessed data by using the trained machine learning model to obtain a data relation between the characteristic variable and the concentration of the fine particulate matters;

a preliminary qualitative analysis module configured to preliminarily qualitatively evaluate an influence of each of the characteristic variables on the concentration of the fine particulate matter;

the partial dependence analysis module is configured to perform partial dependence analysis on each characteristic variable and determine a control interval of the characteristic variable on the concentration of the fine particulate matters;

and the quantitative analysis module is configured to extract a data sample with the concentration of the fine particulate matters exceeding a set value, divide the data sample into a plurality of pollution stages, process the data sample by using the machine learning model, and quantitatively calculate the specific contribution value of each characteristic variable of each pollution stage.

A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method.

Compared with the prior art, the invention has the following beneficial effects:

Based on data from an atmospheric super monitoring station, the invention uses machine learning to mine in depth the various data factors that influence air pollution, constructs the linear or nonlinear relationship between the characteristic variables and the PM2.5 concentration, and on that basis interprets the model results thoroughly for analysis.

Through qualitative analysis, the invention can preliminarily judge the influence of the characteristic factors on air pollution, and it can calculate the synergistic response of PM2.5 to two features, so as to identify the control interval of each feature with respect to the PM2.5 concentration and allow pollutants to be treated precisely.

The method can also quantitatively calculate the specific contribution of each characteristic factor to pollution, providing decision-making and management departments with a more detailed, data-driven approach to air pollution cause analysis.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.

FIG. 1 is a schematic flow chart of the present invention.

FIG. 2 is a schematic view of the quantitative analysis process of the present invention.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

A method for analyzing cause of air pollution of fine particles, as shown in fig. 1, comprising the steps of:

step 1, carrying out data processing on the obtained sample point (Zibo) online monitoring data in autumn and winter;

step 2, performing time series analysis on the processed data set;

step 3, dividing the data set into a training set and a testing set, distinguishing features and labels, putting the training set into a random forest model for training and adjusting parameters, and testing whether the trained model meets requirements or not by using the testing set;

step 4, evaluating the model precision and determining that the model precision meets the requirements;

and step 5, performing permutation importance, partial dependence and Shapley additive explanations (SHAP) analysis on the results obtained from the model that meets the requirements.

Specifically, in this embodiment, the online monitoring data in step 1 include: gaseous pollutant data: PM2.5, SO₂, NO₂, CO, O₃; meteorological data: temperature, relative humidity, atmospheric pressure, wind speed, wind direction; carbon data: OC and EC; ion data: SO₄²⁻, NO₃⁻, Cl⁻, K⁺, Mg²⁺, Ca²⁺, NH₄⁺, F⁻, Na⁺; element data: Al, Si, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn.

The data time resolution was 1 hour.

Of course, in other embodiments, the offline data may be used or the data type may be changed according to specific environments and requirements, which are not described herein again.

In some embodiments, in step 2, the gaseous pollutant data and meteorological data are plotted at half-month intervals, and the ion data, carbon data and element data are tabulated as monthly average concentrations for side-by-side comparison. The time series analysis shows how air quality compares between autumn and winter.

This side-by-side comparison makes it easier to choose an appropriate PM2.5 concentration threshold for the subsequent analysis. Of course, in some embodiments, step 2 may be omitted.

In some embodiments, in step 3, 70% of the data are randomly assigned as the training set of the random forest model and the remaining 30% are used as the test set. The n_estimators and max_depth parameters, the most important parameters of the random forest model, are tuned by plotting learning curves. The number of decision trees and the tree depth at which model performance is best are determined step by step from the learning curves.
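The split and learning-curve tuning can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes scikit-learn is the library in use (the parameter names n_estimators and max_depth match its RandomForestRegressor), the data are synthetic, the candidate grids are arbitrary, and scoring each candidate by 3-fold cross-validated R² on the training set is an assumption, since the text does not say how the learning curve is scored.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the hourly feature matrix and PM2.5 labels (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=300)

# 70% training set, 30% test set, assigned at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def learning_curve(param_name, candidates, fixed_params):
    """Return {candidate: mean cross-validated R^2} for one hyperparameter."""
    curve = {}
    for value in candidates:
        params = {**fixed_params, param_name: value}
        model = RandomForestRegressor(random_state=0, **params)
        curve[value] = cross_val_score(model, X_train, y_train, cv=3).mean()
    return curve


# Tune the number of trees first, then the depth with the chosen tree count fixed.
n_curve = learning_curve("n_estimators", [10, 50, 100], {})
best_n = max(n_curve, key=n_curve.get)
depth_curve = learning_curve("max_depth", [2, 5, 10], {"n_estimators": best_n})
best_depth = max(depth_curve, key=depth_curve.get)

# Refit with the selected parameters and check the held-out test set.
final_model = RandomForestRegressor(
    n_estimators=best_n, max_depth=best_depth, random_state=0
)
final_model.fit(X_train, y_train)
test_r2 = final_model.score(X_test, y_test)
```

Plotting n_curve and depth_curve against their candidate values gives the learning curves from which the best tree count and depth are read off.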

In some embodiments, in step 3, after model tuning is complete, all variables contained in the gaseous pollutant data, meteorological data, carbon data, ion data and element data within a given hour are taken as features and the PM2.5 concentration is taken as the label, so that the random forest model is used to analyze the data relationship between all characteristic variables and PM2.5 in the current hour.

And testing whether the trained model meets the requirements or not by using the test set.

If so, go to step 4.

In some embodiments, in step 4, the coefficient of determination (R²), the mean absolute error (MAE) and the root mean square error (RMSE) are used to evaluate the accuracy of the random forest model's results on the test set. The calculation formulas are, respectively:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where $N$ is the total number of data samples, $i$ denotes the $i$-th data sample, $y_i$ is the observed PM2.5 concentration of the $i$-th data sample, $\hat{y}_i$ is the predicted PM2.5 concentration of the $i$-th data sample, and $\bar{y}$ is the mean observed PM2.5 concentration.
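The three accuracy metrics can be computed directly from their standard definitions, as in the sketch below. The function name and the observed and predicted concentrations are made up for illustration.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observed/predicted values."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / n,
        "rmse": sqrt(ss_res / n),
    }


# Hypothetical hourly PM2.5 concentrations in ug/m3.
observed = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 125.0, 145.0]
metrics = regression_metrics(observed, predicted)
```

Here every prediction is off by exactly 5 μg/m³, so MAE and RMSE are both 5, while R² stays close to 1 because the errors are small relative to the spread of the observations.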

If the accuracy of the random forest model's results on the test set meets the requirement, proceed to step 5.

In some embodiments, in step 5, permutation importance is a more rigorous algorithm for evaluating the degree of influence of the characteristic variables on the model's predictions. The calculation formula is:

$$i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j}$$

where $\tilde{D}_{k,j}$ denotes the shuffled data set constructed by permuting feature $j$ in the $k$-th repetition, $i_j$ is the weight of feature $j$, $j$ indexes the features, $K$ is the number of repetitions, $s$ is the performance score of the random forest model on the test data set $D$, and $s_{k,j}$ is the performance score of the model on the data set $\tilde{D}_{k,j}$.
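The shuffle-and-rescore procedure can be sketched in plain Python as below. This is an illustrative sketch: the predict function stands in for a trained random forest, R² is assumed as the performance score, and the data are synthetic.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the performance score s."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return i_j = s - mean_k(s_kj) for every feature column j."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        shuffled_scores = []
        for _ in range(n_repeats):
            # Shuffle only column j; all other columns keep their values.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            shuffled_scores.append(r2_score(y, [predict(r) for r in X_perm]))
        importances.append(baseline - sum(shuffled_scores) / n_repeats)
    return importances


def predict(row):
    """Hypothetical stand-in for a trained model; ignores the second feature."""
    return 2.0 * row[0]


# Synthetic data: the label depends on feature 0 only.
X = [[float(i), float((7 * i) % 10)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(predict, X, y)
```

Shuffling the feature the model relies on lowers the score sharply, so its weight is large; shuffling the ignored feature leaves the score unchanged, so its weight is zero.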

In some embodiments, the partial dependence plot (PDP) algorithm implements the variable sensitivity analysis in step 5. The values of the designated factors are varied within a set range, and the corresponding changes in the pollutant concentration predicted by the model are averaged. The partial dependence algorithm gives the response, or synergistic response, of the prediction result to one or two features, so as to evaluate the sensitivity of the result to the characteristic variables. The algorithm formula is:

$$\hat{f}_{S}(x_S) = E_{X_C}\left[\hat{f}(x_S, X_C)\right] = \int \hat{f}(x_S, x_C)\, dP(x_C)$$

where $X_S$ is the set of one or two features under study, $X_C$ is the set of the other features, and $\hat{f}$ represents the random forest model. In practice the expectation is estimated by averaging the model's predictions over the data samples with the features in $X_S$ held at $x_S$.
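The averaging step of partial dependence can be sketched as follows. The predict function is a hypothetical stand-in for a trained model, and the three data rows and the grid are made up so that the result can be checked by hand.

```python
def partial_dependence(predict, X, feature, grid):
    """For each grid value, fix one feature and average predictions over X."""
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)       # copy so the data set is left untouched
            modified[feature] = value  # hold the studied feature at this value
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def predict(row):
    """Hypothetical model with an interaction between features 0 and 1."""
    return 10.0 + 2.0 * row[0] + row[0] * row[1]


X = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
curve = partial_dependence(predict, X, feature=0, grid=[0.0, 1.0, 2.0])
# curve == [10.0, 13.0, 16.0]: the averaged response rises by 3 per unit of feature 0
```

Reading such a curve for a real feature shows the range over which the predicted concentration responds most strongly, which is what the text calls the control interval. A two-feature version would loop over a grid of value pairs instead of single values.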

In some embodiments, in step 5, as shown in FIG. 2, the Shapley additive explanations (SHAP) algorithm fairly distributes the payoff of cooperation (the average marginal effect of each feature on the result) by considering the contribution made by each participant (that is, the influence of each characteristic variable on the PM2.5 concentration). The calculation formula is:

$$f(x_i) = \phi_0(f, x) + \sum_{j=1}^{N} \phi_j(f, x_i)$$

where $x_i$ denotes each sample with $N$ features, $f(x_i)$ is the predicted value (that is, the predicted PM2.5 value) corresponding to each sample with $N$ features, $\phi_0(f, x)$ is the expected value (base value) of the random forest model output on the data set, and $\phi_j(f, x_i)$ is the Shapley value of the influence of feature $j$ on the prediction for sample $x_i$.

$\phi_j(f, x_i)$ represents the Shapley value of each feature in each sample, which is a weighted average over all possible combinations of feature subsets. The specific algorithm is:

$$\phi_j(f, x) = \sum_{S \subseteq \{x_1, \ldots, x_n\} \setminus \{x_j\}} \frac{|S|!\,\left(n - |S| - 1\right)!}{n!}\left[f_x\left(S \cup \{x_j\}\right) - f_x(S)\right]$$

where $\phi_j(f, x)$ is the Shapley value of feature $j$, $S$ is a subset of the features, $x_1, x_2, \ldots, x_n$ are the individual features, $|S|$ is the number of non-zero entries in the subset $S$, and $f_x(S)$ is the predicted value for the subset $S$.
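For a handful of features the Shapley value can be computed exactly by enumerating subsets, which makes the weighting explicit. The sketch below is illustrative only: the predict function is hypothetical, and features outside a subset are replaced by a single baseline row, which is one common way to define the subset prediction and is an assumption here. Practical SHAP implementations for tree models use faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of every feature of sample x by subset enumeration."""
    n = len(x)

    def value(subset):
        # Features in the subset take the sample's value, the rest the baseline.
        row = [x[k] if k in subset else baseline[k] for k in range(n)]
        return predict(row)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature j when added to subset s.
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return value(set()), phi


def predict(row):
    """Hypothetical model: one additive term and one interaction term."""
    return 5.0 + 3.0 * row[0] + 2.0 * row[1] * row[2]


base_value, phi = shapley_values(predict, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The base value plus the three Shapley values reproduces the prediction for the sample, and the interaction term is split equally between the two features that form it.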

It should be noted that the above values can be determined according to specific prediction requirements; in different embodiments, the exemplary value ranges given above are not limiting and may be adjusted as required.

Similarly, the monitoring data may be expanded or reduced in different embodiments and are not limited to the scope given in the above embodiments, provided that they include the fine particulate matter concentration and the characteristic variable data to be studied.

As an exemplary embodiment:

Step 1, acquiring online measurement data from the Zibo super monitoring station for September to December 2021, all at a time resolution of 1 h, including gaseous pollutant data: PM2.5, SO₂, NO₂, CO, O₃; meteorological data: temperature, relative humidity, atmospheric pressure, wind speed, wind direction; carbon data: OC and EC; ion data: SO₄²⁻, NO₃⁻, Cl⁻, K⁺, Mg²⁺, Ca²⁺, NH₄⁺, F⁻, Na⁺; element data: Al, Si, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn.

Step 2, preprocessing the data. Specifically, abnormal abrupt values are deleted directly; missing wind direction data are filled with the most frequently occurring value, and all other missing data are filled with the corresponding mean.
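The filling rule can be sketched as below, with the mode used for wind direction and the column mean for everything else. The column names and sample rows are hypothetical, and the deletion of abnormal values is left out because the text does not state how they are identified.

```python
from collections import Counter
from statistics import mean


def fill_missing(rows, mode_columns=("wind_direction",)):
    """Fill None entries: most frequent value for mode_columns, mean otherwise."""
    filled = [dict(row) for row in rows]  # copy so the input is not modified
    for col in rows[0].keys():
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            continue  # nothing to estimate a fill value from
        if col in mode_columns:
            fill = Counter(observed).most_common(1)[0][0]
        else:
            fill = mean(observed)
        for row in filled:
            if row[col] is None:
                row[col] = fill
    return filled


# Hypothetical hourly records; None marks a missing measurement.
rows = [
    {"so2": 10.0, "wind_direction": 90},
    {"so2": None, "wind_direction": 90},
    {"so2": 14.0, "wind_direction": None},
    {"so2": 12.0, "wind_direction": 180},
]
clean = fill_missing(rows)
```

The mode suits wind direction because it is a circular quantity: averaging 350° and 10° would give 180°, the opposite of the prevailing direction.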

Step 3, plotting time series charts. The time series of the gaseous pollutant data and meteorological data are plotted. To better compare the trends of different species, CO and SO₂ form one group, O₃ and NO₂ form one group, temperature and relative humidity form one group, and PM2.5 and wind direction are each a separate group. Because the carbon data, ion data and element data cover many species, their monthly averages are presented in a table. The time series charts and the monthly averages show that the indicators of each species peak in December, in winter, which is the period of severe air pollution.

Step 4, grading the PM2.5 concentration according to the air quality index to distinguish clean and polluted stages. Specifically, PM2.5 < 75 μg/m³ is regarded as clean, 75 μg/m³ ≤ PM2.5 ≤ 250 μg/m³ as polluted, and PM2.5 > 250 μg/m³ as severely polluted.
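The grading rule maps directly onto a small function. The function name and the label strings are illustrative choices; the thresholds are those stated in the text.

```python
def classify_pm25(concentration):
    """Grade an hourly PM2.5 concentration (ug/m3) into one of three stages."""
    if concentration < 75:
        return "clean"
    if concentration <= 250:
        return "polluted"
    return "severely polluted"


# Boundary values fall into the polluted stage on both ends.
labels = [classify_pm25(c) for c in (74.9, 75, 250, 250.1)]
```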

Step 5, preliminarily analyzing the average concentration of each species. The secondary inorganic aerosols (SO₄²⁻, NO₃⁻ and NH₄⁺) account for the largest share of the PM2.5 mass concentration, at 58%. Following the air quality index grading, the data of each species are compared across the clean, polluted and severely polluted stages.

Step 6, training, based on machine learning, a model of the response relationship between the PM2.5 concentration and the characteristic variables. The specific steps are as follows:

6.1 The processed data set is randomly divided into a training set and a test set at a ratio of 7:3; the training set is used to train the random forest model, and the test set is used to check the accuracy of the model. Specifically, the parameters at which the model performs well are determined from learning curves: the number of decision trees is 601 and the maximum tree depth is 20. During tuning, changes in the model's coefficient of determination are used as a reference, and the parameters are adjusted continually to optimize the model until the final optimal model is obtained, which is then saved.

6.2 The accuracy of the random forest model is evaluated using the coefficient of determination (R²), the mean absolute error (MAE) and the root mean square error (RMSE). The model performs well, with an R² of 0.93, an MAE of 5.42 and an RMSE of 9.16.

Step 7, using the permutation importance algorithm to make a preliminary qualitative evaluation of the influence of each characteristic variable on the PM2.5 concentration. The specific steps are as follows:

7.1 According to the permutation importance formula, the data corresponding to each feature are shuffled, and the random forest model then performs prediction on the shuffled data.

7.2 The above step is repeated k times. If the feature weight decreases after the data set is shuffled, the larger the decrease, the more important the feature; if the weight is essentially unchanged, the feature has no influence on PM2.5. Among the features, [species symbol missing in source] had the largest weight, 0.64 before the decrease and 0.28 after it, indicating that it has the greatest influence on PM2.5.

Step 8, performing partial dependence analysis on the secondary inorganic aerosols (SO₄²⁻, NO₃⁻ and NH₄⁺). The secondary inorganic aerosols are paired into three groups in the order [pairing order missing in source], and the synergistic controlling effect of each combination on PM2.5 is discussed in turn. Taking the [species symbol missing in source] concentration as the reference, the control intervals of the three ions with respect to the PM2.5 concentration are determined.

Step 9, based on the random forest model, calculating the specific contribution value of each feature to PM2.5 in each data sample using the Shapley additive explanations (SHAP) formula. The specific steps are as follows:

9.1 Data samples with PM2.5 > 75 μg/m³ are extracted and divided into 10 pollution stages according to the time interval between them, where the interval within a stage does not exceed 7 days. For example, if air pollution (PM2.5 > 75 μg/m³) occurs at 20:00 on October 1, at 12:00 on October 4 and at 14:00 on October 12, the first two data samples are assigned to the same air pollution stage and the third to another, and so on.
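The grouping rule can be sketched as follows, using the three timestamps from the example. It assumes the 7-day limit applies to the gap between consecutive exceedance samples, which is one reading of the text; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def split_into_stages(timestamps, max_gap=timedelta(days=7)):
    """Group exceedance timestamps; a gap above max_gap starts a new stage."""
    stages = []
    for ts in sorted(timestamps):
        if stages and ts - stages[-1][-1] <= max_gap:
            stages[-1].append(ts)  # close enough to the previous sample
        else:
            stages.append([ts])    # first sample, or gap too long
    return stages


# Hours in 2021 with PM2.5 > 75 ug/m3, taken from the example in the text.
polluted_hours = [
    datetime(2021, 10, 1, 20),
    datetime(2021, 10, 4, 12),
    datetime(2021, 10, 12, 14),
]
stages = split_into_stages(polluted_hours)
```

October 1 and October 4 are less than 7 days apart and share a stage, while October 12 follows a gap of more than 8 days and opens a new one.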

9.2 The trained random forest model and the air pollution data samples are loaded. PM2.5 is set as the label, and the feature matrix composed of the other characteristic variables is fed into the random forest model to calculate the specific contribution value of each feature to PM2.5 in each data sample.

9.3 Following the above procedure, the specific contribution (Shapley value) of each feature to PM2.5 is calculated for each data sample in the 8 air pollution stages.

9.4 All Shapley values are exported; for each air pollution stage, the features are ranked by mean absolute Shapley value and the top 5 characteristic variables contributing most to PM2.5 are selected; and the time series of the Shapley value of each feature in each data sample of each air pollution stage is plotted, so as to determine the contribution of each feature to PM2.5 at each time node. This provides decision-making and management departments with hour-by-hour data, so that air pollution can be treated more precisely.
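The ranking step for one pollution stage can be sketched as below. The feature names are taken from the monitored variables, but the Shapley values are invented for illustration and are not results from the patent.

```python
def top_features(shap_rows, feature_names, n_top=5):
    """Rank features by mean absolute Shapley value over a stage's samples."""
    n_samples = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[k]) for row in shap_rows) / n_samples
        for k, name in enumerate(feature_names)
    }
    ranked = sorted(mean_abs, key=mean_abs.get, reverse=True)
    return ranked[:n_top], mean_abs


# Hypothetical Shapley values: one row per hourly sample, one column per feature.
feature_names = ["NO2", "SO2", "relative_humidity", "wind_speed"]
shap_rows = [
    [12.0, -1.0, 4.0, -6.0],
    [8.0, 2.0, -5.0, -3.0],
    [10.0, -3.0, 6.0, -3.0],
]
top, mean_abs = top_features(shap_rows, feature_names, n_top=2)
# top == ["NO2", "relative_humidity"]
```

Taking the absolute value before averaging keeps features whose contributions change sign from hour to hour from cancelling out. The signed per-hour values are what the time series plot then shows.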

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims

1. A fine particle air pollution cause analysis method is characterized by comprising the following steps:

carrying out data preprocessing on the obtained sampling point monitoring data, wherein the monitoring data comprises fine particulate matter concentration and characteristic variable data;

2. The method of analyzing cause of fine particulate air pollution of claim 1, wherein the monitored data comprises gaseous pollutant data, meteorological data, ion data, elemental data, and carbon data.

3. The fine particulate air pollution cause analysis method as claimed in claim 1, wherein the machine learning model is a random forest model, and the training process comprises randomly dividing the preprocessed data into a training set and a test set of the random forest model, tuning n_estimators and max_depth, the most important parameters of the random forest model, by plotting learning curves, and determining step by step, from the learning curves, the number of decision trees and the tree depth at which model performance is best.

4. The method for analyzing the cause of fine particle air pollution as recited in claim 1 or 3, further comprising evaluating the trained machine learning model, wherein the specific process comprises evaluating the accuracy of the random forest model's results on the test set using the coefficient of determination, the mean absolute error and the root mean square error, respectively.

5. The fine particulate air pollution cause analysis method according to claim 1, wherein the specific process of preliminarily and qualitatively evaluating the influence of each characteristic variable on the fine particulate matter concentration is: according to the permutation importance algorithm, the data corresponding to each feature are shuffled, and the machine learning model then performs prediction on the shuffled data; these steps are repeated several times; if the feature weight decreases after the data set is shuffled, the larger the decrease, the more important the feature, and if the weight is essentially unchanged, the feature has no influence on the fine particulate matter concentration.

6. The method as claimed in claim 1, wherein the specific process of performing partial dependence analysis on each characteristic variable and determining its control interval with respect to the fine particulate matter concentration comprises varying the values of the designated factors within set ranges, averaging the corresponding changes in the pollutant concentration predicted by the model, and determining the response, or synergistic response, of the prediction result to one or more features, so as to evaluate the sensitivity of the result to the characteristic variables.

7. The method as claimed in claim 1, wherein the specific process of quantitatively calculating the specific contribution value of each characteristic variable in each pollution stage is calculating the specific contribution value of each feature to the fine particulate matter concentration in each data sample using the Shapley additive explanations (SHAP) algorithm.

8. The method for analyzing the cause of fine particulate air pollution as claimed in claim 7, wherein a feature matrix composed of the other characteristic variables is fed into the machine learning model to calculate the specific contribution value of each feature to the fine particulate matter concentration in each data sample; after this has been repeated several times, all specific contribution values are exported; for each air pollution stage, the features are ranked by the mean absolute value of their specific contribution values and the top N characteristic variables contributing most to the fine particulate matter concentration are selected; and a time series of the specific contribution value of each feature in each data sample of each air pollution stage is plotted, so as to determine the contribution of each feature to the fine particulate matter concentration at each time node.

9. A fine particle air pollution cause analysis system is characterized by comprising:

a preliminary qualitative analysis module configured to preliminarily qualitatively evaluate an influence of each characteristic variable on the concentration of the fine particulate matter;

and the quantitative analysis module is configured to extract a data sample with the concentration of the fine particulate matters exceeding a set value, divide the data sample into a plurality of pollution stages, process the data sample by using the machine learning model, and quantitatively calculate a specific contribution value of each characteristic variable in each pollution stage.

10. A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method of any one of claims 1-8.