CN117314023A

CN117314023A - Atmospheric pollution data analysis method, system and computer storage medium

Info

Publication number: CN117314023A
Application number: CN202311606468.0A
Authority: CN
Inventors: 吕振辉
Original assignee: Zhirui Carbon Tianjin Technology Co ltd
Current assignee: Zhirui Carbon Tianjin Technology Co ltd
Priority date: 2023-11-29
Filing date: 2023-11-29
Publication date: 2023-12-29
Anticipated expiration: 2043-11-29
Also published as: CN117314023B

Abstract

The embodiments of the application provide an atmospheric pollution data analysis method, an atmospheric pollution data analysis system and a computer storage medium. The method comprises: acquiring, from atmospheric pollution data, a plurality of factors related to air quality; taking the factors as first variables, and determining a target variable according to the first variables; establishing relationships between the plurality of first variables and the target variable; determining variable ranking differences among the plurality of first variables; and calculating the contribution value of each first variable and taking corresponding measures to optimize the target variable. Through this analysis and evaluation, the technical scheme can provide more accurate analysis results for atmospheric pollution data and a scientific basis for environmental management and control decisions, thereby reducing the degree of atmospheric pollution and improving air quality.

Description

Atmospheric pollution data analysis method, system and computer storage medium

Technical Field

The embodiment of the application relates to the technical field of environmental science, in particular to an atmospheric pollution data analysis method, an atmospheric pollution data analysis system and a computer storage medium.

Background

Currently, atmospheric pollution has become one of the major environmental issues of global concern. It is mainly caused by emissions from industrial activity, vehicle exhaust, coal burning and similar activities, and includes harmful substances such as particulate matter, sulfur dioxide and nitrogen oxides. Atmospheric pollution has serious effects on human health and the ecological environment, such as respiratory diseases, acid rain and the greenhouse effect. Methods and systems for accurately analyzing and assessing atmospheric pollution conditions are therefore particularly important.

At present, air pollution analysis relies mainly on sensor monitoring: an air quality sensor network collects real-time air pollution data in cities, and air quality is evaluated by monitoring key indicators (such as PM2.5, PM10, SO2 and NO2).

However, the data monitored by sensors are affected by factors such as limited sensor accuracy and inaccurate calibration, so certain errors and instabilities may exist.

Disclosure of Invention

The embodiment of the application provides an atmospheric pollution data analysis method, an atmospheric pollution data analysis system and a computer storage medium, which are used for solving the problem of low atmospheric pollution data analysis accuracy in the prior art.

In a first aspect, an embodiment of the present application provides an atmospheric pollution data analysis method, including:

acquiring, from atmospheric pollution data, a plurality of factors related to air quality, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration;

taking the factors as first variables, and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables;

establishing a plurality of relations between the first variables and the target variables, wherein the relations comprise control coefficients;

determining variable ranking differences among a plurality of first variables by calculating rank correlation coefficients among the plurality of first variables;

determining a contribution value of each first variable based on variable ranking differences among a plurality of first variables and the control coefficient;

and taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.

Optionally, the establishing of the relationships between the plurality of first variables and the target variable, wherein the relationships comprise a control coefficient, includes:

forming a first variable matrix from the plurality of first variables, and establishing a relation between the first variables and the target variable according to the first variable matrix and the target variable;

in the relation between the first variables and the target variable, calculating the control coefficient by minimizing the sum of squared residuals between the first variables and the target variable, together with an obtained statistic, wherein the statistic is determined according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis, and the obtained standard error of the control coefficient.

Optionally, the determining the contribution value of each first variable based on the variable ranking differences among the plurality of first variables and the control coefficient includes:

determining a dependency index between a plurality of first variables according to variable ranking differences among the plurality of first variables and the control coefficient;

and calculating the contribution value of each first variable according to the dependency index and the control coefficient.

Optionally, the determining of the variable ranking differences among the plurality of first variables by calculating the rank correlation coefficients among the plurality of first variables includes:

calculating the correlation coefficient among the plurality of first variables through the calculation formula of the rank correlation coefficient, wherein the rank correlation coefficient ranges from -1 to 1: 1 represents complete positive correlation, -1 represents complete negative correlation, and 0 represents no correlation;

and determining the variable ranking differences among the plurality of first variables based on the correlation coefficients among the plurality of first variables.

Optionally, the taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables includes:

sorting the contribution values of the plurality of first variables;

for a first variable with a contribution value higher than a preset contribution, the corresponding measures taken include: enhancing control measures for the first variable with the contribution value higher than the preset contribution, wherein the control measures comprise resource investment and enhanced supervision; or, optimizing an operation mode of a first variable with a contribution value higher than a preset contribution, wherein the operation mode can comprise a process flow and an optimization step; or, adjusting a related strategy of a first variable with a contribution value higher than a preset contribution, wherein the related strategy comprises a marketing strategy and an improved supply chain;

for a first variable having a contribution value lower than a preset contribution, the corresponding measures taken include: reducing control measures for a first variable having a contribution value lower than a preset contribution; or, adjusting the priority of the first variable with the contribution value lower than the preset contribution; or, the first variable with the contribution value lower than the preset contribution is replaced by other first variables.
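
The sorting and thresholding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_measures`, the action labels, the example contribution values and the threshold of 0.20 are all hypothetical, and the text does not say how a contribution exactly equal to the preset contribution is handled (here it is left unchanged).

```python
from typing import Dict, List, Tuple


def plan_measures(contributions: Dict[str, float],
                  preset: float) -> List[Tuple[str, float, str]]:
    """Sort first variables by contribution value and attach a coarse action."""
    # Highest contribution first, matching the "sorting" step in the text.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    plan = []
    for name, value in ranked:
        if value > preset:
            action = "strengthen control"
        elif value < preset:
            action = "reduce control or lower priority"
        else:
            # Assumption: the source does not define the equality case.
            action = "no change"
        plan.append((name, value, action))
    return plan


# Hypothetical contribution values, for illustration only.
example = {"particulate_matter": 0.42, "wind_speed": 0.08, "air_temperature": 0.21}
for row in plan_measures(example, preset=0.20):
    print(row)
```

Which concrete measure follows from each label (resource investment, supervision, replacement of a variable and so on) remains a policy decision outside the sketch.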

Optionally, the forming a plurality of the first variables into a first variable matrix, and establishing a relationship between the plurality of the first variables and the target variable according to the first variable matrix and the target variable includes:

forming a matrix x of p rows and 1 columns by p first variables according to the values of different sample points;

initializing a control coefficient;

establishing a relation between the first variable matrix x and the target variable y, wherein the relation is as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$

wherein $x_i$ represents the value of the i-th first variable at a sample point, $\beta_i$ represents the control coefficient of the i-th first variable, $\beta_0$ is the intercept term in the linear regression model, y denotes the target variable, and $\varepsilon$ represents the error term;

the calculating the control coefficient by minimizing a sum of squares of residuals between the first variable and the target variable and the obtained statistics in the relationships between the plurality of first variables and the target variable includes:

calculating a statistic according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the obtained standard error of the control coefficient, wherein the expression of the statistic is as follows:

$$T = \frac{\beta - \beta_0}{SE(\beta)}$$

wherein T denotes the statistic, β denotes the control coefficient, β₀ denotes the theoretical value of the control coefficient under the null hypothesis, and SE(β) denotes the standard error of the control coefficient;

calculating the control coefficient through a preset estimation formula of the control coefficient, wherein the expression of the preset estimation formula is as follows: [formula missing in source];

wherein, beta is expressed as the control coefficient, x is expressed as a first variable matrix composed of a plurality of first variables, y is expressed as a target variable, T is expressed as a statistic, and the statistic is used for judging whether the influence of the plurality of first variables on the target variable is obvious or not.

Optionally, the calculating of the correlation coefficient among the plurality of first variables by the calculation formula of the rank correlation coefficient includes:

calculating the correlation coefficient among the plurality of first variables by the calculation formula of the rank correlation coefficient, wherein the calculation formula is as follows:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

wherein ρ represents the rank correlation coefficient, Σd² represents the sum of squared rank differences of the plurality of first variables, n represents the number of samples, and d represents the variable ranking difference between the plurality of first variables.

Optionally, the determining the dependency index between the first variables according to the variable ranking differences between the first variables and the control coefficients includes:

determining a dependency index between the plurality of first variables by the formula: [formula missing in source];

wherein β is denoted as a control coefficient, e is denoted as a natural constant, α is denoted as a control function for adjusting the degree of influence of the variable ranking difference on the dependency index, d is denoted as a variable ranking difference between a plurality of the first variables;

the calculating of the contribution value of each first variable according to the dependency index and the control coefficient includes:

calculating the contribution value of each first variable by the formula: [formula missing in source];

where β is denoted as the control coefficient and DependencyIndex is denoted as the dependency index between the plurality of first variables.

In a second aspect, embodiments of the present application provide an atmospheric pollution data analysis system, including:

an acquisition module for acquiring, from atmospheric pollution data, a plurality of factors related to air quality, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration;

the determining module is used for taking the factors as first variables and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables;

The building module is used for building a plurality of relations between the first variables and the target variables, wherein the relations comprise control coefficients;

the determining module is further used for determining variable ranking differences among the plurality of first variables by calculating rank correlation coefficients among the plurality of first variables, and for determining a contribution value of each first variable based on the variable ranking differences among the plurality of first variables and the control coefficient;

and the processing module is used for taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.

The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.

In the embodiment of the application, a plurality of factors related to air quality are acquired from atmospheric pollution data, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration; the factors are taken as first variables, and a target variable is determined according to the first variables, wherein the target variable is prediction data generated according to the first variables; relations between the plurality of first variables and the target variable are established, wherein the relations comprise control coefficients; variable ranking differences among the plurality of first variables are determined by calculating rank correlation coefficients among the plurality of first variables; a dependency index between the plurality of first variables is determined according to the variable ranking differences and the control coefficient; a contribution value of each first variable is calculated according to the dependency index and the control coefficient; and corresponding measures are taken to optimize the target variable according to the contribution values of the plurality of first variables. The method for analyzing atmospheric pollution data has the following beneficial effects:

Multi-factor consideration: the method comprehensively considers a plurality of factors related to air quality, including air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration. This helps to understand more fully how atmospheric pollution forms and what influences it.

Target variable prediction: prediction data is generated as the target variable by taking a plurality of first variables as inputs. This facilitates prediction and evaluation of future air quality, so that appropriate environmental management and control measures can be taken in advance.

Relationship establishment: a relationship between the plurality of first variables and the target variable is established, and it includes a control coefficient. This helps to determine the extent to which each first variable affects the target variable and to further analyze the dependency between them.

Variable ranking difference analysis: the variable ranking differences of the first variables are determined by calculating the rank correlation coefficients between the first variables. This helps to understand the relative importance of each variable in atmospheric pollution, so that variables of higher importance are prioritized for control.

Dependency index calculation: a dependency index between the plurality of first variables is determined based on the variable ranking differences and the control coefficients. This helps to quantify the extent to which each first variable contributes to atmospheric pollution, further guiding the formulation of environmental management and control strategies.

Contribution value evaluation: the contribution value of each first variable is calculated according to the dependency index and the control coefficient. This helps to determine which variables contribute most to atmospheric pollution, so that targeted measures can be taken.

Through the analysis and evaluation, the method can provide more accurate analysis results of the atmospheric pollution data, and provide scientific basis for environmental management and control decision so as to reduce the degree of atmospheric pollution and improve the air quality.

These and other aspects of the present application will be more readily apparent from the following description of the embodiments.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 illustrates a flow chart of one embodiment of an atmospheric pollution data analysis method provided herein;

FIG. 2 is a schematic diagram of an embodiment of an atmospheric pollution data analysis system provided herein;

Fig. 3 illustrates a schematic diagram of a computing device provided herein.

Detailed Description

In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.

Some of the flows described in the specification and claims of this application and in the foregoing figures include a number of operations that appear in a particular order. It should be understood, however, that these operations may be performed out of the order in which they appear herein or in parallel; operation numbers such as 101 and 102 merely distinguish the operations from one another and do not themselves imply any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

FIG. 1 is a flow chart illustrating an embodiment of a method for analyzing atmospheric pollution data provided herein, as shown in FIG. 1, the method comprising:

101. Acquiring, from atmospheric pollution data, a plurality of factors related to air quality, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration;

In this step, the atmospheric pollution data refers to data related to atmospheric pollution, specifically including but not limited to: particulate matter, combustion products of coal, petroleum and other fuels, and the like.

The atmospheric pollution data include a number of factors related to air quality, optionally air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration, wherein the chemical component concentration includes but is not limited to the concentrations of sulfur dioxide, nitrogen oxides, carbon monoxide and ozone.

In embodiments of the present application, sensor devices may be utilized to collect data regarding a plurality of factors associated with air quality, including air temperature, air humidity, wind speed, atmospheric pressure, chemical concentration, and particulate concentration, by establishing an atmospheric pollution monitoring network in a city. This monitoring network may consist of a plurality of sensor devices distributed in different locations in the city, each device being responsible for monitoring a specific factor. These devices may transmit data to a central data processing center via wireless communication technology. At the data processing center, the collected data may be integrated and analyzed. By comprehensively analyzing the data of a plurality of factors, the air quality conditions of different areas of the city can be obtained.
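
As a rough illustration of the integration step at the data processing center, the sketch below averages readings per area and per factor. The record layout `(area, factor, value)`, the function name `aggregate_by_area` and the sample readings are assumptions made for this example; the source does not specify a data format or an aggregation rule.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_area(readings):
    """Average each factor per area from (area, factor, value) readings."""
    buckets = defaultdict(list)
    for area, factor, value in readings:
        buckets[(area, factor)].append(value)

    summary = defaultdict(dict)
    for (area, factor), values in buckets.items():
        summary[area][factor] = mean(values)
    return {area: dict(factors) for area, factors in summary.items()}


# Hypothetical readings from three sensor devices.
readings = [
    ("north", "pm25", 30.0),
    ("north", "pm25", 50.0),
    ("south", "wind_speed", 2.0),
]
print(aggregate_by_area(readings))
```

A plain mean is the simplest choice; a real deployment might weight sensors by calibration quality or discard outliers, which the sketch does not attempt.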

The effect of this embodiment is that the air quality of the city can be monitored and evaluated in real time. By analyzing the data of the plurality of factors, the source and the degree of the atmospheric pollution can be known more accurately. Meanwhile, the data can also be used for formulating corresponding emission reduction measures and management policies so as to improve the air quality and protect the health of people.

For example, when the monitored data indicates that the particulate matter concentration and the chemical component concentration in a certain area exceed national standards, the government may take corresponding actions, such as limiting industrial emissions, enhancing vehicle exhaust emission control, etc., to reduce the amount of pollutant emissions. Thus, by monitoring and analyzing the data of a plurality of factors in real time, measures can be taken in time to improve and manage the air pollution problem.

Of course, the above is only one possible implementation. The application does not stop at analyzing the plurality of factors once they are acquired; it performs the subsequent steps to improve the accuracy of the atmospheric pollution data analysis.

102. Taking the factors as first variables, and determining target variables according to the first variables;

in this step, the target variable is prediction data generated from a plurality of the first variables;

In the embodiment of the application, a plurality of factors (air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration) are taken as the first variables, and target variables related to the first variables are determined according to the first variables, wherein the target variables can include but are not limited to air quality indexes such as PM2.5 concentration and the like.

It will be appreciated that the target variable is not collected directly from the environment; it is determined comprehensively from the plurality of factors collected from the environment, which ensures the accuracy of the target variable. The application proposes comparing the contribution values of all the factors to determine which factor has the greatest influence on air pollution: a larger relative contribution value indicates that the factor has a larger influence on the target variable. The scheme is described in the following steps 103-107.

103. Establishing a plurality of relations between the first variables and the target variables, wherein the relations comprise control coefficients;

in the embodiment of the application, p first variables are combined into a matrix x of p rows and 1 column according to the values of different sample points;

the control coefficient β is initialized, alternatively β may be initialized to a vector of all 0 or random values.

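the relation between the first variable matrix x and the target variable y is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$

wherein $x_i$ represents the value of the i-th first variable at a sample point, $\beta_i$ represents the control coefficient of the i-th first variable, $\beta_0$ is the intercept term in the linear regression model, y denotes the target variable, and $\varepsilon$ represents the error term;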

For example, if x₁ = air temperature, x₂ = air humidity, x₃ = wind speed, x₄ = atmospheric pressure, x₅ = chemical component concentration and x₆ = particulate matter concentration, then the expression of the air pollution index y is as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \varepsilon$$

The control coefficient β expresses the degree of influence of the corresponding first variable on the target variable. Assuming that the regression coefficient of the air temperature is β₁ = 0.5, the average value of atmospheric pollution increases by 0.5 units for each 1-unit increase in air temperature. Similarly, if the regression coefficient of the particulate matter concentration is β₆ = 0.3, the average value of atmospheric pollution increases by 0.3 units for each 1-unit increase in particulate matter concentration.
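
To make the interpretation of a control coefficient concrete, the following minimal sketch evaluates the linear relation with the error term omitted and shows the effect of a one-unit rise in air temperature. Only the temperature coefficient 0.5 and the 0.3-unit effect of particulate matter come from the text; the intercept, the remaining coefficients and the sample values are placeholders chosen for illustration.

```python
from typing import Sequence


def predict(beta0: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """Evaluate y = beta0 + sum(beta_i * x_i); the error term is omitted."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return beta0 + sum(b * x for b, x in zip(betas, xs))


# Order: temperature, humidity, wind speed, pressure, chemical conc., particulate conc.
# 0.5 (first) and 0.3 (last) follow the text; every other number is a placeholder.
beta0 = 10.0
betas = [0.5, -0.2, -1.0, 0.01, 0.8, 0.3]
sample = [20.0, 60.0, 3.0, 1013.0, 12.0, 35.0]

warmer = list(sample)
warmer[0] += 1.0  # raise air temperature by one unit, all else unchanged

delta = predict(beta0, betas, warmer) - predict(beta0, betas, sample)
print(round(delta, 6))
```

The difference between the two predictions equals the temperature coefficient, which is exactly what "0.5 units per 1-unit increase" means when the other variables are held fixed.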

As one possible implementation, the control coefficient may be determined by minimizing a sum of squares of residuals between the first variable and the target variable and the obtained statistics in a plurality of relationships between the first variable and the target variable.

Specifically, a statistic is calculated according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the obtained standard error of the control coefficient, wherein the expression of the statistic is as follows:

$$T = \frac{\beta - \beta_0}{SE(\beta)}$$

wherein T denotes the statistic, β denotes the control coefficient, β₀ denotes the theoretical value of the control coefficient under the null hypothesis, and SE(β) denotes the standard error of the control coefficient;

the calculation of the statistics may be used for hypothesis testing, where a null hypothesis is that the control coefficient is zero, i.e. the first variable has no significant effect on the target variable. By calculating the value of the statistic, a hypothesis test can be performed and a determination can be made as to whether the control coefficient is significant.

Further, the control coefficient is calculated by a preset estimation formula of the control coefficient, wherein the expression of the preset estimation formula is as follows: [formula missing in source];

In practice, it is assumed that an analysis of the relationship between the air pollution index and the air temperature, wind speed and particulate matter concentration (which may be some or all of the plurality of first variables) is required. After collecting atmospheric pollution data and corresponding values of air temperature, wind speed and particulate concentration over a period of time by means of sensors, the collected data are constructed into a matrix x and a vector y. The matrix x contains values for air temperature, wind speed and particulate concentration, and the vector y contains observations of the air pollution index, where observations are data points actually collected, representing actual observed atmospheric pollution data. The observations may comprise values of the target variable and the first variable.

In the embodiment of the present application, the control coefficient β is calculated using the above-mentioned preset estimation formula of the control coefficient, so as to obtain an estimated value of the control coefficient β. Further, the standard error (SE) is used to calculate the statistic T. The standard error represents the uncertainty of the control coefficient. Finally, the statistic T is used to check the significance of the control coefficient. If the value of the statistic T is large, the null hypothesis may be rejected, i.e., the effect of the plurality of first variables on the target variable is significant. Conversely, if the value of the statistic T is small, the null hypothesis cannot be rejected, i.e., the effect of the first variables on the target variable is insignificant.
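
The estimate-then-test procedure can be sketched as follows. Because the preset estimation formula is not reproduced in the text, the sketch assumes ordinary least squares, which is the standard way to minimize the sum of squared residuals; the standard-error computation, the helper names `invert` and `ols_with_t`, and the sample data are likewise assumptions rather than the patent's own definitions. The statistic follows the expression T = (β − β₀) / SE(β), with the null value passed as `beta_null`.

```python
import math


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_with_t(rows, y, beta_null=0.0):
    """Least-squares fit with intercept; returns (beta, se, t), intercept first."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n, k = len(X), len(X[0])
    if n <= k:
        raise ValueError("need more samples than coefficients")
    xtx = [[sum(X[s][i] * X[s][j] for s in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[s][i] * y[s] for s in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[s] - sum(X[s][j] * beta[j] for j in range(k)) for s in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    se = [math.sqrt(max(sigma2 * inv[i][i], 0.0)) for i in range(k)]
    t = [(b - beta_null) / s if s > 0 else float("nan") for b, s in zip(beta, se)]
    return beta, se, t


# Hypothetical samples: two first variables per row, plus observed y values.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [3.2, 6.9, 7.0, 11.1, 10.8, 15.0]
beta, se, t = ols_with_t(rows, y)
print(beta, se, t)
```

Deciding whether a given T is "large" requires comparing it with a critical value of the t distribution for n − k degrees of freedom, which the sketch leaves to the caller.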

Through the steps, the relation between the air pollution index and the air temperature, the wind speed and the particulate matter concentration can be analyzed, and whether the influence of each variable on the air pollution is obvious or not can be judged. This helps to better understand the mechanism of formation of atmospheric pollution and to formulate corresponding control measures and policies.

104. Determining variable ranking differences among the plurality of first variables by calculating rank correlation coefficients among the plurality of first variables;

In the embodiment of the application, the correlation coefficient among the plurality of first variables is calculated through the calculation formula of the rank correlation coefficient, wherein the rank correlation coefficient ranges from -1 to 1: 1 represents complete positive correlation, -1 represents complete negative correlation, and 0 represents no correlation;

Specifically, the calculation formula of the rank correlation coefficient is as follows:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

In practice, suppose the variable ranking differences between the air pollution index and the air temperature, wind speed and particulate matter concentration are to be analyzed. After the atmospheric pollution data and the corresponding values of air temperature, wind speed and particulate matter concentration have been collected over a period of time, the collected data are ranked to obtain the ranking of each variable, and the rank correlation coefficient ρ is then calculated with the formula above. Evaluating 1 - 6Σd² / (n(n² - 1)), where Σd² represents the sum of squared rank differences of the plurality of first variables, n represents the number of samples, and d represents the variable ranking difference among the plurality of first variables, gives the value of ρ.

The rank correlation coefficient ranges between -1 and 1. If it is close to 1, the ranking differences of the plurality of first variables are small, i.e., their influence trends in atmospheric pollution are consistent. If it is close to -1, the rankings of the first variables differ greatly, i.e., their influence trends in atmospheric pollution are opposite. If it is close to 0, the ranking differences of the plurality of first variables are neutral, i.e., their influence trends in atmospheric pollution are not obvious.

Through the steps, the rank correlation coefficients among the first variables can be calculated, so that the ranking difference of the first variables in the atmospheric pollution can be known. This helps to determine which variables have a more pronounced effect on atmospheric pollution, and thus provide targeted environmental management and control.

For example, assume that there are two first variables x₁ and x₃ (for example, x₁ represents air temperature and x₃ represents wind speed), and that 8 samples have been collected. The sample data are shown in Table 1 below:

TABLE 1

Sample number	x₁ (air temperature)	x₃ (wind speed)
1	5	2
2	3	4
3	1	1
4	4	3
5	2	5
6	8	6
7	7	8
8	6	7

First, x₁ and x₃ need to be ranked. The ranking is determined by the magnitude of each variable's values; equal values receive the same rank, with the rank values averaged accordingly. The ranked result is shown in Table 2 below:

TABLE 2

Sample number	x₁ (air temperature)	x₃ (wind speed)	x₁ rank	x₃ rank
1	5	2	4	3
2	3	4	3	5
3	1	1	1	1
4	4	3	2	4
5	2	5	1	6
6	8	6	7	7
7	7	8	6	8
8	6	7	5	7
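The tie rule stated before Table 2 (equal values share the average of their ranks) can be sketched as follows. This is a minimal illustration, assuming ascending order with rank 1 for the smallest value, which the text does not specify; the function name `average_ranks` and the sample readings are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank.

    Ascending order is an assumption; the source does not state the direction.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Average of the 1-based positions pos+1 .. end+1 occupied by the run.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical readings with one tie (two samples at 21.5).
readings = [21.5, 18.0, 21.5, 25.0]
print(average_ranks(readings))  # [2.5, 1.0, 2.5, 4.0]
```

The two tied readings would occupy positions 2 and 3, so each receives rank 2.5.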

Further, the sum of squared rank differences Σd² is calculated:

Σd²= (4-3)² + (3-5)² +(1-1)² + (2-4)² + (1-6)² + (7-7)² +(6-8)² + (5-7)²

= 1 + 4 + 0 + 4 + 25 + 0 + 4 + 4

= 42

Finally, the rank correlation coefficient is calculated using the formula:

ρ = 1 - (6 × 42) / (8 × (8² - 1))

= 1 - 252 / 504

= 1 - 0.5

= 0.5

Thus, the rank correlation coefficient between x₁ and x₃ is 0.5, which indicates a positive ordinal relationship between them to some extent, i.e., the ranking differences between x₁ and x₃ are relatively small and their influence trends in atmospheric pollution are broadly consistent.
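The arithmetic of this worked example can be checked with a short sketch. It takes the two rank columns of Table 2 as given (it does not recompute them from the sample values) and applies ρ = 1 - 6Σd² / (n(n² - 1)); the function name `spearman_from_ranks` is a hypothetical helper, not part of the described method.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Return (sum of squared rank differences, rho) for two rank lists.

    Uses the shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rank lists must have the same length")
    n = len(rank_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
    return sum_d_squared, rho


# Rank columns copied from Table 2 as printed.
x1_rank = [4, 3, 1, 2, 1, 7, 6, 5]
x3_rank = [3, 5, 1, 4, 6, 7, 8, 7]

sum_d2, rho = spearman_from_ranks(x1_rank, x3_rank)
print(sum_d2, rho)  # 42 0.5
```

Note that this shortcut formula is exact only when there are no tied ranks; when ties are present, a common alternative is to compute the Pearson correlation of the two rank columns.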

105. Determining the variable ranking differences among the plurality of first variables by calculating the rank correlation coefficients among the plurality of first variables.

In the embodiment of the present application, specifically, step 105 may include:

1051. Determining a dependency index between the plurality of first variables according to the variable ranking differences among the plurality of first variables and the control coefficient.

In step 1051, the dependency index between the plurality of first variables may be determined by the formula: [formula missing].

1052. Calculating a contribution value of each first variable according to the dependency index and the control coefficient.

In step 1052, the contribution value of each first variable may be calculated by the formula: [formula missing].

In a practical application, in an atmospheric pollution embodiment, the variable ranking differences of a plurality of first variables (such as air temperature, wind speed and particulate concentration) can be used to calculate the dependency index between them.

First, it is necessary to acquire the rank of each variable and calculate the variable rank difference d between them.

Then, the dependency index is calculated from the control coefficient β and the control function α. The control coefficient β represents the degree of influence of each first variable on atmospheric pollution, and the control function α adjusts how strongly the variable ranking difference affects the dependency index.

The calculation formula is: [formula missing].

In the calculation process, a suitable control coefficient β and control function α can be selected according to actual conditions. The control coefficient β can be estimated, for example, by the least squares method. The control function α may be determined empirically or from domain knowledge, and measures how much a variable ranking difference affects the dependency index.

By calculating the dependency index, the degree of dependency of the plurality of first variables on the atmospheric pollution can be evaluated. A higher dependency index indicates that the plurality of first variables have a stronger dependency on atmospheric pollution, while a lower dependency index indicates that the dependency between them is weaker.

Through the steps, the dependence index among a plurality of first variables can be calculated by utilizing the variable ranking differences and the control coefficients, so that the influence relationship of the first variables on the atmospheric pollution can be better understood. This helps to formulate corresponding environmental management and control strategies to reduce the extent of atmospheric pollution.

Further, in an atmospheric pollution embodiment, the contribution value of each of the first variables may be calculated using the dependency index and the control coefficient.

First, the dependency index of each of the first variables needs to be calculated, which can be done by the method described above.

Then, the contribution value of each of the first variables can be calculated by the formula: [formula missing].

In the calculation process, the term [formula missing] is first calculated for each of the first variables, and these terms are then added to obtain the total over all first variables. Finally, the contribution value of each first variable is obtained by dividing its term by this total.

By calculating the contribution value of each of said first variables, their relative importance in atmospheric pollution can be understood. A higher contribution value indicates that the variable contributes more to atmospheric pollution, while a lower contribution value indicates that it contributes less to atmospheric pollution.

Through the above steps, the contribution value of each of the first variables can be calculated using the dependency index and the control coefficient, giving a better understanding of how strongly each variable influences atmospheric pollution. This helps to determine which variables are more important, so that the corresponding environmental management and control measures can be prioritized.
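The normalization described above (compute a term per variable, sum the terms, divide each term by the sum) can be sketched as follows. The per-variable terms are treated as given inputs because the formula that produces them is not reproduced in this text; the function name `contribution_values`, the variable names and the numbers are all hypothetical.

```python
def contribution_values(raw_terms):
    """Divide each per-variable term by the total so the results sum to 1."""
    total = sum(raw_terms.values())
    if total == 0:
        raise ValueError("sum of per-variable terms is zero; cannot normalize")
    return {name: term / total for name, term in raw_terms.items()}


# Made-up per-variable terms, standing in for values that would be derived
# from the dependency index and the control coefficient.
raw_terms = {
    "air_temperature": 0.6,
    "wind_speed": 0.3,
    "particulate_concentration": 1.1,
}
contributions = contribution_values(raw_terms)
for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
```

Because each contribution is a share of the total, the values are directly comparable across variables.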

106. Taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.

In this embodiment of the present application, as one possible implementation, the contribution values of the plurality of first variables are sorted.

In the embodiment of the application, for a variable with a high contribution value, the control measures can be strengthened to improve the effect on the target variable; this may include increasing resource investment and strengthening monitoring and management. The way such a variable operates can also be optimized to improve the performance of the target variable, for example by improving the process flow or optimizing operating steps. In addition, the relevant policies or decisions may be adjusted to better control the target variable, for example by adjusting marketing strategies or improving supply chain management. For a first variable whose contribution value is lower than the preset contribution, the control measures may be reduced; or the priority of that first variable may be adjusted; or that first variable may be replaced by another first variable.
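The sorting and threshold logic of this implementation can be sketched as follows. The function name `plan_measures`, the action labels, the example contribution values and the preset contribution of 0.25 are hypothetical; the handling of a value exactly equal to the preset contribution is also an assumption, since the text does not cover that case.

```python
def plan_measures(contributions, preset_contribution):
    """Sort variables by contribution (highest first) and attach an action."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    plan = []
    for name, value in ranked:
        if value > preset_contribution:
            action = "enhance control measures"
        elif value < preset_contribution:
            action = "reduce control measures, lower priority, or replace"
        else:
            # Assumption: the source does not say what to do at equality.
            action = "keep current measures"
        plan.append((name, value, action))
    return plan


# Made-up contribution values for illustration only.
example_contributions = {
    "air_temperature": 0.30,
    "wind_speed": 0.15,
    "particulate_concentration": 0.55,
}
plan = plan_measures(example_contributions, preset_contribution=0.25)
for name, value, action in plan:
    print(f"{name} ({value:.2f}): {action}")
```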

As another possible implementation scheme, at least one first variable meeting a contribution value condition is screened from the plurality of first variables and input into a pre-established prediction model to obtain the output result of the prediction model, and corresponding measures are taken according to the output result to optimize the target variable. The prediction model is obtained by training on the plurality of first variables in advance. Establishing a prediction model is a conventional means in the art and is not described in detail herein.

Specifically, the first variable meeting the contribution value condition can be selected first: and selecting a first variable with higher contribution degree as input according to the contribution value evaluation result. The condition of the contribution value may be determined according to a pre-set threshold or otherwise.

Secondly, establishing a prediction model: the selected first variable is used as input, existing atmospheric pollution data is used as training set, and proper prediction model is used for training. Common predictive models include linear regression, support vector machines, random forests, and the like.

Further, obtaining an output result of the prediction model: and inputting the new first variable data into a pre-established prediction model to obtain a corresponding prediction output result. The predicted outcome may be an air quality indicator, such as PM2.5 concentration, at some point in the future.

Finally, taking corresponding measures according to the output result to optimize the target variable: based on the prediction, the decision maker can take corresponding measures to cope with the atmospheric pollution. For example, if the prediction shows that the PM2.5 concentration will exceed a safety threshold, a restriction on industrial emissions or a restriction on operation may be formulated to reduce the amount of pollutant emissions to reduce the extent of atmospheric pollution.

In practical application, suppose the analysis results show that air temperature, wind speed and particulate concentration are the variables with high contribution to atmospheric pollution. These three variables are chosen as the first variables, and existing atmospheric pollution data are used for training.

A prediction model is established and trained with air temperature, wind speed and particulate concentration as input variables and PM2.5 concentration as the target variable, yielding a trained prediction model.

When new air temperature, wind speed and particulate concentration data are available, they are input into the prediction model to obtain a prediction result, namely the predicted PM2.5 concentration at a future time point.

According to the prediction result, if the predicted PM2.5 concentration exceeds the safety threshold, the decision maker can take corresponding measures, such as strengthening industrial emission control, limiting traffic flow or reminding citizens to take protective measures so as to reduce the influence of atmospheric pollution on human health.

Through the embodiment, the first variable meeting the requirements can be screened out according to the contribution value condition, and the prediction model is utilized for prediction so as to guide the formulation and implementation of the corresponding atmospheric pollution control measures.
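The screening and prediction flow can be sketched end to end as follows, using linear regression, one of the model types named above. Everything concrete here is an assumption for illustration: the contribution values, the 0.2 screening threshold, the safety threshold of 75, and the data, which are synthetic and generated from a made-up linear rule. The fit uses ordinary least squares via the normal equations, written in plain Python so the sketch is self-contained.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_model(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Evaluate intercept + sum(coef_i * feature_i)."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Step 1: screen variables by contribution value (made-up values and threshold).
contributions = {
    "air_temperature": 0.30,
    "air_humidity": 0.05,
    "wind_speed": 0.25,
    "particulate_concentration": 0.40,
}
MIN_CONTRIBUTION = 0.2
selected = [name for name, v in contributions.items() if v >= MIN_CONTRIBUTION]

# Step 2: train on synthetic history (not real measurements).
columns = ("air_temperature", "air_humidity", "wind_speed",
           "particulate_concentration", "pm25")
table = [
    (20, 55, 3, 40, 41.0),
    (25, 60, 1, 60, 63.5),
    (15, 40, 5, 30, 26.5),
    (30, 70, 2, 80, 80.0),
    (10, 45, 4, 20, 18.0),
    (22, 50, 6, 50, 44.0),
]
samples = [dict(zip(columns, row)) for row in table]
rows = [[s[name] for name in selected] for s in samples]
coefs = fit_linear_model(rows, [s["pm25"] for s in samples])

# Step 3: predict for new data and compare with a hypothetical safety threshold.
new_sample = {"air_temperature": 28, "wind_speed": 1, "particulate_concentration": 90}
predicted = predict(coefs, [new_sample[name] for name in selected])
SAFETY_THRESHOLD = 75.0
exceeds = predicted > SAFETY_THRESHOLD
print(f"predicted PM2.5: {predicted:.1f}, exceeds threshold: {exceeds}")
```

In practice a library implementation of the chosen model would normally replace the hand-written solver; it is spelled out here only to keep the example dependency-free.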

Further, after the contribution values of the plurality of first variables are acquired, the analysis results (such as the contribution values) are presented visually, for example in the form of charts or maps, so that a user can understand and apply them more intuitively.

Specifically, in practical application, in order to visually display the analysis result of the air pollution data, a chart, a map and other modes can be used to intuitively present the relevant information.

The following is one specific example:

Data collection and preparation: data are collected for a number of factors related to air quality, such as air temperature, humidity, wind speed, atmospheric pressure, chemical component concentration and particulate concentration, while ensuring the accuracy and integrity of the data.

Data analysis and modeling: a relationship model is established between the plurality of first variables and the target variable.

Variable ranking difference analysis: the rank correlation coefficients among the first variables are calculated to determine the variable ranking differences.

Dependency index calculation: the dependency index between the plurality of first variables is calculated based on the variable ranking differences and the control coefficients.

Contribution evaluation and visual display: the contribution degree of each first variable is calculated according to the dependency index, and the result is displayed visually.

The following are two common visualization approaches. The contribution degree of each first variable can be displayed as a polygon using a radar chart: each vertex represents a first variable, and the distance of the vertex from the center represents the magnitude of its contribution. By comparing the contributions of the different variables, their relative importance in atmospheric pollution can be observed intuitively. Alternatively, the contribution degree of each first variable may be displayed as shades of color using a heat map: darker colors indicate higher contributions and lighter colors indicate lower contributions. The heat map shows the distribution of contribution across the variables clearly and assists decision makers in formulating environmental management and control strategies.
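The geometry of the radar chart described above can be sketched without any plotting library: each variable is given an evenly spaced angle, and its contribution is used as the distance from the center. The function name `radar_vertices` and the contribution values are hypothetical.

```python
import math


def radar_vertices(contributions):
    """Return (name, x, y) per variable: evenly spaced angles, radius = contribution."""
    names = list(contributions)
    n = len(names)
    vertices = []
    for i, name in enumerate(names):
        angle = 2 * math.pi * i / n  # one spoke per variable
        radius = contributions[name]  # distance from the center
        vertices.append((name, radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


# Made-up contribution values for illustration only.
radar_input = {
    "air_temperature": 0.30,
    "wind_speed": 0.15,
    "particulate_concentration": 0.45,
    "air_humidity": 0.10,
}
vertices = radar_vertices(radar_input)
for name, x, y in vertices:
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Joining the vertices in order and closing the polygon gives the radar outline; the coordinates can be passed to whatever charting tool is in use.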

In addition, the analysis result can be combined with geographic information data, and the spatial distribution of atmospheric pollution can be displayed on a map. A geographic information system (GIS) can be used to draw heat maps, contour maps, dot maps and the like, visually showing the degree of atmospheric pollution in different areas and helping decision makers understand and address the atmospheric pollution problem.

Through the data analysis and visual display, a decision maker and an environment manager can intuitively know the influence factors and the degree of the atmospheric pollution so as to formulate more effective environmental protection and pollution control measures.

Fig. 2 is a schematic structural diagram of an embodiment of an air pollution data analysis system provided in the present application, and as shown in fig. 2, the apparatus includes:

an acquisition module 21 for acquiring, in the air pollution data, a plurality of factors related to air quality, including air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration, and particle size concentration;

a determining module 22, configured to take the multiple factors as first variables, and determine a target variable according to the multiple first variables, where the target variable is prediction data generated according to the multiple first variables;

An establishing module 23, configured to establish a relationship between the plurality of first variables and the target variable, where the relationship includes a control coefficient;

the determining module 22 is further configured to determine a variable ranking difference between the plurality of first variables by calculating a rank correlation coefficient between the plurality of first variables; determining a contribution value of each first variable based on variable ranking differences among a plurality of first variables and the control coefficient;

a processing module 24, configured to take corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.

In this embodiment, optionally, the establishing module 23 of the device is specifically configured to form a first variable matrix from the plurality of first variables, and to establish the relationship between the plurality of first variables and the target variable according to the first variable matrix and the target variable. In this relationship, the control coefficient is calculated by minimizing the residual sum of squares between the first variables and the target variable, together with a statistic; the statistic is determined from the obtained control coefficient, the theoretical value of the control coefficient under the null hypothesis, and the standard error of the control coefficient.

In this embodiment, optionally, the determining module 22 of the apparatus is specifically configured to determine a dependency index between a plurality of the first variables according to a variable ranking difference between the plurality of the first variables and the control coefficient; and calculating the contribution value of each first variable according to the dependency index and the control coefficient.

In this embodiment, optionally, the determining module 22 of the apparatus is specifically configured to calculate the correlation coefficient between the plurality of first variables by using the calculation formula of the rank correlation coefficient, where the rank correlation coefficient ranges from -1 to 1: 1 represents a complete positive correlation, -1 represents a complete negative correlation, and 0 represents no correlation; and to determine the variable ranking differences among the plurality of first variables based on the correlation coefficients among the plurality of first variables.

In this embodiment, optionally, the processing module 24 of the apparatus is specifically configured to rank the contribution values of the plurality of first variables; for a first variable with a contribution value higher than a preset contribution, the corresponding measures taken include: enhancing control measures for the first variable with the contribution value higher than the preset contribution, wherein the control measures comprise resource investment and enhanced supervision; or, optimizing an operation mode of a first variable with a contribution value higher than a preset contribution, wherein the operation mode can comprise a process flow and an optimization step; or, adjusting a related strategy of a first variable with a contribution value higher than a preset contribution, wherein the related strategy comprises a marketing strategy and an improved supply chain; for a first variable having a contribution value lower than a preset contribution, the corresponding measures taken include: reducing control measures for a first variable having a contribution value lower than a preset contribution; or, adjusting the priority of the first variable with the contribution value lower than the preset contribution; or, the first variable with the contribution value lower than the preset contribution is replaced by other first variables.

In this embodiment, optionally, the establishing module 23 of the device is specifically configured to form a matrix x of p rows and 1 column from the p first variables according to their values at different sample points;

initializing a control coefficient;

wherein T represents the statistic, β represents the control coefficient, β₀ represents the theoretical value of the control coefficient under the null hypothesis, and SE(β) represents the standard error of the control coefficient;
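The formula that defines T is not reproduced in this text. Given only the symbols listed above (a statistic T, a control coefficient β, its hypothesized value β₀, and its standard error SE(β)), one reading consistent with them is the standard t statistic for a regression coefficient. This is offered as an assumption, not as the formula of the application:

```latex
T = \frac{\beta - \beta_0}{SE(\beta)}
```

Under this reading, a large absolute value of T suggests that the estimated control coefficient differs from its hypothesized value by more than its sampling uncertainty would explain.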

In this embodiment, optionally, the determining module 22 of the device is specifically configured to calculate the correlation coefficient between the plurality of first variables by using the calculation formula of the rank correlation coefficient, where the calculation formula of the rank correlation coefficient is as follows: $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$;

In this embodiment of the present application, optionally, the determining module 22 of the apparatus is specifically configured to determine the dependency index between the plurality of first variables by the formula: [formula missing];

and to calculate the contribution value of each first variable by the formula: [formula missing];

The air pollution data analysis system shown in fig. 2 may perform the air pollution data analysis method shown in the embodiment shown in fig. 1, and its implementation principle and technical effects are not repeated. The specific manner in which the respective modules, units, and operations of the atmospheric pollution data analysis device in the above embodiments are performed has been described in detail in the embodiments related to the method, and will not be described in detail herein.

In one possible design, the atmospheric pollution data analysis apparatus of the embodiment shown in FIG. 2 may be implemented as a computing device, which may include a storage component 301 and a processing component 302, as shown in FIG. 3;

the storage component 301 stores one or more computer instructions for execution by the processing component 302.

The processing component 302 is configured to: in the air pollution data, a plurality of factors related to air quality are acquired, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and granularity concentration; taking the factors as first variables, and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables; establishing a plurality of relations between the first variables and the target variables, wherein the relations comprise control coefficients; determining variable ranking differences among a plurality of first variables by calculating level correlation coefficients among the plurality of first variables; determining a dependency index between a plurality of first variables according to variable ranking differences among the plurality of first variables and the control coefficient; calculating a contribution value of each first variable according to the dependency index and the control coefficient; and taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.

Wherein the processing component 302 may include one or more processors to execute computer instructions to perform all or part of the steps of the methods described above. Of course, the processing component may also be implemented as one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements for executing the methods described above.

The storage component 301 is configured to store various types of data to support operations at the terminal. The memory component may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The display component 303 may be an electroluminescent (EL) element, a liquid crystal display or a microdisplay of similar structure, or a laser-scanning display capable of projecting directly onto the retina, or a similar display.

Of course, the computing device may also include other components, such as input/output interfaces, communication components, and the like.

The input/output interface provides an interface between the processing component and a peripheral interface module, which may be an output device, an input device, etc.

The communication component is configured to facilitate wired or wireless communication between the computing device and other devices, and the like.

The computing device may be a physical device or an elastic computing host provided by the cloud computing platform, and at this time, the computing device may be a cloud server, and the processing component, the storage component, and the like may be a base server resource rented or purchased from the cloud computing platform.

The embodiment of the application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a computer, the air pollution data analysis method of the embodiment shown in fig. 1 can be implemented.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. An atmospheric pollution data analysis method, comprising:

2. The method of claim 1, wherein the establishing a relationship between the plurality of first variables and the target variable, the relationship including a control coefficient, comprises:

3. The method of claim 2, wherein the determining the contribution value of each of the first variables based on the variable ranking differences between the plurality of first variables and the control coefficients comprises:

4. The method of claim 1, wherein determining the variable ranking differences between the plurality of first variables by calculating the rank correlation coefficients between the plurality of first variables comprises:

5. The method of claim 1, wherein taking corresponding measures to optimize a target variable based on the contribution values of the plurality of first variables comprises:

sorting the contribution values of the plurality of first variables;

6. The method of claim 2, wherein said grouping a plurality of said first variables into a first variable matrix and establishing a relationship between a plurality of said first variables and said target variable based on said first variable matrix and said target variable comprises:

initializing a control coefficient;

wherein [formula missing]; [symbol missing] represents the value of the first variable for the i-th sample; β_i represents the control coefficient of the i-th first variable; β₀ is the intercept term in the linear regression model; y denotes the target variable; and [symbol missing] represents the error term;

7. The method of claim 4, wherein calculating the correlation coefficient between the plurality of first variables by the calculation formula of the rank correlation coefficient comprises:

calculating the correlation coefficient among the plurality of first variables by the calculation formula of the rank correlation coefficient, wherein the calculation formula of the rank correlation coefficient is as follows: $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$;

8. A method according to claim 3, wherein said determining a dependency index between a plurality of said first variables based on variable ranking differences between a plurality of said first variables and said control coefficients comprises:

calculating the contribution value of each first variable by the formula: [formula missing];

9. An atmospheric pollution data analysis system, comprising:

10. A computer storage medium storing a computer program which, when executed by a computer, implements the method for analyzing atmospheric pollution data according to any one of claims 1 to 8.