CN117314023B - Atmospheric pollution data analysis method, system and computer storage medium - Google Patents
- Publication number
- CN117314023B (application CN202311606468.0A)
- Authority
- CN
- China
- Prior art keywords
- variables
- variable
- control coefficient
- target
- contribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of the application provide an atmospheric pollution data analysis method, an atmospheric pollution data analysis system and a computer storage medium. The method comprises: acquiring, from air pollution data, a plurality of factors related to air quality; taking the factors as first variables, and determining a target variable according to the first variables; establishing a relationship between the first variables and the target variable; determining variable ranking differences among the first variables; and calculating the contribution value of each first variable and taking corresponding measures to optimize the target variable. Through this analysis and evaluation, the technical scheme can provide more accurate analysis results for atmospheric pollution data and a scientific basis for environmental management and control decisions, thereby reducing the degree of atmospheric pollution and improving air quality.
Description
Technical Field
The embodiment of the application relates to the technical field of environmental science, in particular to an atmospheric pollution data analysis method, an atmospheric pollution data analysis system and a computer storage medium.
Background
Currently, atmospheric pollution has become one of the most important environmental issues of global concern. Atmospheric pollution is mainly caused by emissions from activities such as industrial production, vehicle exhaust and coal burning, and includes harmful substances such as particulate matter, sulfur dioxide and nitrogen oxides. It has serious effects on human health and the ecological environment, such as respiratory diseases, acid rain and the greenhouse effect. Methods and systems for accurately analyzing and assessing atmospheric pollution conditions are therefore particularly important.
At present, the air pollution analysis mainly relies on sensor monitoring, utilizes an air quality sensor network to collect real-time air pollution data in cities, and evaluates the air quality by monitoring key indexes (such as PM2.5, PM10, SO2, NO2 and the like).
However, the data monitored by sensors are affected by factors such as limited sensor accuracy and inaccurate calibration, so they may contain errors and be unstable.
Disclosure of Invention
The embodiment of the application provides an atmospheric pollution data analysis method, an atmospheric pollution data analysis system and a computer storage medium, which are used for solving the problem of low atmospheric pollution data analysis accuracy in the prior art.
In a first aspect, an embodiment of the present application provides an atmospheric pollution data analysis method, including:
acquiring, from air pollution data, a plurality of factors related to air quality, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration;
taking the factors as first variables, and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables;
establishing a plurality of relations between the first variables and the target variables, wherein the relations comprise control coefficients;
determining variable ranking differences among a plurality of first variables by calculating rank correlation coefficients among the plurality of first variables;
determining a contribution value of each first variable based on variable ranking differences among a plurality of first variables and the control coefficient;
and taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.
Optionally, the establishing a relationship between the plurality of first variables and the target variable, where the relationship includes a control coefficient includes:
forming a first variable matrix by a plurality of first variables, and establishing a relation between the first variables and the target variables according to the first variable matrix and the target variables;
in the relationship between the plurality of first variables and the target variable, calculating the control coefficient by minimizing the sum of squares of residuals between the first variables and the target variable and by using an obtained statistic, wherein the statistic is determined according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis, and the obtained standard error of the control coefficient.
Optionally, the determining the contribution value of each first variable based on the variable ranking differences among the plurality of first variables and the control coefficient includes:
determining a dependency index between a plurality of first variables according to variable ranking differences among the plurality of first variables and the control coefficient;
and calculating the contribution value of each first variable according to the dependency index and the control coefficient.
Optionally, the determining, by calculating the rank correlation coefficient between the plurality of first variables, a variable rank difference between the plurality of first variables includes:
calculating the correlation coefficient among the plurality of first variables through the calculation formula of the rank correlation coefficient, wherein the value of the rank correlation coefficient ranges from -1 to 1, where 1 represents complete positive correlation, -1 represents complete negative correlation, and 0 represents no correlation;
and determining variable ranking differences among the plurality of first variables based on the correlation coefficients among the plurality of first variables.
Optionally, the taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables includes:
sorting the contribution values of the plurality of first variables;
for a first variable with a contribution value higher than a preset contribution, the corresponding measures taken include: enhancing control measures for the first variable with the contribution value higher than the preset contribution, wherein the control measures comprise resource investment and enhanced supervision; or, optimizing an operation mode of a first variable with a contribution value higher than a preset contribution, wherein the operation mode can comprise a process flow and an optimization step; or, adjusting a related strategy of a first variable with a contribution value higher than a preset contribution, wherein the related strategy comprises a marketing strategy and an improved supply chain;
for a first variable having a contribution value lower than a preset contribution, the corresponding measures taken include: reducing control measures for a first variable having a contribution value lower than a preset contribution; or, adjusting the priority of the first variable with the contribution value lower than the preset contribution; or, the first variable with the contribution value lower than the preset contribution is replaced by other first variables.
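The sorting and threshold comparison described above can be sketched as follows. This is only an illustrative sketch: the function name `plan_measures`, the variable names, the contribution values and the preset value 0.20 are all hypothetical, and the text does not say how a contribution exactly equal to the preset is handled, so such variables are left in neither group here.

```python
def plan_measures(contributions, preset):
    """Sort first variables by contribution value and split them around a preset.

    contributions: dict mapping a first-variable name to its contribution value.
    preset: the preset contribution used as the threshold (hypothetical value).
    """
    # Sort from the highest contribution value to the lowest.
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    # Variables above the preset are candidates for enhanced control measures.
    strengthen = [name for name, value in ranked if value > preset]
    # Variables below the preset are candidates for reduced control measures,
    # lower priority, or replacement by other first variables.
    relax = [name for name, value in ranked if value < preset]
    return ranked, strengthen, relax


# Hypothetical contribution values, for illustration only.
contributions = {
    "air_temperature": 0.12,
    "wind_speed": 0.05,
    "particulate_matter_concentration": 0.41,
    "chemical_component_concentration": 0.30,
}
ranked, strengthen, relax = plan_measures(contributions, preset=0.20)
print("enhance control:", strengthen)
print("reduce control:", relax)
```

Which concrete measure is then applied to each group remains a policy decision outside the scope of this sketch.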
Optionally, the forming a plurality of the first variables into a first variable matrix, and establishing a relationship between the plurality of the first variables and the target variable according to the first variable matrix and the target variable includes:
forming a matrix x of p rows and 1 columns by p first variables according to the values of different sample points;
initializing a control coefficient;
establishing a relation between a first variable matrix X and a target variable Y, wherein the relation is as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
wherein $x = (x_1, x_2, \ldots, x_p)^T$, $x_i$ represents the value of the i-th first variable for a sample, $i = 1, 2, \ldots, p$, $\beta_i$ represents the control coefficient of the i-th first variable, $\beta_0$ is the intercept term in the linear regression model, y denotes the target variable, and $\varepsilon$ represents the error term;
the calculating the control coefficient by minimizing a sum of squares of residuals between the first variable and the target variable and the obtained statistics in the relationships between the plurality of first variables and the target variable includes:
calculating a statistic according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis, and the obtained standard error of the control coefficient, wherein the expression of the statistic is as follows:
$$T = \frac{\beta - \beta_0}{SE(\beta)}$$
wherein T denotes the statistic, $\beta$ denotes the control coefficient, $\beta_0$ denotes the theoretical value of the control coefficient under the null hypothesis, and $SE(\beta)$ denotes the standard error of the control coefficient;
calculating the control coefficient through a preset estimation formula of the control coefficient, wherein the expression of the estimation of the preset control coefficient is as follows:;
wherein β denotes the control coefficient, x denotes the first variable matrix composed of the plurality of first variables, y denotes the target variable, and T denotes the statistic, which is used for judging whether the influence of the plurality of first variables on the target variable is significant.
Optionally, the calculating the correlation coefficient between the plurality of first variables by the calculation formula of the rank correlation coefficient includes:
calculating the correlation coefficient among the plurality of first variables by the calculation formula of the rank correlation coefficient, wherein the calculation formula is as follows:
$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
wherein $\rho$ represents the rank correlation coefficient, $\sum d^2$ represents the sum of the squares of the rank differences of the plurality of first variables, n represents the number of samples, and d represents the variable ranking differences between the plurality of first variables.
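As an illustration of the rank correlation calculation, the following minimal sketch computes the coefficient as one minus six times the sum of squared rank differences, divided by n(n² - 1), for two first variables. The function names and the sample values are hypothetical, and the sketch assumes there are no tied values within a variable; the text does not describe how ties are handled.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(a, b):
    """Rank correlation coefficient from the sum of squared rank differences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("both variables need the same number (>= 2) of samples")
    rank_a, rank_b = ranks(a), ranks(b)
    # d is the difference between the two ranks of each sample.
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical readings of two first variables over five samples.
air_temperature = [21.0, 25.5, 19.2, 30.1, 27.3]
particulate_matter = [35.0, 48.0, 30.0, 52.0, 60.0]
print(rank_correlation(air_temperature, particulate_matter))
```

A result close to 1 or -1 indicates that the two variables are ranked almost identically or almost inversely across the samples.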
Optionally, the determining the dependency index between the first variables according to the variable ranking differences between the first variables and the control coefficients includes:
By the formula:determining a dependency index between a plurality of the first variables;
wherein β is denoted as a control coefficient, e is denoted as a natural constant, α is denoted as a control function for adjusting the degree of influence of the variable ranking difference on the dependency index, d is denoted as a variable ranking difference between a plurality of the first variables;
the calculating a contribution value of each of the first variables from the dependency index and the control coefficient comprises:
by the formula:calculating the contribution value of each first variable;
where β is denoted as the control coefficient and DependencyIndex is denoted as the dependency index between the plurality of first variables.
In a second aspect, embodiments of the present application provide an atmospheric pollution data analysis system, including:
an acquisition module for acquiring a plurality of factors related to air quality from air pollution data, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration;
a determining module for taking the factors as first variables and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables;
a building module for establishing relationships between the plurality of first variables and the target variables, wherein the relationships comprise control coefficients;
the determining module being further used for determining variable ranking differences among the plurality of first variables by calculating rank correlation coefficients among the plurality of first variables, and for determining a contribution value of each first variable based on the variable ranking differences among the plurality of first variables and the control coefficients;
and a processing module for taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.
The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In the embodiment of the application, a plurality of factors related to air quality are acquired from air pollution data, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration; the factors are taken as first variables, and target variables are determined according to the first variables, wherein the target variables are prediction data generated according to the first variables; relationships between the plurality of first variables and the target variables are established, wherein the relationships comprise control coefficients; variable ranking differences among the plurality of first variables are determined by calculating rank correlation coefficients among the plurality of first variables; a dependency index between the plurality of first variables is determined according to the variable ranking differences and the control coefficients; a contribution value of each first variable is calculated according to the dependency index and the control coefficients; and corresponding measures are taken to optimize the target variable according to the contribution values of the plurality of first variables. The atmospheric pollution data analysis method has the following beneficial effects:
Multi-factor consideration: the method comprehensively considers a plurality of factors related to air quality, including air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration. This helps to understand the formation mechanism of atmospheric pollution and its influencing factors more fully.
Target variable prediction: prediction data are generated as target variables by taking a plurality of first variables as inputs. This facilitates prediction and evaluation of future air quality, so that appropriate environmental management and control measures can be taken in advance.
Relationship establishment: a relationship between the plurality of first variables and the target variable is established, and the relationship includes a control coefficient. This helps to determine the extent to which each first variable affects the target variable and to further analyze the dependency relationships between them.
Variable ranking difference analysis: the variable ranking differences of the first variables are determined by calculating the rank correlation coefficients between the first variables. This helps to understand the relative importance of each variable in atmospheric pollution, so that variables of higher importance are prioritized for control.
Dependency index calculation: a dependency index between the plurality of first variables is determined based on the variable ranking differences and the control coefficients. This helps to quantify the extent to which each first variable contributes to atmospheric pollution, further guiding the formulation of environmental management and control strategies.
Contribution value evaluation: the contribution value of each first variable is calculated according to the dependency index and the control coefficient. This helps to determine which variables contribute most to atmospheric pollution, so that corresponding measures can be taken in a targeted manner.
Through the analysis and evaluation, the method can provide more accurate analysis results of the atmospheric pollution data, and provide scientific basis for environmental management and control decision so as to reduce the degree of atmospheric pollution and improve the air quality.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a flow chart of one embodiment of an atmospheric pollution data analysis method provided herein;
FIG. 2 is a schematic diagram of an embodiment of an atmospheric pollution data analysis system provided herein;
Fig. 3 illustrates a schematic diagram of a computing device provided herein.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
Some of the flows described in the specification and claims of this application and in the foregoing figures include a number of operations that occur in a particular order. It should be understood, however, that these operations may be performed out of the order in which they appear or in parallel; sequence numbers such as 101 and 102 merely distinguish the operations from one another and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
FIG. 1 is a flow chart illustrating an embodiment of a method for analyzing atmospheric pollution data provided herein, as shown in FIG. 1, the method comprising:
101. Acquiring, from air pollution data, a plurality of factors related to air quality, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration;
In this step, the atmospheric pollution data refers to data related to atmospheric pollution, including but not limited to data on particulate matter and on the combustion products of coal, petroleum and other fuels.
The atmospheric pollution data include a number of factors related to air quality, optionally air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration, wherein the chemical components include but are not limited to sulfur dioxide, nitrogen oxides, carbon monoxide and ozone.
In embodiments of the present application, sensor devices may be utilized to collect data regarding a plurality of factors associated with air quality, including air temperature, air humidity, wind speed, atmospheric pressure, chemical concentration, and particulate concentration, by establishing an atmospheric pollution monitoring network in a city. This monitoring network may consist of a plurality of sensor devices distributed in different locations in the city, each device being responsible for monitoring a specific factor. These devices may transmit data to a central data processing center via wireless communication technology. At the data processing center, the collected data may be integrated and analyzed. By comprehensively analyzing the data of a plurality of factors, the air quality conditions of different areas of the city can be obtained.
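The integration step at the data processing center might look like the following minimal sketch, which averages the readings of each factor per region. The reading format `(region, factor, value)`, the region names and the values are assumptions made purely for illustration; the text does not specify how the collected data are structured or combined.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_region(readings):
    """Average each factor per region from (region, factor, value) readings."""
    grouped = defaultdict(list)
    for region, factor, value in readings:
        grouped[(region, factor)].append(value)
    summary = defaultdict(dict)
    for (region, factor), values in grouped.items():
        summary[region][factor] = mean(values)
    # Convert to plain dictionaries for easier downstream use.
    return {region: dict(factors) for region, factors in summary.items()}


# Hypothetical readings sent by sensor devices at different locations.
readings = [
    ("north", "air_temperature", 21.0),
    ("north", "air_temperature", 23.0),
    ("north", "wind_speed", 3.5),
    ("south", "air_temperature", 25.0),
]
summary = aggregate_by_region(readings)
print(summary)
```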
The effect of this embodiment is that the air quality of the city can be monitored and evaluated in real time. By analyzing the data of the plurality of factors, the source and the degree of the atmospheric pollution can be known more accurately. Meanwhile, the data can also be used for formulating corresponding emission reduction measures and management policies so as to improve the air quality and protect the health of people.
For example, when the monitored data indicates that the particulate matter concentration and the chemical component concentration in a certain area exceed national standards, the government may take corresponding actions, such as limiting industrial emissions, enhancing vehicle exhaust emission control, etc., to reduce the amount of pollutant emissions. Thus, by monitoring and analyzing the data of a plurality of factors in real time, measures can be taken in time to improve and manage the air pollution problem.
Of course, the above is only one possible implementation. The application does not stop at analyzing the factors once they are acquired; it performs the subsequent steps to improve the accuracy of the atmospheric pollution data analysis.
102. Taking the factors as first variables, and determining target variables according to the first variables;
in this step, the target variable is prediction data generated from a plurality of the first variables;
In the embodiment of the application, a plurality of factors (air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration) are taken as the first variables, and target variables related to the first variables are determined according to the first variables, wherein the target variables can include but are not limited to air quality indexes such as PM2.5 concentration and the like.
It will be appreciated that the target variable is not collected directly from the environment; it is determined jointly from the plurality of factors collected from the environment, which ensures its accuracy. The application proposes comparing the contribution values of all the factors to determine which factor has the greatest influence on air pollution: a larger relative contribution value indicates that the factor has a larger influence on the target variable. The scheme is described in steps 103-107 below.
103. Establishing a plurality of relations between the first variables and the target variables, wherein the relations comprise control coefficients;
In the embodiment of the application, p first variables are combined into a matrix x of p rows and 1 column according to the values of different sample points.
The control coefficient β is initialized; optionally, β may be initialized to a vector of all zeros or of random values.
A relation between the first variable matrix X and the target variable Y is established, wherein the relation is as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
wherein $x = (x_1, x_2, \ldots, x_p)^T$, $x_i$ represents the value of the i-th first variable for a sample, $i = 1, 2, \ldots, p$, $\beta_i$ represents the control coefficient of the i-th first variable, $\beta_0$ is the intercept term in the linear regression model, y denotes the target variable, and $\varepsilon$ represents the error term.
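For example, if $x_1$ = air temperature, $x_2$ = air humidity, $x_3$ = wind speed, $x_4$ = atmospheric pressure, $x_5$ = chemical component concentration and $x_6$ = particulate matter concentration, then the expression of the air pollution index y is as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \varepsilon$$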
The control coefficient β expresses the degree of influence of the corresponding first variable on the target variable. Assuming that the regression coefficient of the air temperature is $\beta_1 = 0.5$, the average value of the air pollution index increases by 0.5 units for every 1-unit increase in air temperature, with the other variables held constant. Similarly, if the regression coefficient of the particulate matter concentration is $\beta_6 = -0.3$, the average value of the air pollution index decreases by 0.3 units for every 1-unit increase in particulate matter concentration.
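As a brief worked derivation of this interpretation, assume the linear regression model of this step, $y = \beta_0 + \sum_{i=1}^{6} \beta_i x_i + \varepsilon$, with an error term of zero mean (the zero-mean condition is an assumption here and is not stated in the text). Raising $x_1$ by one unit while holding the other variables fixed then changes the expected target value by exactly $\beta_1$:

$$E[y \mid x_1 + 1, x_2, \ldots, x_6] - E[y \mid x_1, x_2, \ldots, x_6] = \beta_1 (x_1 + 1) - \beta_1 x_1 = \beta_1 = 0.5$$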
As one possible implementation, the control coefficient may be determined, in the relation between the plurality of first variables and the target variable, by minimizing the sum of squared residuals between the first variables and the target variable, in combination with the obtained statistic.
Specifically, the statistic is calculated from the control coefficient, the theoretical value of the control coefficient under the null hypothesis, and the standard error of the control coefficient. The statistic is expressed as:
$$T = \frac{\beta - \beta_0}{SE(\beta)}$$;
wherein T denotes the statistic, β denotes the control coefficient, $\beta_0$ denotes the theoretical value of the control coefficient under the null hypothesis, and SE(β) denotes the standard error of the control coefficient;
the calculation of the statistics may be used for hypothesis testing, where a null hypothesis is that the control coefficient is zero, i.e. the first variable has no significant effect on the target variable. By calculating the value of the statistic, a hypothesis test can be performed and a determination can be made as to whether the control coefficient is significant.
Further, the control coefficient is calculated by a preset estimation formula of the control coefficient, which is expressed as follows:
$$\beta = (x^{T} x)^{-1} x^{T} y$$;
wherein, beta is expressed as the control coefficient, x is expressed as a first variable matrix composed of a plurality of first variables, y is expressed as a target variable, T is expressed as a statistic, and the statistic is used for judging whether the influence of the plurality of first variables on the target variable is obvious or not.
In practice, it is assumed that an analysis of the relationship between the air pollution index and the air temperature, wind speed and particulate matter concentration (which may be some or all of the plurality of first variables) is required. After collecting atmospheric pollution data and corresponding values of air temperature, wind speed and particulate concentration over a period of time by means of sensors, the collected data are constructed into a matrix x and a vector y. The matrix x contains values for air temperature, wind speed and particulate concentration, and the vector y contains observations of the air pollution index, where observations are data points actually collected, representing actual observed atmospheric pollution data. The observations may comprise values of the target variable and the first variable.
In the embodiment of the present application, the control coefficient β is calculated using the above preset estimation formula of the control coefficient, giving an estimated value of β. Further, the standard error (SE), which represents the uncertainty of the estimated control coefficient, is used to calculate the statistic T. Finally, the statistic T is used to test the significance of the control coefficient. If the absolute value of T is large, the null hypothesis can be rejected, i.e., the effect of the plurality of first variables on the target variable is significant. Conversely, if the absolute value of T is small, the null hypothesis cannot be rejected, i.e., the effect of the plurality of first variables on the target variable is not significant.
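The estimation and significance check described above can be sketched in plain Python. This is only an illustrative sketch, not the application's implementation: it assumes the control coefficients are estimated by ordinary least squares through the normal equations, that the standard error is derived from the residual variance, and that the theoretical value under the null hypothesis is 0. The helper names (`invert`, `fit_ols`) and all sample numbers are hypothetical.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_ols(rows, y):
    """Return (beta, standard_errors, t_statistics) for y = b0 + sum(b_i * x_i) + eps."""
    X = [[1.0] + list(r) for r in rows]            # prepend the intercept column
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(matmul(xtx_inv, Xt), [[v] for v in y])]
    residuals = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
    dof = len(y) - len(beta)                       # degrees of freedom
    sigma2 = sum(r * r for r in residuals) / dof   # residual variance
    se = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(len(beta))]
    t = [(b - 0.0) / s for b, s in zip(beta, se)]  # assumed null value: 0
    return beta, se, t

# Hypothetical samples: (air temperature, wind speed, particulate matter concentration)
rows = [(12, 3.0, 35), (15, 1.5, 80), (18, 4.2, 42), (21, 2.1, 66),
        (24, 5.0, 30), (27, 0.8, 95), (30, 3.6, 51), (33, 2.7, 73)]
noise = [0.05, -0.04, 0.03, -0.05, 0.02, 0.04, -0.03, -0.02]
y = [2.0 + 0.5 * a - 0.3 * b + 0.8 * c + e for (a, b, c), e in zip(rows, noise)]

beta, se, t = fit_ols(rows, y)
for name, b, s, ti in zip(["intercept", "temperature", "wind", "particulate"], beta, se, t):
    print(f"{name}: beta={b:.3f} SE={s:.4f} T={ti:.1f}")
```

A coefficient whose |T| is large relative to a chosen critical value would be treated as significant; the choice of critical value is left open here because the text does not specify one.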
Through the steps, the relation between the air pollution index and the air temperature, the wind speed and the particulate matter concentration can be analyzed, and whether the influence of each variable on the air pollution is obvious or not can be judged. This helps to better understand the mechanism of formation of atmospheric pollution and to formulate corresponding control measures and policies.
104. Determining variable ranking differences among a plurality of first variables by calculating level correlation coefficients among the plurality of first variables;
in the embodiment of the application, the correlation coefficient among a plurality of first variables is calculated through a calculation formula of a level correlation coefficient, wherein the value range of the level correlation coefficient is-1 to 1,1 represents complete positive correlation, -1 represents complete negative correlation, and 0 represents no correlation;
And determining variable ranking differences among the plurality of first variables based on correlation coefficients among the plurality of first variables.
Specifically, the correlation coefficient among the plurality of first variables is calculated by the calculation formula of the rank correlation coefficient, which is as follows:
$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$;
wherein ρ represents the rank correlation coefficient, $\sum d^2$ represents the sum of squared rank differences of the plurality of first variables, n represents the number of samples, and d represents the variable ranking difference between the plurality of first variables.
In practice, assume that the variable ranking differences between the air pollution index and the air temperature, wind speed and particulate matter concentration are to be analyzed. After the atmospheric pollution data and the corresponding values of air temperature, wind speed and particulate matter concentration over a period of time are collected, the collected data are ranked to obtain the ranking of each variable, and the rank correlation coefficient ρ is then calculated as ρ = 1 - (6Σd²)/(n(n²-1)), where Σd² represents the sum of squared rank differences of the plurality of first variables, n represents the number of samples, and d represents the variable ranking difference among the plurality of first variables.
Wherein, the value range of the level correlation coefficient is between-1 and 1. If the rank correlation coefficient is close to 1, it means that the rank difference of the plurality of first variables is small, i.e., their influence trends in atmospheric pollution are consistent. If the rank correlation coefficient is close to-1, it means that the ranking of the first variables is greatly different, i.e., their influence trends in atmospheric pollution are opposite. If the rank correlation coefficient is close to 0, it means that the ranking differences of the plurality of first variables are neutral, i.e., their influence tendency in the atmospheric pollution is not obvious.
Through the steps, the rank correlation coefficients among the first variables can be calculated, so that the ranking difference of the first variables in the atmospheric pollution can be known. This helps to determine which variables have a more pronounced effect on atmospheric pollution, and thus provide targeted environmental management and control.
For example, assume that there are two first variables x_1 and x_3 (for example, x_1 represents air temperature and x_3 represents wind speed) and that 8 samples have been collected. The sample values are shown in Table 1 below:
TABLE 1
Sample No. | x_1 (air temperature) | x_3 (wind speed) |
1 | 5 | 2 |
2 | 3 | 4 |
3 | 1 | 1 |
4 | 4 | 3 |
5 | 2 | 5 |
6 | 8 | 6 |
7 | 7 | 8 |
8 | 6 | 7 |
First, x_1 and x_3 must be ranked. Ranks are assigned according to the magnitude of each variable's values (rank 1 for the smallest value); equal values receive the same rank, taken as the average of the ranks they occupy. The ranked result is shown in Table 2 below:
TABLE 2
Sample No. | x_1 (air temperature) | x_3 (wind speed) | x_1 rank | x_3 rank |
1 | 5 | 2 | 5 | 2 |
2 | 3 | 4 | 3 | 4 |
3 | 1 | 1 | 1 | 1 |
4 | 4 | 3 | 4 | 3 |
5 | 2 | 5 | 2 | 5 |
6 | 8 | 6 | 8 | 6 |
7 | 7 | 8 | 7 | 8 |
8 | 6 | 7 | 6 | 7 |
Further, the sum of squared rank differences Σd² is calculated:
Σd² = (5-2)² + (3-4)² + (1-1)² + (4-3)² + (2-5)² + (8-6)² + (7-8)² + (6-7)²
= 9 + 1 + 0 + 1 + 9 + 4 + 1 + 1
= 26
Finally, the rank correlation coefficient is calculated using the formula ρ = 1 - (6Σd²)/(n(n²-1)) with n = 8:
ρ = 1 - (6 × 26) / (8 × (8² - 1))
= 1 - 156 / 504
≈ 1 - 0.31
= 0.69
Thus, the rank correlation coefficient of x_1 and x_3 is calculated to be about 0.69, which indicates a fairly strong positive rank relationship between them, i.e., the ranking differences of x_1 and x_3 are small and their influence trends in atmospheric pollution are broadly consistent.
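The ranking and rank correlation steps can also be checked programmatically. The following is a minimal sketch in plain Python, assuming ascending ranks with averaged ranks for ties and the formula ρ = 1 - 6Σd²/(n(n²-1)); the function names `average_ranks` and `rank_correlation` are hypothetical. It is applied to the raw x_1 and x_3 values of Table 1. Note that this closed-form expression is exact only when there are no tied values.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def rank_correlation(a, b):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the per-sample rank difference."""
    n = len(a)
    ranks_a, ranks_b = average_ranks(a), average_ranks(b)
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

air_temperature = [5, 3, 1, 4, 2, 8, 7, 6]   # x_1 column of Table 1
wind_speed = [2, 4, 1, 3, 5, 6, 8, 7]        # x_3 column of Table 1

rho = rank_correlation(air_temperature, wind_speed)
print(round(rho, 4))
```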
105. Determining a contribution value of each first variable based on the variable ranking differences among the plurality of first variables and the control coefficient.
In the embodiment of the present application, specifically, step 105 may include:
1051. determining a dependency index between a plurality of first variables according to variable ranking differences among the plurality of first variables and the control coefficient;
in step 1051, the dependency index between the plurality of first variables may be determined by a formula: [formula not reproduced in this text];
wherein β denotes the control coefficient, e denotes the natural constant, α denotes a control function for adjusting the degree of influence of the variable ranking difference on the dependency index, and d denotes the variable ranking difference between the plurality of first variables;
1052. Calculating a contribution value of each first variable according to the dependency index and the control coefficient;
in step 1052, the contribution value of each first variable may be calculated by a formula: [formula not reproduced in this text];
wherein β denotes the control coefficient and DependencyIndex denotes the dependency index between the plurality of first variables.
In a practical application, in an atmospheric pollution embodiment, the variable ranking differences of a plurality of first variables (such as air temperature, wind speed and particulate concentration) can be used to calculate the dependency index between them.
First, it is necessary to acquire the rank of each variable and calculate the variable rank difference d between them.
Then, the dependency index is calculated from the control coefficient β and the control function α. The control coefficient beta represents the influence degree of each first variable on atmospheric pollution, and the control function is used for adjusting the influence degree of the variable ranking difference on the dependence index.
The calculation formula is the formula of step 1051 above.
In the calculation process, a suitable control coefficient β and control function α can be selected according to actual conditions. The control coefficient β can be estimated, for example, by the least squares method. The control function α may be determined empirically or from domain knowledge, and measures how strongly a variable ranking difference affects the dependency index.
By calculating the dependency index, the degree of dependency of the plurality of first variables on the atmospheric pollution can be evaluated. A higher dependency index indicates that the plurality of first variables have a stronger dependency on atmospheric pollution, while a lower dependency index indicates that the dependency between them is weaker.
Through the steps, the dependence index among a plurality of first variables can be calculated by utilizing the variable ranking differences and the control coefficients, so that the influence relationship of the first variables on the atmospheric pollution can be better understood. This helps to formulate corresponding environmental management and control strategies to reduce the extent of atmospheric pollution.
Further, in an atmospheric pollution embodiment, the contribution value of each of the first variables may be calculated using a dependency index and a control coefficient;
first, it is necessary to calculate the dependency index of each of the first variables, which can be obtained by the previously mentioned method.
Then, the contribution value of each first variable can be calculated by the formula of step 1052 above.
In the calculation process, the per-variable term of the formula is first calculated for each first variable, and these terms are then added to obtain the sum of the contribution terms of all the first variables. Finally, the contribution value of each first variable is obtained by dividing its own term by that sum;
By calculating the contribution value of each of said first variables, their relative importance in atmospheric pollution can be understood. A higher contribution value indicates that the variable contributes more to atmospheric pollution, while a lower contribution value indicates that it contributes less to atmospheric pollution.
Through the above steps, the contribution value of each of the first variables can be calculated using the dependency index and the control coefficient, thereby better understanding the degree of their influence on the atmospheric pollution. This helps to determine which variables are more important, so that the corresponding environmental management and control measures can be prioritized.
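Steps 1051 and 1052 can be illustrated with a short Python sketch. Because the formulas themselves are not reproduced in this text, both functional forms below are assumptions chosen only to use the named ingredients (β, e, α, d): the dependency index is taken as β·e^(−α·d), and the per-variable term as β multiplied by the dependency index. Only the final normalization (each term divided by the sum of all terms) follows the description above. All names and numbers are hypothetical.

```python
import math

def dependency_index(beta, d, alpha=0.5):
    """ASSUMED form beta * e^(-alpha * d); the actual formula is not available here."""
    return beta * math.exp(-alpha * d)

def contribution_values(betas, rank_diffs, alpha=0.5):
    """Normalize per-variable terms so that the contribution values sum to 1."""
    raw = {}
    for name, beta in betas.items():
        index = dependency_index(beta, rank_diffs[name], alpha)
        raw[name] = beta * index   # ASSUMED per-variable term: beta_i * DependencyIndex_i
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all per-variable terms are zero; contributions are undefined")
    return {name: term / total for name, term in raw.items()}

# Hypothetical control coefficients and variable ranking differences.
betas = {"air_temperature": 0.5, "wind_speed": -0.2, "particulate": 0.8}
rank_diffs = {"air_temperature": 1.0, "wind_speed": 2.0, "particulate": 0.0}

contrib = contribution_values(betas, rank_diffs)
for name, value in sorted(contrib.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

With these assumed forms each term equals β²·e^(−α·d), which is never negative, so the normalized values can be read as shares of the total.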
106. And taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.
In this embodiment of the present application, as a possible implementation solution, the contribution values of a plurality of the first variables are ordered;
for a first variable with a contribution value higher than a preset contribution, the corresponding measures taken include: enhancing control measures for the first variable with the contribution value higher than the preset contribution, wherein the control measures comprise resource investment and enhanced supervision; or, optimizing an operation mode of a first variable with a contribution value higher than a preset contribution, wherein the operation mode can comprise a process flow and an optimization step; or, adjusting a related strategy of a first variable with a contribution value higher than a preset contribution, wherein the related strategy comprises a marketing strategy and an improved supply chain;
In the embodiment of the application, for a variable with a high contribution value, the control measures can be enhanced to improve the effect on the target variable; this may include increasing resource investment and strengthening monitoring and management. The operation mode of a high-contribution variable can also be optimized to improve the performance of the target variable; this may include improving the process flow and optimizing the operating steps. Likewise, the related strategies or decisions can be adjusted to better control the target variable; this may include adjusting marketing strategies and improving supply chain management. For a first variable with a contribution value lower than the preset contribution, the corresponding measures taken include: reducing the control measures for that first variable; or adjusting the priority of that first variable; or replacing that first variable with another first variable.
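The ordering and threshold comparison described above can be sketched as follows. This is a simplified illustration: the function name `plan_measures`, the action labels, the handling of a value exactly equal to the preset contribution, and the sample numbers are all assumptions, since the text only distinguishes values higher and lower than the preset contribution.

```python
def plan_measures(contributions, preset_contribution):
    """Sort variables by contribution (descending) and attach a coarse action label."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    plan = []
    for name, value in ranked:
        if value > preset_contribution:
            action = "enhance control measures"
        elif value < preset_contribution:
            action = "reduce control measures or lower priority"
        else:
            action = "keep current measures"   # assumed; the equal case is not specified
        plan.append((name, value, action))
    return plan

# Hypothetical contribution values and preset contribution.
contributions = {"air_temperature": 0.19, "wind_speed": 0.02, "particulate": 0.79}
for name, value, action in plan_measures(contributions, preset_contribution=0.15):
    print(f"{name} ({value:.2f}): {action}")
```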
As another possible implementation scheme, at least one first variable meeting a contribution value condition is screened from a plurality of first variables, the first variables are input into a pre-established prediction model to obtain an output result of the prediction model, corresponding measures are taken according to the output result to optimize a target variable, and the prediction model is obtained after training is performed on the plurality of first variables in advance. Establishing a predictive model is a conventional means in the art, and will not be described in detail herein.
Specifically, the first variable meeting the contribution value condition can be selected first: and selecting a first variable with higher contribution degree as input according to the contribution value evaluation result. The condition of the contribution value may be determined according to a pre-set threshold or otherwise.
Secondly, establishing a prediction model: the selected first variable is used as input, existing atmospheric pollution data is used as training set, and proper prediction model is used for training. Common predictive models include linear regression, support vector machines, random forests, and the like.
Further, obtaining an output result of the prediction model: and inputting the new first variable data into a pre-established prediction model to obtain a corresponding prediction output result. The predicted outcome may be an air quality indicator, such as PM2.5 concentration, at some point in the future.
Finally, taking corresponding measures according to the output result to optimize the target variable: based on the prediction, the decision maker can take corresponding measures to cope with the atmospheric pollution. For example, if the prediction shows that the PM2.5 concentration will exceed a safety threshold, a restriction on industrial emissions or a restriction on operation may be formulated to reduce the amount of pollutant emissions to reduce the extent of atmospheric pollution.
In practical application, assume that the analysis results show that air temperature, wind speed and particulate matter concentration are the variables with a high contribution to atmospheric pollution. These three variables are selected as the first variables, and training is performed using existing atmospheric pollution data.
A prediction model is established and trained with air temperature, wind speed and particulate matter concentration as input variables and PM2.5 concentration as the target variable, yielding a trained prediction model.
When new air temperature, wind speed and particulate matter concentration data are available, they are input into the prediction model to obtain a prediction result, namely the predicted PM2.5 concentration at a future time point.
According to the prediction result, if the predicted PM2.5 concentration exceeds the safety threshold, the decision maker can take corresponding measures, such as strengthening industrial emission control, limiting traffic flow or reminding citizens to take protective measures so as to reduce the influence of atmospheric pollution on human health.
Through the embodiment, the first variable meeting the requirements can be screened out according to the contribution value condition, and the prediction model is utilized for prediction so as to guide the formulation and implementation of the corresponding atmospheric pollution control measures.
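The screening, training, prediction and threshold check can be put together in a small Python sketch. It is illustrative only: a least-squares linear model stands in for whichever prediction model is actually used, the contribution values, training records and the safety threshold of 75 are hypothetical, and the training data are generated without noise so the fit is exact.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular system")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def screen_variables(contributions, threshold):
    """Keep the first variables whose contribution value meets the condition."""
    return [name for name, value in contributions.items() if value >= threshold]

def train_linear_model(rows, targets):
    """Least-squares fit; returns [intercept, w_1, ..., w_k]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

# Hypothetical contribution values and historical records.
contributions = {"air_temperature": 0.30, "air_humidity": 0.05,
                 "wind_speed": 0.25, "particulate": 0.40}
records = [
    {"air_temperature": 10, "air_humidity": 40, "wind_speed": 2.0, "particulate": 40},
    {"air_temperature": 15, "air_humidity": 55, "wind_speed": 5.0, "particulate": 60},
    {"air_temperature": 20, "air_humidity": 70, "wind_speed": 1.0, "particulate": 90},
    {"air_temperature": 25, "air_humidity": 35, "wind_speed": 4.0, "particulate": 50},
    {"air_temperature": 30, "air_humidity": 60, "wind_speed": 3.0, "particulate": 70},
    {"air_temperature": 18, "air_humidity": 45, "wind_speed": 6.0, "particulate": 30},
]
# Synthetic PM2.5 targets: 5 + 0.8*temperature - 3*wind + 0.6*particulate (no noise).
pm25 = [31.0, 38.0, 72.0, 43.0, 62.0, 19.4]

selected = screen_variables(contributions, threshold=0.2)
weights = train_linear_model([[r[n] for n in selected] for r in records], pm25)

SAFETY_THRESHOLD = 75.0   # hypothetical threshold for this sketch
new_obs = {"air_temperature": 28, "air_humidity": 50, "wind_speed": 1.0, "particulate": 95}
forecast = predict(weights, [new_obs[n] for n in selected])
alert = forecast > SAFETY_THRESHOLD
print(selected, round(forecast, 1), "take measures" if alert else "no action needed")
```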
Further, after the contribution values of the plurality of first variables are acquired, the analysis results (such as the contribution values) are presented visually, for example in the form of charts or maps, so that a user can understand and apply the analysis results of the air pollution data more intuitively.
The following is one specific example:
Data collection and preparation: data are collected for a plurality of factors related to air quality, such as air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration, ensuring the accuracy and integrity of the data.
Data analysis and modeling: a relation model is established between the plurality of first variables and the target variable.
Variable ranking difference analysis: the rank correlation coefficients among the plurality of first variables are calculated to determine the variable ranking differences.
Dependency index calculation: the dependency index between the plurality of first variables is calculated based on the variable ranking differences and the control coefficients.
Contribution evaluation and visual display: the contribution degree of each first variable is calculated according to the dependency index, and the result is displayed visually.
The following are two common visualization approaches. A radar chart can display the contribution degree of each first variable as a polygon: each vertex represents one first variable, and the distance of the vertex from the center represents the magnitude of its contribution. By comparing the contributions of the different variables, their relative importance in atmospheric pollution can be observed intuitively. Alternatively, a heat map can display the contribution degree of each first variable as shades of color: darker colors indicate higher contributions and lighter colors indicate lower contributions. The heat map shows the distribution of the contribution degrees of the variables clearly and assists a decision maker in formulating environmental management and control strategies.
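The geometry behind both displays can be sketched without any plotting library. In this minimal Python example, `radar_vertices` places one vertex per variable at a distance from the center equal to its contribution, and `heat_shade` maps a contribution to a gray level so that higher contributions are darker. The function names, the clockwise layout starting at the top, the 0 to 255 gray scale and the sample values are all assumptions for illustration; the resulting coordinates and shades could be handed to any charting tool.

```python
import math

def radar_vertices(contributions):
    """Return {name: (x, y)}; the vertex distance from the origin equals the contribution."""
    names = list(contributions)
    n = len(names)
    points = {}
    for k, name in enumerate(names):
        angle = math.pi / 2 - 2 * math.pi * k / n   # start at the top, go clockwise
        r = contributions[name]
        points[name] = (r * math.cos(angle), r * math.sin(angle))
    return points

def heat_shade(value, lowest, highest):
    """Map a contribution to a gray level 0-255; 0 is darkest (highest contribution)."""
    if highest == lowest:
        return 128
    fraction = (value - lowest) / (highest - lowest)
    return round(255 * (1 - fraction))

# Hypothetical contribution values.
contributions = {"air_temperature": 0.30, "wind_speed": 0.25, "particulate": 0.45}

vertices = radar_vertices(contributions)
low, high = min(contributions.values()), max(contributions.values())
shades = {name: heat_shade(v, low, high) for name, v in contributions.items()}
for name in contributions:
    x, y = vertices[name]
    print(f"{name}: vertex=({x:.3f}, {y:.3f}) gray={shades[name]}")
```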
In addition, the analysis result can be combined with geographic information data, and the spatial distribution of the atmospheric pollution can be displayed on a map. A Geographic Information System (GIS) can be used to draw heat maps, contour maps or point maps that visually display the degree of atmospheric pollution in different areas, helping decision makers to understand and solve the atmospheric pollution problem.
Through the data analysis and visual display, a decision maker and an environment manager can intuitively know the influence factors and the degree of the atmospheric pollution so as to formulate more effective environmental protection and pollution control measures.
Fig. 2 is a schematic structural diagram of an embodiment of an air pollution data analysis system provided in the present application, and as shown in fig. 2, the apparatus includes:
an acquisition module 21 for acquiring, in the air pollution data, a plurality of factors related to air quality, including air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration, and particulate matter concentration;
a determining module 22, configured to take the multiple factors as first variables, and determine a target variable according to the multiple first variables, where the target variable is prediction data generated according to the multiple first variables;
An establishing module 23, configured to establish a relationship between the plurality of first variables and the target variable, where the relationship includes a control coefficient;
the determining module 22 is further configured to determine a variable ranking difference between the plurality of first variables by calculating a rank correlation coefficient between the plurality of first variables; determining a contribution value of each first variable based on variable ranking differences among a plurality of first variables and the control coefficient;
a processing module 24, configured to take corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.
In this embodiment, optionally, the establishing module 23 of the device is specifically configured to form a first variable matrix from the plurality of first variables, and to establish the relation between the plurality of first variables and the target variable according to the first variable matrix and the target variable; in this relation, the control coefficient is calculated by minimizing the sum of squared residuals between the first variables and the target variable, in combination with the obtained statistic, the statistic being determined from the control coefficient, the obtained theoretical value of the control coefficient under the null hypothesis, and the obtained standard error of the control coefficient.
In this embodiment, optionally, the determining module 22 of the apparatus is specifically configured to determine a dependency index between a plurality of the first variables according to a variable ranking difference between the plurality of the first variables and the control coefficient; and calculating the contribution value of each first variable according to the dependency index and the control coefficient.
In this embodiment, optionally, the determining module 22 of the apparatus is specifically configured to calculate the correlation coefficient between the plurality of first variables by using a calculation formula of a level correlation coefficient, where a value range of the level correlation coefficient is-1 to 1,1 represents a complete positive correlation, -1 represents a complete negative correlation, and 0 represents no correlation; and determining variable ranking differences among the plurality of first variables based on correlation coefficients among the plurality of first variables.
In this embodiment, optionally, the processing module 24 of the apparatus is specifically configured to rank the contribution values of the plurality of first variables; for a first variable with a contribution value higher than a preset contribution, the corresponding measures taken include: enhancing control measures for the first variable with the contribution value higher than the preset contribution, wherein the control measures comprise resource investment and enhanced supervision; or, optimizing an operation mode of a first variable with a contribution value higher than a preset contribution, wherein the operation mode can comprise a process flow and an optimization step; or, adjusting a related strategy of a first variable with a contribution value higher than a preset contribution, wherein the related strategy comprises a marketing strategy and an improved supply chain; for a first variable having a contribution value lower than a preset contribution, the corresponding measures taken include: reducing control measures for a first variable having a contribution value lower than a preset contribution; or, adjusting the priority of the first variable with the contribution value lower than the preset contribution; or, the first variable with the contribution value lower than the preset contribution is replaced by other first variables.
In this embodiment, optionally, the establishing module 23 of the device is specifically configured to combine the p first variables, according to their values at the different sample points, into a matrix x of p rows and 1 column;
initializing the control coefficient;
establishing a relation between the first variable matrix x and the target variable y, wherein the relation is as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$;
wherein $x_i$ represents the value of the i-th first variable for a sample, $i = 1, 2, \dots, p$, $\beta_i$ represents the control coefficient of the i-th first variable, $\beta_0$ is the intercept term in the linear regression model, y denotes the target variable, and $\varepsilon$ represents the error term;
calculating the statistic from the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the acquired standard error of the control coefficient, wherein the statistic is expressed as:
$$T = \frac{\beta - \beta_0}{SE(\beta)}$$;
wherein T denotes the statistic, β denotes the control coefficient, $\beta_0$ denotes the theoretical value of the control coefficient under the null hypothesis, and SE(β) denotes the standard error of the control coefficient;
calculating the control coefficient through a preset estimation formula of the control coefficient, which is expressed as follows:
$$\beta = (x^{T} x)^{-1} x^{T} y$$;
wherein, beta is expressed as the control coefficient, x is expressed as a first variable matrix composed of a plurality of first variables, y is expressed as a target variable, T is expressed as a statistic, and the statistic is used for judging whether the influence of the plurality of first variables on the target variable is obvious or not.
In this embodiment, optionally, the determining module 22 of the apparatus is specifically configured to calculate the correlation coefficient between the plurality of first variables by using the calculation formula of the rank correlation coefficient, which is as follows:
$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$;
wherein ρ represents the rank correlation coefficient, $\sum d^2$ represents the sum of squared rank differences of the plurality of first variables, n represents the number of samples, and d represents the variable ranking difference between the plurality of first variables.
In this embodiment of the present application, optionally, the determining module 22 of the apparatus is specifically configured to determine the dependency index between the plurality of first variables by a formula: [formula not reproduced in this text];
wherein β is denoted as a control coefficient, e is denoted as a natural constant, α is denoted as a control function for adjusting the degree of influence of the variable ranking difference on the dependency index, d is denoted as a variable ranking difference between a plurality of the first variables;
calculating the contribution value of each first variable according to the dependency index and the control coefficient comprises:
calculating the contribution value of each first variable by a formula: [formula not reproduced in this text];
where β is denoted as the control coefficient and DependencyIndex is denoted as the dependency index between the plurality of first variables.
The air pollution data analysis system shown in fig. 2 may perform the air pollution data analysis method shown in the embodiment shown in fig. 1, and its implementation principle and technical effects are not repeated. The specific manner in which the respective modules, units, and operations of the atmospheric pollution data analysis device in the above embodiments are performed has been described in detail in the embodiments related to the method, and will not be described in detail herein.
In one possible design, the atmospheric pollution data analysis apparatus of the embodiment shown in FIG. 2 may be implemented as a computing device, which may include a storage component 301 and a processing component 302, as shown in FIG. 3;
the storage component 301 stores one or more computer instructions for execution by the processing component 302.
The processing component 302 is configured to: acquire, in the air pollution data, a plurality of factors related to air quality, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and particulate matter concentration; take the factors as first variables, and determine a target variable according to the first variables, wherein the target variable is prediction data generated according to the first variables; establish a relation between the plurality of first variables and the target variable, wherein the relation comprises control coefficients; determine variable ranking differences among the plurality of first variables by calculating rank correlation coefficients among the plurality of first variables; determine a dependency index between the plurality of first variables according to the variable ranking differences among the plurality of first variables and the control coefficient; calculate a contribution value of each first variable according to the dependency index and the control coefficient; and take corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables.
Wherein the processing component 302 may include one or more processors to execute computer instructions to perform all or part of the steps of the methods described above. Of course, the processing component may also be implemented as one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements for executing the methods described above.
The storage component 301 is configured to store various types of data to support operations at the computing device. The storage component may be implemented by any type of volatile or nonvolatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The display component 303 may be an electroluminescent (EL) element, a liquid crystal display or a micro display having a similar structure, or a laser-scanning display capable of projecting directly onto the retina, or the like.
Of course, the computing device may also include other components, such as input/output interfaces, communication components, and the like.
The input/output interface provides an interface between the processing component and a peripheral interface module, which may be an output device, an input device, etc.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices, and the like.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform. In the latter case, the computing device may be a cloud server, and the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
An embodiment of the application also provides a computer readable storage medium storing a computer program which, when executed by a computer, implements the air pollution data analysis method of the embodiment shown in FIG. 1.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (4)
1. An atmospheric pollution data analysis method, comprising:
in the air pollution data, a plurality of factors related to air quality are acquired, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and granularity concentration;
taking the factors as first variables, and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables;
establishing a relation between a plurality of first variables and the target variable, wherein the relation comprises a control coefficient, and the control coefficient is the influence degree of the corresponding first variable on the target variable;
determining a contribution value of each first variable based on the obtained variable ranking differences among the plurality of first variables and the control coefficient, wherein the contribution value is a contribution value of the first variable to air pollution;
taking corresponding measures to optimize the target variable according to the contribution values of the first variables;
wherein the establishing of a relation between the plurality of first variables and the target variable, the relation comprising a control coefficient, comprises:
forming a first variable matrix from the plurality of first variables, and establishing a relation between the plurality of first variables and the target variable according to the first variable matrix and the target variable;
calculating the control coefficient by minimizing, in the relation between the plurality of first variables and the target variable, the sum of squares of residuals between the first variables and the target variable and the obtained statistic, wherein the statistic is determined according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the standard error of the control coefficient;
wherein the forming of a first variable matrix from the plurality of first variables, and the establishing of a relation between the plurality of first variables and the target variable according to the first variable matrix and the target variable, comprises:
forming a matrix x of p rows and 1 column from p first variables according to their values at different sample points;
initializing a control coefficient;
establishing a relation between the first variable matrix X and the target variable Y, wherein the relation is as follows:
$$y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon$$
wherein $x_i$ represents the value of the $i$-th first variable, $\beta_i$ represents the control coefficient of the $i$-th first variable, $\beta_0$ is the intercept term in the linear regression model, $y$ denotes the target variable, and $\varepsilon$ represents the error term;
wherein the calculating of the control coefficient by minimizing the sum of squares of residuals between the first variables and the target variable and the obtained statistic in the relation between the plurality of first variables and the target variable comprises:
calculating the statistic according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the standard error of the control coefficient, wherein the expression of the statistic is as follows: $T = \dfrac{\beta - \beta_0}{SE(\beta)}$;
wherein $T$ denotes the statistic, $\beta$ denotes the control coefficients of the plurality of first variables, $\beta_0$ denotes the theoretical value of the control coefficient under the null hypothesis, and $SE(\beta)$ denotes the standard error of the control coefficient;
calculating the control coefficient through a preset estimation formula of the control coefficient, wherein the expression of the preset estimation of the control coefficient is as follows: ;
wherein β denotes the control coefficients of the plurality of first variables, x denotes the first variable matrix formed by the plurality of first variables, y denotes the target variable, and T denotes the statistic, the statistic being used to judge whether the influence of the plurality of first variables on the target variable is significant;
the determining a contribution value of each first variable based on the obtained variable ranking differences among the plurality of first variables and the control coefficient comprises the following steps:
determining a dependency index between a plurality of first variables according to variable ranking differences among the plurality of first variables and the control coefficient;
calculating a contribution value of each first variable according to the dependency index and the control coefficient;
wherein the determining a dependency index between the plurality of first variables according to the obtained variable ranking differences between the plurality of first variables and the control coefficient includes:
by the formula:determining a dependency index between a plurality of the first variables;
wherein β is represented as a control coefficient of a plurality of the first variables, e is represented as a natural constant, α is represented as a control function for adjusting the degree of influence of the variable ranking difference on the dependency index, and d is represented as a variable ranking difference between a plurality of the first variables;
wherein said calculating a contribution value for each of said first variables from said dependency index and said control coefficient comprises:
by the formula: calculating the contribution value of each first variable;
wherein $\beta_i$ denotes the control coefficient of the $i$-th first variable, and the other term denotes the dependency index between the plurality of said first variables.
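The regression step recited in claim 1 can be illustrated with a short sketch. It assumes plain ordinary least squares (minimizing the sum of squared residuals) and a null-hypothesis value of 0 for every coefficient; it does not reproduce the preset estimation formula of the claim, whose expression is not available in this text. The function name `fit_control_coefficients` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_control_coefficients(X, y, beta_null=0.0):
    """Fit y = b0 + sum(b_i * x_i) + eps by least squares.

    Returns (beta, se, t) with the intercept as the first entry of each array.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes the squared residuals
    resid = y - A @ beta
    dof = n - A.shape[1]                           # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # estimate of the error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)          # covariance of the estimates
    se = np.sqrt(np.diag(cov))                     # standard errors SE(beta)
    t = (beta - beta_null) / se                    # T = (beta - beta_null) / SE(beta)
    return beta, se, t


# Synthetic data (assumption): two factors, 200 sample points, known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

beta, se, t = fit_control_coefficients(X, y)
print(beta, se, t)
```

A large absolute value of T for a coefficient suggests that the corresponding first variable has a significant influence on the target variable, which is the role the claim assigns to the statistic.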
2. The method of claim 1, wherein taking corresponding measures to optimize a target variable based on the contribution values of the plurality of first variables comprises:
sorting the contribution values of the plurality of first variables;
for a first variable with a contribution value higher than a preset contribution, the corresponding measures taken include: enhancing control measures for the first variable with the contribution value higher than the preset contribution, wherein the control measures comprise resource investment and enhanced supervision; or, optimizing an operation mode of a first variable with a contribution value higher than a preset contribution, wherein the operation mode can comprise a process flow and an optimization step; or, adjusting a related strategy of a first variable with a contribution value higher than a preset contribution, wherein the related strategy comprises a marketing strategy and an improved supply chain;
for a first variable having a contribution value lower than a preset contribution, the corresponding measures taken include: reducing control measures for a first variable having a contribution value lower than a preset contribution; or, adjusting the priority of the first variable with the contribution value lower than the preset contribution; or, the first variable with the contribution value lower than the preset contribution is replaced by other first variables.
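The sorting and threshold comparison of claim 2 can be sketched as below. This is only an illustration under stated assumptions: the contribution values, the preset contribution of 0.2, the factor names used as dictionary keys, and the action labels are all hypothetical, and the function `plan_measures` is not part of the claimed method.

```python
def plan_measures(contributions, preset_contribution):
    """Sort first variables by contribution and attach a coarse measure to each."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    plan = []
    for name, value in ranked:
        if value > preset_contribution:
            action = "strengthen control"
        elif value < preset_contribution:
            action = "reduce control or lower priority"
        else:
            action = "no change"
        plan.append((name, value, action))
    return plan


# Hypothetical contribution values for four first variables.
contributions = {
    "wind_speed": 0.12,
    "chemical_component_concentration": 0.41,
    "air_humidity": 0.08,
    "granularity_concentration": 0.33,
}

plan = plan_measures(contributions, preset_contribution=0.2)
for name, value, action in plan:
    print(f"{name}: {value:.2f} -> {action}")
```

Returning the sorted list rather than acting directly keeps the ranking step separate from the choice of measure, so the specific measures named in the claim can be mapped onto each label afterwards.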
3. An atmospheric pollution data analysis system, comprising:
an acquisition module for acquiring a plurality of factors related to air quality in the air pollution data, wherein the factors comprise air temperature, air humidity, wind speed, atmospheric pressure, chemical component concentration and granularity concentration;
the determining module is used for taking the factors as first variables and determining target variables according to the first variables, wherein the target variables are prediction data generated according to the first variables;
the building module is used for building a relation between the first variables and the target variables, wherein the relation comprises control coefficients, and the control coefficients are the influence degree of the corresponding first variables on the target variables;
the determining module is further configured to determine a contribution value of each first variable based on a variable ranking difference among a plurality of first variables and the control coefficient, where the contribution value is a contribution value of the first variable to atmospheric pollution;
the processing module is used for taking corresponding measures to optimize the target variable according to the contribution values of the plurality of first variables;
the building module is specifically configured to form a first variable matrix from the plurality of first variables, and build a relation between the plurality of first variables and the target variable according to the first variable matrix and the target variable; and to calculate the control coefficient by minimizing, in the relation between the plurality of first variables and the target variable, the sum of squares of residuals between the first variables and the target variable and the obtained statistic, wherein the statistic is determined according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the standard error of the control coefficient;
the building module is specifically configured to form a matrix x of p rows and 1 column from p first variables according to their values at different sample points;
initializing a control coefficient;
establishing a relation between the first variable matrix X and the target variable Y, wherein the relation is as follows:
$$y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon$$
wherein $x_i$ represents the value of the $i$-th first variable, $\beta_i$ represents the control coefficient of the $i$-th first variable, $\beta_0$ is the intercept term in the linear regression model, $y$ denotes the target variable, and $\varepsilon$ represents the error term;
calculating the statistic according to the control coefficient, the theoretical value of the control coefficient under the null hypothesis and the standard error of the control coefficient, wherein the expression of the statistic is as follows: $T = \dfrac{\beta - \beta_0}{SE(\beta)}$;
wherein $T$ denotes the statistic, $\beta$ denotes the control coefficients of the plurality of first variables, $\beta_0$ denotes the theoretical value of the control coefficient under the null hypothesis, and $SE(\beta)$ denotes the standard error of the control coefficient;
calculating the control coefficient through a preset estimation formula of the control coefficient, wherein the expression of the preset estimation of the control coefficient is as follows:;
wherein β denotes the control coefficients of the plurality of first variables, x denotes the first variable matrix formed by the plurality of first variables, y denotes the target variable, and T denotes the statistic, the statistic being used to judge whether the influence of the plurality of first variables on the target variable is significant;
the determining module is specifically configured to determine a dependency index between the plurality of first variables according to a variable ranking difference between the plurality of first variables and the control coefficient; and to calculate a contribution value of each first variable according to the dependency index and the control coefficient;
the determining module is specifically configured to pass through the formula:determining a dependency index between a plurality of the first variables;
wherein β is represented as a control coefficient of a plurality of the first variables, e is represented as a natural constant, α is represented as a control function for adjusting the degree of influence of the variable ranking difference on the dependency index, and d is represented as a variable ranking difference between a plurality of the first variables;
the determining module is specifically configured to pass through the formula:calculating the contribution value of each first variable;
wherein $\beta_i$ denotes the control coefficient of the $i$-th first variable, and the other term denotes the dependency index between the plurality of said first variables.
4. A computer storage medium storing a computer program which, when executed by a computer, implements the atmospheric pollution data analysis method according to claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311606468.0A CN117314023B (en) | 2023-11-29 | 2023-11-29 | Atmospheric pollution data analysis method, system and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117314023A CN117314023A (en) | 2023-12-29 |
CN117314023B CN117314023B (en) | 2024-02-20 |
Family
ID=89285042
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311606468.0A Active CN117314023B (en) | 2023-11-29 | 2023-11-29 | Atmospheric pollution data analysis method, system and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117314023B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118538329A (en) * | 2024-07-26 | 2024-08-23 | 山东嘉源检测技术股份有限公司 | Outdoor air data processing system based on data analysis |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930670A (en) * | 2016-04-29 | 2016-09-07 | 浙江大学 | Model parameter uncertainty-based dynamic prediction method for river emergency pollution accident |
CN110059966A (en) * | 2019-04-23 | 2019-07-26 | 成都四方伟业软件股份有限公司 | The contribution analysis method and device of influence factor |
CN110428104A (en) * | 2019-08-01 | 2019-11-08 | 软通动力信息技术有限公司 | A kind of genes' contamination ratio determines method, apparatus, electronic equipment and storage medium |
CN111538957A (en) * | 2020-04-21 | 2020-08-14 | 中科三清科技有限公司 | Method, device, equipment and medium for acquiring contribution degree of atmospheric pollutant source |
CN113240203A (en) * | 2021-06-16 | 2021-08-10 | 生态环境部南京环境科学研究所 | Method for calculating pollution contribution rate of medium and small river channel sections of multiple pollution sources |
CN115453064A (en) * | 2022-09-22 | 2022-12-09 | 山东大学 | Fine particle air pollution cause analysis method and system |
Non-Patent Citations (2)
Title |
---|
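Analysis of farmers' straw resource utilization behavior and its influencing factors; Zhang Jun; Shi Xin; Journal of Hunan Agricultural University (Social Sciences); 20200228(01); full text * |
R-mode factor analysis of the hydrochemical characteristics of groundwater in the Caofeidian area and their influencing factors; Zhang Weijing; Sun Xiaoming; Liu Futian; Zhang Wei; Fang Cheng; Safety and Environmental Engineering; 20100130(01); full text * |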
Also Published As
Publication number | Publication date |
---|---|
CN117314023A (en) | 2023-12-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||