CN111985567A - Automatic pollution source type identification method based on machine learning
- Publication number
- CN111985567A (application number CN202010846058.3A)
- Authority
- CN
- China
- Prior art keywords
- pollution
- feature
- data
- AQI
- type
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01D—MEASURING NOT SPECIALLY ADAPTED FOR A SPECIFIC VARIABLE; ARRANGEMENTS FOR MEASURING TWO OR MORE VARIABLES NOT COVERED IN A SINGLE OTHER SUBCLASS; TARIFF METERING APPARATUS; MEASURING OR TESTING NOT OTHERWISE PROVIDED FOR
- G01D21/00—Measuring or testing not otherwise provided for
- G01D21/02—Measuring two or more variables by means not covered by a single other subclass
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N15/00—Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
- G01N15/06—Investigating concentration of particle suspensions
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0027—General constructional details of gas analysers, e.g. portable test equipment concerning the detector
- G01N33/0036—General constructional details of gas analysers, e.g. portable test equipment concerning the detector specially adapted to detect a particular component
- G01N33/0037—NOx
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0027—General constructional details of gas analysers, e.g. portable test equipment concerning the detector
- G01N33/0036—General constructional details of gas analysers, e.g. portable test equipment concerning the detector specially adapted to detect a particular component
- G01N33/0039—O3
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0027—General constructional details of gas analysers, e.g. portable test equipment concerning the detector
- G01N33/0036—General constructional details of gas analysers, e.g. portable test equipment concerning the detector specially adapted to detect a particular component
- G01N33/004—CO or CO2
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0027—General constructional details of gas analysers, e.g. portable test equipment concerning the detector
- G01N33/0036—General constructional details of gas analysers, e.g. portable test equipment concerning the detector specially adapted to detect a particular component
- G01N33/0042—SO2 or SO3
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0062—General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A50/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
- Y02A50/20—Air quality improvement or preservation, e.g. vehicle emission control or emission reduction by using catalytic converters
Abstract
A machine-learning-based method for automatically identifying pollution source types. The method comprises the following steps. First, pollution problems are identified and the pollution source type is determined through analysis and judgment of environmental monitoring data, time information and geographic information, and a library of typical pollution cases is established. Second, a machine learning algorithm extracts data features from the case library, which serves as the sample set, and a pollution source type recognition model is developed. Third, the model monitors real-time monitoring data. When it finds abnormal data, it marks them as a pollution event and identifies the type of source causing the pollution, which achieves online identification of pollution source emissions with automatic alarms. Fourth, the model's identification result is reviewed or checked on site according to the alarm information. If the pollution problem is confirmed, it is handled and the event is added to the typical case library for continuous optimization of the model. If the identification result is inaccurate, the pollution event mark is removed. Because the method draws on monitoring data from gridded micro-stations, small stations and similar sources, more data can be brought into the data source and the model can be further optimized.
Description
Technical Field
The invention relates to the field of atmospheric environment monitoring. In particular, it relates to a machine-learning-based method for automatically identifying pollution source types.
Background
Traditional atmospheric environment monitoring relies on standard air-quality stations. These stations are expensive, so few of them are deployed and they generate relatively little data. As a result, fine-scale pollution problems are difficult to capture accurately. Micro-stations based on sensor methods are low-cost and can therefore be deployed at scale, which yields monitoring data with high spatio-temporal resolution across the monitored area. The monitored parameters include PM10, PM2.5, SO2, NO2, CO, O3, temperature and humidity. The spatial resolution reaches 1 km × 1 km and the temporal resolution is 1 h. Massive environmental monitoring data of this kind make it possible to relate pollution sources to air quality. Through manual analysis and research, existing pollution problems can be found from data characteristics, and the source type of air pollution can be judged. Source types include fugitive dust sources, mobile sources, coal-fired sources, catering oil-smoke sources, industrial sources and others. This narrows the scope of investigation, improves its accuracy, raises the efficiency of supervision, and saves manpower in on-site investigation of environmental problems.
However, finding pollution problems and source types in massive monitoring data currently requires a great deal of manpower and time. It also depends heavily on the technical skill and experience of the analysts. The overall process is inefficient, lacks timeliness and is limited by the skill of the personnel involved, so it struggles to support environmental management effectively. A computational method that identifies pollution source types efficiently, quickly and stably is therefore needed.
Existing patented techniques for pollution source identification are based on hot-spot grids rather than on real-time monitoring data. For example, Chinese patent CN110147383A, entitled "Method and apparatus for determining pollution source type", discloses a method that determines the pollution source type of a polluted grid. It sets a preset concentration value and a preset concentration difference and combines them with the wind speed, wind direction and pollution sources within the grid. Chinese patent CN110006799A, entitled "A classification method of hotspot grid pollution types", discloses a method that classifies the pollution type of atmospheric hot-spot grids according to how pollutant concentrations change over time. These techniques have the following shortcomings:
- First, hot-spot grid data have low temporal and spatial resolution, so pollution source identification relies mostly on historical data. It cannot guide pollution tracing in real time, and its results are difficult to verify scientifically and effectively.
- Second, satellite inversion data are constrained by meteorological conditions such as cloud cover, so their accuracy cannot be guaranteed and effective tracing cannot be achieved.
- Third, hot-spot grid data reflect the air quality of the grid area as a whole rather than the immediate surroundings of a pollution source, so source types are difficult to distinguish from data characteristics.
- Fourth, the identification approach is limited and uses few characteristic parameters. There are at least six types of pollution sources with different pollution characteristics, and these few parameters cannot describe them accurately.
Disclosure of Invention
To solve the above problems, the present invention provides a machine-learning-based method for automatically identifying pollution source types. The method uses the parameters of each pollutant together with temporal and spatial coordinate information. The spatial information takes part in the model computation that identifies a pollution process. In other words, the method considers the differences between the data of the target grid and those of the surrounding grids, instead of merely analyzing how the data change over time.
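The comparison between a target grid and its surroundings can be illustrated with a minimal sketch. It assumes equal-length hourly series for each site and a hypothetical list of neighbouring sites. The function name `spatial_excess` and the sample numbers are illustrative only and are not taken from the patent.

```python
from statistics import mean


def spatial_excess(series_by_site, target, neighbors):
    """Hourly difference between the target site and the mean of its neighbours.

    series_by_site: dict mapping site id -> list of hourly concentrations (equal length)
    target: id of the site or grid under inspection
    neighbors: ids of the surrounding sites (hypothetical adjacency input)
    """
    excess = []
    for hour, value in enumerate(series_by_site[target]):
        surrounding = mean(series_by_site[n][hour] for n in neighbors)
        excess.append(value - surrounding)
    return excess


data = {
    "A": [40, 42, 95, 110, 60],  # target site with a local spike (made-up values)
    "B": [38, 41, 45, 47, 44],
    "C": [41, 43, 47, 49, 46],
}
excess = spatial_excess(data, "A", ["B", "C"])
```

Read the result this way: a large positive excess at one site while its neighbours stay flat points towards a local source. A rise shared by all sites is more consistent with regional or transported pollution.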
To achieve this purpose, the invention provides a machine-learning-based method for automatically identifying pollution source types. It mainly comprises the following steps:
Step 1: Monitoring data such as PM10, PM2.5, SO2, NO2, CO, O3, temperature and humidity are combined with time and geographic information. Through (expert) analysis and judgment, the occurrence of pollution problems is identified and the pollution source type is judged. A typical pollution case library is then established.
Step 2: A machine learning algorithm extracts data features, using the large volume of data in the case library as samples, and a pollution source type recognition model is developed.
Step 3: The model monitors real-time monitoring data. When abnormal data are found, they are marked as a pollution event and the type of source causing the pollution is identified. This achieves online identification of pollution source emissions with automatic alarms.
Step 4: Experts review the model's identification result or check it on site according to the alarm information. If the pollution problem is confirmed, it is handled and the event is added to the typical case library for continuous optimization of the model. If the identification result is inaccurate, the pollution event mark is removed.
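Steps 3 and 4 form a closed feedback loop, which can be sketched as follows. Every name here is hypothetical. The `is_abnormal` threshold and the `classify` and `review` callables are placeholders, because the patent does not specify how abnormal data are detected or how the review is recorded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    site: str
    features: List[float]
    predicted_types: List[str] = field(default_factory=list)
    flagged: bool = False


@dataclass
class CaseLibrary:
    cases: List[Event] = field(default_factory=list)


def process_event(event, is_abnormal, classify, review, library):
    """Steps 3 and 4 for one monitoring window; returns the alarm text or None."""
    if not is_abnormal(event):
        return None
    event.flagged = True                      # Step 3: mark as a pollution event
    event.predicted_types = classify(event)   # identify the source type(s)
    alarm = f"ALARM {event.site}: {', '.join(event.predicted_types)}"
    if review(event):                         # Step 4: expert audit or on-site check
        library.cases.append(event)           # confirmed: add to the case library
    else:
        event.flagged = False                 # inaccurate: remove the event mark
        event.predicted_types = []
    return alarm


library = CaseLibrary()
event = Event("site-07", [120.0, 3.1])
alarm = process_event(
    event,
    is_abnormal=lambda e: e.features[0] > 100,  # hypothetical PM2.5 threshold
    classify=lambda e: ["dust"],                # stand-in for the trained model
    review=lambda e: True,                      # expert confirms the alarm
    library=library,
)
```

In this sketch, the growing `library.cases` list is the material from which the model would periodically be retrained (Step 2). The retraining itself is not shown.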
In summary, the identification algorithm first builds a typical pollution case library from multi-pollutant monitoring data, time information and geographic information. Pollution events and their source types are judged by analysis, for example manual judgment by experts. The algorithm then extracts data features from the case library with a machine learning algorithm to develop the recognition model. It applies the model to real-time monitoring data to flag pollution events, identify their source types and raise alarms automatically. Finally, it relies on expert audit or on-site checks of each alarm. A confirmed event is added to the case library for further optimization, and an inaccurate result has its event mark removed.
Preferably, the training set of the algorithm model also contains pollution-free time-series data. After the pollution data of a grid are obtained, the 38 proposed features are calculated. They are fed into the trained mathematical model, which outputs the classification of the grid's pollution type.
Compared with the prior art, the technical solution of the invention offers the following beneficial effects:
(1) Data source: the prior art relies on hot-spot grid data. The present method is based on monitoring data from gridded micro-stations, small stations and the like, so it can bring more data into the data source.
(2) Algorithm model: the technical solution adopts machine learning algorithms, specifically random forests, neural networks, support vector machines and gradient boosting machines. It uses a combined model whose sub-models include one based on curve shape (time-series shape) and one based on automatic feature extraction by deep neural networks.
(3) Features: the prior art offers few selectable features because it judges each grid in isolation. Through repeated research, the inventors developed an algorithm that can use a total of 38 feature values. It supports comparative judgment across multiple monitoring sites and also includes data such as surrounding pollution sources as feature values. This improves the accuracy of pollution type identification and makes it possible to distinguish local sources from external sources. It also overcomes the one-sidedness of single-grid analysis.
(4) Continuous model optimization: the prior art is based on historical data and a fixed algorithm. In this technical solution, a first-generation algorithm model is generated from historical data and then applied to subsequent monitoring data to find new cases. After review by technicians, the new cases are automatically added to the case library, so the model can be further optimized.
(5) Deployment: the prior art is client-based, whereas the method can run on a cloud server. This reduces client-side cost and builds a big-data advantage on the server side, where large numbers of cases are collected from different locations. The advantages of the technical solution are thereby fully exploited, and the accuracy of the algorithm's judgments is further improved.
Drawings
Fig. 1 is a flowchart illustrating steps of a method for automatically identifying a pollution source type based on machine learning according to the present invention.
Detailed Description
For a better understanding of the objects, aspects and advantages of the present invention, reference is made to the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
The hot-spot grid referred to in the invention is a technical unit organized by the environmental protection authorities. The Beijing-Tianjin-Hebei (Jing-Jin-Ji) region and the surrounding key area of "2+26" cities are divided into grids of 3 km × 3 km. Data from satellite remote sensing, ground-based air quality observation and meteorological observation are integrated using remote sensing image recognition based on cognition and multi-source data fusion. Satellite remote sensing inversion of atmospheric pollutants then determines the average PM2.5 concentration of each grid, and key supervision areas among the hot-spot grids are selected by ranking these concentrations.
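The gridding-and-ranking step can be sketched as follows. This assumes point PM2.5 values that are already expressed in a projected coordinate system in kilometres. The satellite inversion and data fusion that actually produce the hot-spot grid values are not modelled. The function name and the sample observations are hypothetical.

```python
from collections import defaultdict
from math import floor

CELL_KM = 3.0  # grid edge length stated in the text (3 km x 3 km)


def rank_hotspot_cells(observations, cell_km=CELL_KM):
    """observations: iterable of (x_km, y_km, pm25) in a projected metric system."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, pm25 in observations:
        cell = (floor(x / cell_km), floor(y / cell_km))  # integer cell index
        sums[cell] += pm25
        counts[cell] += 1
    means = {cell: sums[cell] / counts[cell] for cell in sums}
    # Highest mean PM2.5 first: these cells are candidates for key supervision
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


obs = [(0.5, 0.5, 80), (2.9, 1.0, 100), (3.2, 0.4, 40), (7.0, 7.5, 150)]
ranking = rank_hotspot_cells(obs)
```

A coarse 3 km cell averages everything inside it, which illustrates the patent's point that hot-spot grid values describe a whole area rather than the immediate surroundings of one source.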
Fig. 1 shows the flowchart of the recognition algorithm used in the invention. Its main concepts are briefly described below.
1. Establishment of the typical case base
The typical case base is a collection of cases that describe pollution events and have been audited by human experts on the basis of environmental monitoring data. Each case contains at least the following information:
- the start and end times of the pollution event;
- the name and coordinates of the affected site;
- the affected parameter types;
- the local meteorological conditions;
- the pollution source type judged by experts.

The parameter types may include PM10, PM2.5, SO2, NO2, CO, O3 and VOCs. The meteorological conditions include wind direction, wind speed, temperature and humidity. The pollution source types include dust sources, mobile sources, coal-fired sources, catering oil-smoke sources, industrial sources and others.
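A case record with the minimum fields listed above could look like the sketch below. The field names, the (longitude, latitude) order and the short source-type labels are assumptions made for illustration. `source_types` is a list because the model section later treats co-occurring pollution types as multiple labels. The example values are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

PARAMETERS = {"PM10", "PM2.5", "SO2", "NO2", "CO", "O3", "VOCs"}
SOURCE_TYPES = {"dust", "mobile", "coal-fired", "catering oil smoke", "industrial", "other"}


@dataclass
class PollutionCase:
    start: datetime
    end: datetime
    site_name: str
    coordinates: Tuple[float, float]   # (longitude, latitude), assumed order
    affected_parameters: List[str]
    weather: Dict[str, float]          # wind_direction, wind_speed, temperature, humidity
    source_types: List[str]            # expert judgement; several types may co-occur

    def __post_init__(self):
        # Basic validation so malformed cases do not enter the training set
        if self.end <= self.start:
            raise ValueError("end time must be later than start time")
        unknown = set(self.affected_parameters) - PARAMETERS
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        unknown = set(self.source_types) - SOURCE_TYPES
        if unknown:
            raise ValueError(f"unknown source types: {sorted(unknown)}")


case = PollutionCase(
    start=datetime(2020, 3, 1, 8),
    end=datetime(2020, 3, 1, 12),
    site_name="Station A",
    coordinates=(116.40, 39.90),
    affected_parameters=["PM10", "PM2.5"],
    weather={"wind_direction": 270.0, "wind_speed": 2.5, "temperature": 12.0, "humidity": 35.0},
    source_types=["dust"],
)
```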
2. Data characterization
The data features extracted by the algorithm model of the invention comprise:
1. standard deviation of the first derivative of PM2.5;
2. standard deviation of the first derivative of CO;
3. standard deviation of the first derivative of SO2;
4. sum of squares of the first 10 first-order differential series of PM2.5;
5. maximum value of CO;
6. primary pollutant;
7. skewness of AQI;
8. first autocorrelation coefficient of PM10;
9. quartile of CO;
10. first autocorrelation coefficient of PM2.5;
11. coefficient of variation of AQI;
12. coefficient of variation of CO;
13. standard deviation of the first derivative of PM10;
14. sum of the first 10 first-order differential series of CO;
15. sum of AQI;
16. sum of SO2 and CO;
17. skewness of PM10;
18. maximum value of O2;
19. sum of SO2;
20. median of CO;
21. sum of the first 10 first-order differential series of AQI;
22. kurtosis of NO2;
23. sum of squares of the first 10 first-order differential series of PM10;
24. first autocorrelation coefficient of AQI;
25. first differential order of CO;
26. first autocorrelation coefficient of CO;
27. first differential order of SO2;
28. sum of CO;
29. median of SO2;
30. kurtosis of PM2.5;
31. first differential order of PM2.5;
32. sum of squares of the first 10 first-order differential series of NO2;
33. kurtosis of SO2;
34. hour at which the AQI maximum occurs;
35. coefficient of variation of SO2;
36. correlation coefficient of PM10 and CO;
37. correlation coefficient of SO2 and CO;
38. correlation coefficient of NO2 and CO.
To a certain extent, these 38 features reflect the rises and falls of each pollutant and the (time cross-)correlation among the pollutant time series. Together, they characterize the pollution types at each site in different periods from a statistical perspective.
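Several of the simpler features can be computed for an hourly series as sketched below. The patent does not define them precisely, so the following are assumptions:
- the "first derivative" is the hour-to-hour difference;
- statistics are population statistics;
- kurtosis is excess (Fisher) kurtosis.

The dictionary keys are hypothetical names, and the series values are made up.

```python
from statistics import fmean, median, pstdev


def first_difference(x):
    # "First derivative" approximated by the hour-to-hour difference (assumption)
    return [b - a for a, b in zip(x, x[1:])]


def std_first_derivative(x):
    return pstdev(first_difference(x))


def coefficient_of_variation(x):
    return pstdev(x) / fmean(x)


def skewness(x):
    m, s = fmean(x), pstdev(x)
    return fmean(((v - m) / s) ** 3 for v in x)


def excess_kurtosis(x):
    # Fisher definition (normal distribution -> 0); the patent does not say which is used
    m, s = fmean(x), pstdev(x)
    return fmean(((v - m) / s) ** 4 for v in x) - 3.0


def acf1(x):
    # Lag-1 autocorrelation coefficient
    m = fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    return num / sum((v - m) ** 2 for v in x)


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def hour_of_max(values, hours):
    return hours[values.index(max(values))]


hours = [6, 7, 8, 9, 10, 11]
pm25 = [35, 40, 80, 120, 90, 50]          # made-up hourly values
co = [0.8, 0.9, 1.6, 2.2, 1.7, 1.0]
features = {
    "pm25_diff_std": std_first_derivative(pm25),
    "pm25_acf1": acf1(pm25),
    "pm25_kurtosis": excess_kurtosis(pm25),
    "co_cv": coefficient_of_variation(co),
    "co_max": max(co),
    "co_median": median(co),
    "pm25_co_corr": pearson(pm25, co),
    "pm25_peak_hour": hour_of_max(pm25, hours),  # same idea as feature 34 for AQI
}
```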
For example, feature 6 (NO2_diff1_acf10) represents the degree of variation of the NO2 series. Feature 11 (distance_dtw) represents the similarity between the time series of different pollutants. Feature 17 (co-quantile) represents the frequency distribution of CO pollution, which can indicate to some extent whether a case is caused by motor vehicle pollution.
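The two named features can be sketched under stated assumptions. The identifier `NO2_diff1_acf10` matches the naming used by the R `tsfeatures` package, where `diff1_acf10` is the sum of squares of the first ten autocorrelation coefficients of the first-differenced series. The sketch assumes the same definition. For `distance_dtw`, it uses classic dynamic time warping with an absolute-difference cost. Both are our reconstructions, not the patent's code.

```python
def acf(x, lag):
    """Autocorrelation coefficient of x at the given lag."""
    m = sum(x) / len(x)
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        return 0.0  # constant series: define autocorrelation as 0
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    return num / den


def diff1_acf10(x):
    """Sum of squared ACF values (lags 1..10) of the first-differenced series."""
    d = [b - a for a, b in zip(x, x[1:])]
    return sum(acf(d, k) ** 2 for k in range(1, 11))


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance with |a_i - b_j| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Pollutants are measured in different units and magnitudes, so in practice the two series would likely be standardized (for example, z-scored) before the DTW distance is computed. That is our suggestion, not something the patent states.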
Multivariate time series vary in complex ways and are correlated with those of surrounding sites. It is therefore difficult to summarize and select by hand the time-series features that correspond to each pollution type (or case). The invention instead uses a machine learning algorithm to weight and combine the 38 features automatically, based on the training data in the case base, and so produces a data-driven prediction model.
3. Model algorithm description/calculation formula
The technical solution builds a multi-label classification model for existing cases and for cases added later. A multi-label model is needed because composite pollution, formed by a combination of several pollution types, may occur at the same place in the same period. Table 1 shows an example; it does not include all pollution types.
- Each row corresponds to one case or pollution event.
- Column X holds the selected feature values, and X1, X2, X3, X4, X5 and X6 are the feature values of the corresponding cases.
- Y1, Y2, Y3, Y4 and Y5 are different pollution types, called labels in the multi-label model. A 1 means the case belongs to that type, and a 0 means it does not.

The model adopts a combination strategy. The main strategies are Binary Relevance, Classifier Chains and Nested Stacking.
X | Y1 | Y2 | Y3 | Y4 | Y5 |
X1 | 1 | 0 | 0 | 0 | 0 |
X2 | 0 | 1 | 1 | 0 | 0 |
X3 | 0 | 0 | 0 | 1 | 0 |
X4 | 0 | 0 | 0 | 0 | 1 |
X5 | 0 | 1 | 0 | 0 | 0 |
X6 | 1 | 0 | 1 | 0 | 0 |
TABLE 1 Multi-label model example
The invention mainly uses the Binary Relevance strategy. Its principle is to build one binary classifier per label. Each classifier answers a simple yes/no question: does the case belong to this type or not? As shown in Table 2, the model is split into five binary classifiers, which are then combined. At prediction time, each label is predicted independently, without considering dependencies between labels, and the results are combined into the multi-label target. Binary Relevance has computational complexity that is linear in the number of labels. It is also easy to parallelize, because the binary classifiers for all labels can be built at the same time, which increases speed. In addition, machine learning models (for example, random forests) with default parameters tend to overlook pollution types that are under-represented in the training samples. The algorithm therefore adjusts the cutoff parameter of each binary classifier according to the proportion of each pollution type in the training samples. The optimized model then predicts every pollution type in a balanced way, and overall prediction performance improves.
TABLE 2 Binary Relevance strategy example
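A Binary Relevance model with per-label cutoffs can be sketched with scikit-learn, assuming that library is available. The label matrix is Table 1. The feature values are random stand-ins, not real data. The patent says only that the cutoff is adjusted "based on the proportion of each pollution type". Using exactly the label's positive share as the cutoff is our assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class BinaryRelevance:
    """Binary Relevance: one independent binary classifier per label (pollution type)."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier  # factory returning a fresh classifier

    def fit(self, X, Y):
        Y = np.asarray(Y)
        self.models_, self.cutoffs_ = [], []
        for k in range(Y.shape[1]):
            clf = self.make_classifier()
            clf.fit(X, Y[:, k])
            self.models_.append(clf)
            # Assumption: cutoff = share of positive cases for label k instead of
            # the default 0.5, so rare pollution types are not always predicted absent.
            self.cutoffs_.append(float(Y[:, k].mean()))
        return self

    def predict_proba(self, X):
        columns = []
        for clf in self.models_:
            positive = list(clf.classes_).index(1)  # column holding P(label = 1)
            columns.append(clf.predict_proba(X)[:, positive])
        return np.column_stack(columns)

    def predict(self, X):
        # Each label is decided independently; dependencies between labels are ignored.
        return (self.predict_proba(X) >= np.array(self.cutoffs_)).astype(int)


# Label matrix from Table 1 (rows = cases X1..X6, columns = Y1..Y5).
Y = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
])
# Synthetic stand-in for the 38 feature values of each case (not real data).
X = np.random.default_rng(0).normal(size=(6, 38))

model = BinaryRelevance(lambda: RandomForestClassifier(n_estimators=50, random_state=0))
model.fit(X, Y)
preds = model.predict(X)
```

Because the per-label fits are independent, the loop in `fit` could be parallelized, which matches the speed argument in the text.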
By default, when the binary classifier for each label is built independently, the same machine learning algorithm is used for all of them. Candidates include a random forest, a neural network, a support vector machine or a gradient boosting machine. Building on further study, different combinations of feature values and different machine learning algorithms can be tried when modeling each pollution type. The best feature combination and algorithm are then selected to build each binary classifier. Finally, the binary classifiers are combined according to Binary Relevance into an optimal multi-label model. When a new pollution event is predicted, its pollution types are judged comprehensively from its feature values.
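The per-label search over algorithms and feature combinations can be sketched with scikit-learn, assuming that library is available. This sketch makes several choices that do not come from the patent:
- scikit-learn's `GradientBoostingClassifier` stands in for the gradient boosting machine;
- mean cross-validated F1 is the selection criterion;
- the two candidate feature subsets are hypothetical;
- the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

CANDIDATES = {
    "svm": lambda: make_pipeline(StandardScaler(), SVC()),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "gbm": lambda: GradientBoostingClassifier(random_state=0),
}


def select_per_label(X, Y, feature_subsets, cv=3):
    """For each label, return (mean F1, algorithm name, feature subset) of the best pair."""
    choices = []
    for k in range(Y.shape[1]):
        best = None
        for name, make in CANDIDATES.items():
            for subset in feature_subsets:
                score = cross_val_score(make(), X[:, subset], Y[:, k], cv=cv, scoring="f1").mean()
                if best is None or score > best[0]:
                    best = (score, name, tuple(subset))
        choices.append(best)
    return choices


# Synthetic data: 38 features and 5 labels, mirroring the shapes used in the text.
X, Y = make_multilabel_classification(n_samples=120, n_features=38, n_classes=5, random_state=0)
subsets = [list(range(38)), list(range(10))]  # hypothetical feature combinations
choices = select_per_label(X, Y, subsets)
```

The selected (algorithm, subset) pair for each label would then be refit on all data and plugged into the Binary Relevance combination.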
The invention builds models with three algorithms: support vector machines, random forests, and XGBoost. Briefly, a support vector machine (SVM) is a generalized linear classifier that performs binary classification of data through supervised learning and can be used for both classification and regression. A random forest is an ensemble-learning algorithm that combines many trees; it is a nonlinear classifier and can therefore capture complex nonlinear interdependencies between variables. Its basic unit is the decision tree, a base classifier whose main job is to choose features on which to split the data set, finally assigning one of two class labels to the data; the resulting model has a tree structure. A random forest is obtained by building many decision trees: at prediction time each tree outputs a classification, the trees vote, and the final class is decided by majority vote. XGBoost is also a decision-tree-based machine learning algorithm, but unlike a random forest, whose trees are built independently of one another, XGBoost grows the ensemble by adding trees one after another and repeatedly splitting on features. Each added tree in effect learns a new function that fits the residual of the previous prediction, until a stopping condition (such as the number of trees to build) is reached. At prediction time, the sample's features lead to one leaf node in each tree, each leaf node carries a score, and the scores from all trees are summed to give the predicted value for the sample.
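The residual-fitting idea behind XGBoost can be illustrated with plain gradient boosting under squared loss, using one-split trees (stumps) on a single feature. This is a deliberately simplified sketch: real XGBoost also uses second-order gradient information, regularization, and multi-feature trees, none of which are shown, and all function names are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single split of 1-D feature x for fitting the residual (squared error)."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds between distinct values
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    if best is None:  # constant feature: one leaf holding the mean residual
        return x.max(), residual.mean(), residual.mean()
    return best[1:]  # (threshold, left leaf score, right leaf score)

def stump_scores(stump, x):
    """Leaf score assigned to each sample by one stump."""
    threshold, left_score, right_score = stump
    return np.where(x <= threshold, left_score, right_score)

def fit_boosting(x, y, n_trees=50, learning_rate=0.3):
    """Add trees one at a time, each fitted to the residual of the current ensemble."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):  # stopping condition: a fixed number of trees
        residual = y - pred   # what the ensemble still gets wrong
        stump = fit_stump(x, residual)
        pred += learning_rate * stump_scores(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    """Each tree contributes the score of the leaf the sample falls into; scores are summed."""
    base, learning_rate, stumps = model
    x = np.asarray(x, float)
    total = np.full(len(x), base)
    for stump in stumps:
        total += learning_rate * stump_scores(stump, x)
    return total
```

By contrast, a random forest would fit each tree independently on a bootstrap sample and combine them by majority vote rather than by summing leaf scores.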
Because the samples of the different pollution types are not necessarily balanced, which affects model accuracy, the method addresses this point during model construction: tuning the model parameters mitigates, to some extent, the effect of imbalanced samples, and the parameters can be readjusted as cases continue to be added.
4. Operation on a cloud server
During development of the algorithm model, the more cases are available, the higher the prediction accuracy of the resulting model, so environmental monitoring data from multiple cities are needed; once developed, the model can be applied to different cities. The scheme of the invention therefore runs the model in the cloud, which makes effective use of as much data as possible, improves model accuracy, and eases later large-scale deployment.
The following specific examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.
In this embodiment, the machine-learning-based method for automatically identifying pollution source types uses micro stations to obtain monitoring data with high spatial and temporal resolution in the monitoring area, with parameters including PM10, PM2.5, SO2, NO2, CO, O3, temperature, and humidity, and classifies pollution based on time-varying concentration characteristics together with geographic information. As shown in Fig. 1, the method mainly includes the following steps:
obtaining, through micro stations, high spatiotemporal-resolution monitoring data of the monitoring area, such as PM10, PM2.5, SO2, NO2, CO, O3, temperature, and humidity, together with time and geographic information;
establishing a typical pollution case library based on expert judgment;
developing, based on machine learning algorithms, a pollution source type identification algorithm model tailored to the characteristics of pollution source emission data;
using the algorithm model to flag abnormal data in the real-time monitoring data, identify the pollution source type, and raise an alarm automatically;
having experts review the model identification result according to the alarm information to determine whether it is accurate; if the identification is correct, handling the pollution source and adding the event to the case library to further optimize the algorithm model; if the identification is wrong, removing the pollution event flag.
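The review step closes the loop between the model and the case library. A minimal sketch of that bookkeeping follows; the `PollutionEvent` record, the exact-match rule for "the identification is correct", and the tuple stored in `case_library` are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class PollutionEvent:
    """A model alarm awaiting expert review (hypothetical record layout)."""
    site: str
    start: str
    end: str
    predicted_types: frozenset
    flagged: bool = True

def apply_expert_review(event, expert_types, case_library):
    """Confirm or reject one alarm and update the case library accordingly."""
    if expert_types and frozenset(expert_types) == event.predicted_types:
        # Correct identification: the source is handled and the event becomes a new case.
        case_library.append((event.site, event.start, event.end, frozenset(expert_types)))
        return "confirmed"
    # Wrong identification (e.g. no obvious source found): remove the pollution flag.
    event.flagged = False
    return "unflagged"
```

Each confirmed case grows the training data for the next retraining of the algorithm model, while rejected alarms simply lose their flag.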
In the following embodiment, classifying the pollution types of high spatiotemporal-resolution sites in the monitoring area comprises the following steps:
1. Obtain, through micro stations, high spatiotemporal-resolution monitoring data of the monitoring area, such as PM10, PM2.5, SO2, NO2, CO, O3, temperature, and humidity, together with time and geographic information.
Because different pollution types change pollutant concentrations in characteristically different ways, a variety of features are extracted from the time-series pollution data based on basic statistics of the data; geographic information, emission inventory information, and information obtained from expert judgment are then converted into corresponding feature variables, such as pollution sources around the site, road network density around the site, and time-series distance. In total there are 140 features.
The features and the calculations involved for each pollutant are as follows:
for each of the 6 pollutants (PM10, PM2.5, SO2, NO2, O3, CO) and AQI, grouped by case:
diff1_acf10: sum of squares of the first 10 autocorrelation coefficients of the first-differenced series;
diff1_acf1: first autocorrelation coefficient of the first-differenced series;
x_acf1: first autocorrelation coefficient of the series;
x_pacf5: sum of squares of the first five partial autocorrelation coefficients;
diff2x_pacf5: sum of squares of the first 5 partial autocorrelation coefficients of the second-differenced series;
std1st_der: standard deviation of the first derivative;
grouped by case, the average, sum, maximum, quartiles, coefficient of variation, mean, standard deviation, median, variance, skewness, and kurtosis of the 6 pollutants and AQI, and the hour at which AQI reaches its maximum; the correlation coefficients between the six pollutants and AQI; and the primary pollutant.
Pollution sources around the site: based on pollution source information and emission inventory information around each site, the numbers of pollution sources of different types around each site are obtained and used as feature values;
Road network density around the site: to account for the effect of motor vehicle emissions on pollutant data, the road network density around each site is derived from the surrounding road network using geographic information system (GIS) techniques and used as a feature value;
Time-series distance features: the similarity of the time series of different pollutants, measured by the dynamic time warping (DTW) distance.
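Several of the features above can be computed with NumPy alone, as sketched below. The definitions follow the usual time-series-feature conventions suggested by the feature names (sample autocorrelations normalized by the lag-0 sum of squares, partial autocorrelations via the Durbin-Levinson recursion, the first difference standing in for the first derivative); these conventions and the function names are assumptions, not taken from the patent. `dtw_distance` is the textbook dynamic time warping recursion with an absolute-difference cost.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation coefficients for lags 1..nlags (undefined for a constant series)."""
    d = np.asarray(x, float) - np.mean(x)
    denom = (d ** 2).sum()
    return np.array([(d[:-k] * d[k:]).sum() / denom for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations for lags 1..nlags via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    out = np.zeros(nlags)
    phi = np.array([r[0]])  # coefficients of the order-1 autoregression
    out[0] = r[0]
    for k in range(2, nlags + 1):
        phi_kk = (r[k - 1] - phi @ r[k - 2::-1]) / (1 - phi @ r[:k - 1])
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)  # order-k coefficients
        out[k - 1] = phi_kk
    return out

def ts_features(x):
    """A subset of the per-pollutant features, computed for one case's series."""
    x = np.asarray(x, float)
    d1, d2 = np.diff(x), np.diff(x, n=2)
    return {
        "x_acf1": acf(x, 1)[0],
        "diff1_acf1": acf(d1, 1)[0],
        "diff1_acf10": (acf(d1, 10) ** 2).sum(),
        "x_pacf5": (pacf(x, 5) ** 2).sum(),
        "diff2x_pacf5": (pacf(d2, 5) ** 2).sum(),
        "std1st_der": np.std(d1, ddof=1),
        "cv": np.std(x, ddof=1) / np.mean(x),
    }

def dtw_distance(a, b):
    """Dynamic time warping distance between two series (absolute-difference cost)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

Because the pollutants have different units and magnitudes, the series would presumably be standardized (for example z-scored) before `dtw_distance` is computed between them; the text does not say how this is handled.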
A subset of feature variables is then screened from all candidate variables according to variable importance in a random forest model. Finally, the following 38 data features, based on the pollution data together with geographic information, emission inventory information, and information obtained from expert judgment, are selected as the basis for pollution type classification.
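Screening by random-forest variable importance can be sketched with scikit-learn as follows. `RandomForestClassifier` and its `feature_importances_` attribute (mean decrease in impurity) are real scikit-learn APIs; the forest size, fitting all pollution-type labels at once as a multi-output target, and the function name are assumptions, since the patent does not state which importance measure or settings were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_by_rf_importance(X, Y, feature_names, k=38, random_state=0):
    """Rank candidate features by random-forest importance and keep the top k names."""
    forest = RandomForestClassifier(n_estimators=300, random_state=random_state)
    # Y may be a 0/1 label matrix (one column per pollution type);
    # scikit-learn forests accept multi-output targets directly.
    forest.fit(X, Y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return [feature_names[i] for i in order[:k]]
```

With the 140 candidate features as columns of `X`, calling this with `k=38` would reproduce the kind of screening described here, although the resulting ranking depends on the data and the forest settings.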
Feature 1: co_std1st_der; standard deviation of the first derivative of CO;
Feature 2: pm10_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced PM10 series;
Feature 3: pm2_5_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced PM2.5 series;
Feature 4: co_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced CO series;
Feature 5: polarization; location category of the site in the expert-judged pollution case, such as main road, sensitive point, town, construction site, or environmental background point;
Feature 6: no2_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced NO2 series;
Feature 7: aqi_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced AQI series;
Feature 8: x_acf1_aqi; first autocorrelation coefficient of AQI;
Feature 9: aqi_cv; coefficient of variation of AQI;
Feature 10: data; hour at which AQI reaches its maximum;
Feature 11: distance_dtw; similarity of the time series of different pollutants, measured by DTW distance;
Feature 12: aqi_sum; sum of AQI;
Feature 13: pm10_std1st_der; standard deviation of the first derivative of PM10;
Feature 14: pm2_5_std1st_der; standard deviation of the first derivative of PM2.5;
Feature 15: so2_std1st_der; standard deviation of the first derivative of SO2;
Feature 16: co_max; maximum of CO;
Feature 17: co_quantile; quartile of CO;
Feature 18: so2_co_sum; sum of SO2 plus sum of CO;
Feature 19: so2_max; maximum of SO2;
Feature 20: co_sum; sum of CO;
Feature 21: so2_sum; sum of SO2;
Feature 22: x_acf1_pm10; first autocorrelation coefficient of PM10;
Feature 23: x_acf1_co; first autocorrelation coefficient of CO;
Feature 24: x_acf1_pm2_5; first autocorrelation coefficient of PM2.5;
Feature 25: co_cv; coefficient of variation of CO;
Feature 26: so2_cv; coefficient of variation of SO2;
Feature 27: so2_mean; median of SO2;
Feature 28: co_mean; median of CO;
Feature 29: pm2_5_diff1_acf1; first autocorrelation coefficient of the first-differenced PM2.5 series;
Feature 30: so2_diff1_acf1; first autocorrelation coefficient of the first-differenced SO2 series;
Feature 31: co_diff1_acf1; first autocorrelation coefficient of the first-differenced CO series;
Feature 32: skewness_pm10; skewness of PM10;
Feature 33: skewness_aqi; skewness of AQI;
Feature 34: pm2_5_kurtosis; kurtosis of PM2.5;
Feature 35: so2_kurtosis; kurtosis of SO2;
Feature 36: no2_kurtosis; kurtosis of NO2;
Feature 37: polarization_entities; number of pollution sources of each type around the site, derived from information on pollution sources around the site;
Feature 38: polarization_type; number of pollution sources of each type around the site, derived from the emission inventory.
2. Establish a typical pollution case library based on expert judgment.
Experts can determine the pollution type of each high spatiotemporal-resolution grid cell from the pollution data and other information. In this embodiment, 9 pollution types are distinguished: fugitive dust; motor vehicles; heavy vehicles, machinery, and ships; catering cooking fumes; coal combustion; uncontrolled open burning; enterprises; fireworks and firecrackers; and processes involving VOCs.
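Because one site and period can carry several of these types at once (composite pollution), expert labels are naturally stored as a multi-label 0/1 row with one column per type. The sketch below assumes the letter codes a to i follow the order in which the nine types are listed here, which agrees with the worked example later in this embodiment (a = fugitive dust, g = enterprise); the mapping and function names are otherwise assumptions.

```python
import numpy as np

# Assumed letter codes, in the order the nine pollution types are listed above.
POLLUTION_TYPES = {
    "a": "fugitive dust",
    "b": "motor vehicles",
    "c": "heavy vehicles, machinery, ships",
    "d": "catering cooking fumes",
    "e": "coal combustion",
    "f": "uncontrolled open burning",
    "g": "enterprises",
    "h": "fireworks and firecrackers",
    "i": "processes involving VOCs",
}
CODES = sorted(POLLUTION_TYPES)  # column order of the label matrix: a..i

def to_multi_hot(codes):
    """Encode an expert label set (possibly composite pollution) as a 0/1 row."""
    row = np.zeros(len(CODES), dtype=int)
    for code in codes:
        row[CODES.index(code)] = 1
    return row

def from_multi_hot(row):
    """Decode a 0/1 row back to the list of type codes it marks."""
    return [CODES[j] for j, value in enumerate(row) if value]
```

Stacking one such row per case gives the label matrix on which one binary classifier per column is trained.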
3. Develop, based on machine learning algorithms, a pollution source type identification algorithm model tailored to the characteristics of pollution source emission data.
The 38 technical features selected by the invention are calculated from the pollution data and other information, and the feature data of each grid cell are labeled with the pollution type judged by experts to serve as the model's training data. The method uses machine learning algorithms, specifically random forests, neural networks, support vector machines, gradient boosting machines, and others, and a combined model that includes sub-models based on curve shape (time-series shape), automatic feature extraction by deep neural networks, and the like, to train the data model, so that the proposed dimensions and feature classes are better understood and the accuracy of pollution type classification is improved.
4. Use the algorithm model to flag abnormal data in the real-time monitoring data, identify the pollution source type, and raise an alarm automatically.
The training set of the algorithm model includes pollution-free time-series data. After the pollution data of a grid cell are obtained, the proposed 38 technical features are calculated and input into the trained model, which outputs the classification of the grid cell's pollution type. In an early test, two standard air quality monitoring stations and three micro stations (at the southeast corner of a steel enterprise, at a sewage treatment plant, and on a northern ring road) in one city were randomly selected, and data from 1 September 2019 onward were used. A segmentation function divided the data into segments, pollution segments were screened using different pollutant concentration criteria, and each segment was predicted with the model built from the cases to obtain its pollution type. The information on the resulting site pollution segments was then sent back to experts for secondary judgment.
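The segmentation and screening step is not specified in detail. One simple reading is to cut a site's hourly series into contiguous runs where a pollutant exceeds a concentration threshold and keep only runs that last long enough; the sketch below implements that reading. The threshold rule, `min_len`, and the function name are hypothetical, and each returned segment would then be converted into the 38 features and passed to the trained model.

```python
import numpy as np

def pollution_segments(conc, threshold, min_len=3):
    """Return (start, end) index pairs, end exclusive, of runs where conc > threshold.

    Runs shorter than min_len samples are discarded as too brief to classify.
    """
    above = np.asarray(conc, float) > threshold
    segments, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i  # a candidate pollution segment begins
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(above) - start >= min_len:
        segments.append((start, len(above)))  # run still open at the end of the data
    return segments
```

Different pollutants would presumably get their own thresholds, matching the "different pollutant concentration criteria" mentioned above.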
5. Experts review the model identification result according to the alarm information and determine whether it is accurate; if the identification is correct, the pollution source is handled and the event is added to the case library to further optimize the algorithm model; if the identification is wrong, the pollution event flag is removed. For example, the pollution type at the site Tangshan ceramics from 2019/9/7 14:00 to 2019/9/8 7:00 was identified as a (fugitive dust). The expert group's secondary judgment was also a (fugitive dust); since the model result matched the expert judgment, the case could be added to the case library and the pollution source handled. In contrast, the pollution type at the suburban sewage treatment plant site 842 from 2019/9/21 2:00 to 2019/9/21 11:00 was identified as g (enterprise), but the expert group found no obvious pollution source on secondary review; because the experts' audit of the alarm disagreed with the model identification, the pollution event flag was removed.
Those skilled in the art will appreciate that the identification accuracy of the model increases as more pollution events are added to the case library.
Although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the invention.
Claims (10)
1. A machine-learning-based method for automatically identifying pollution source types, characterized by comprising the following steps:
step one: based on environmental monitoring data together with time and geographic information, identifying the occurrence of pollution problems and determining the pollution source type through analysis and judgment, and establishing a typical pollution case library;
step two: using the large volume of case-library data as samples, extracting data features and developing a pollution source type identification algorithm model based on a machine learning algorithm;
step three: monitoring the real-time monitoring data with the algorithm model; if abnormal data are found, marking them as a pollution event and further identifying the type of source causing the pollution, thereby achieving online identification of pollution source emissions, and raising an alarm automatically;
step four: reviewing or inspecting on site the model identification result according to the alarm information; if the pollution is confirmed, handling the pollution problem and adding the event to the typical case library for continuous optimization of the algorithm model; if the identification result is inaccurate, removing the pollution event flag.
2. The machine-learning-based method for automatically identifying pollution source types according to claim 1, wherein:
the environmental monitoring data include PM10, PM2.5, SO2, NO2, CO, O3, temperature, and humidity, as well as time and geographic information;
the typical case library is a set of reviewed cases that describe pollution events on the basis of the environmental monitoring data, and each case contains the following information: the start and end time of the pollution event, the name and coordinates of the affected site, the affected parameter types, the local meteorological conditions, and the pollution source type judged by experts;
the affected parameters are high spatiotemporal-resolution parameters of the monitoring area obtained through micro stations, and the parameter types include at least the 6 pollutants PM10, PM2.5, SO2, NO2, CO, and O3, as well as VOCs; the meteorological conditions include wind direction, wind speed, temperature, and humidity; and the pollution source types include dust sources, mobile sources, coal-fired sources, catering cooking-fume sources, and industrial sources.
3. The machine-learning-based method for automatically identifying pollution source types according to claim 1 or 2, characterized in that in step one, a variety of features are extracted from the time-series pollution data based on basic statistics of the data, and the geographic information, emission inventory information, and information obtained from expert judgment are converted into corresponding feature variables.
4. The machine-learning-based method for automatically identifying pollution source types according to claim 3, wherein the extracted features and their calculation methods are as follows:
for each of the 6 pollutants and AQI, grouped by case:
diff1_acf10: sum of squares of the first 10 autocorrelation coefficients of the first-differenced series;
diff1_acf1: first autocorrelation coefficient of the first-differenced series;
x_acf1: first autocorrelation coefficient of the series;
x_pacf5: sum of squares of the first five partial autocorrelation coefficients;
diff2x_pacf5: sum of squares of the first 5 partial autocorrelation coefficients of the second-differenced series;
std1st_der: standard deviation of the first derivative;
grouped by case, the average, sum, maximum, quartiles, coefficient of variation, mean, standard deviation, median, variance, skewness, and kurtosis of the 6 pollutants and AQI, and the hour at which AQI reaches its maximum; the correlation coefficients between the six pollutants and AQI; and the primary pollutant;
pollution sources around the site: based on pollution source information and emission inventory information around each site, the numbers of pollution sources of different types around each site are obtained and used as feature values;
road network density around the site: to account for the effect of motor vehicle emissions on pollutant data, the road network density around each site is derived from the surrounding road network using geographic information system (GIS) techniques and used as a feature value;
time-series distance features: the similarity of the time series of different pollutants, measured by the dynamic time warping (DTW) distance.
5. The method according to claim 4, wherein a subset of feature variables is selected from all candidate variables according to variable importance in a random forest model, and the following 38 data features, based on pollution data together with geographic information, emission inventory information, and information obtained from expert judgment, are selected as the basis for pollution type classification:
Feature 1: co_std1st_der; standard deviation of the first derivative of CO;
Feature 2: pm10_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced PM10 series;
Feature 3: pm2_5_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced PM2.5 series;
Feature 4: co_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced CO series;
Feature 5: polarization; location category of the site in the expert-judged pollution case, such as main road, sensitive point, town, construction site, or environmental background point;
Feature 6: no2_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced NO2 series;
Feature 7: aqi_diff1_acf10; sum of squares of the first 10 autocorrelation coefficients of the first-differenced AQI series;
Feature 8: x_acf1_aqi; first autocorrelation coefficient of AQI;
Feature 9: aqi_cv; coefficient of variation of AQI;
Feature 10: data; hour at which AQI reaches its maximum;
Feature 11: distance_dtw; similarity of the time series of different pollutants, measured by DTW distance;
Feature 12: aqi_sum; sum of AQI;
Feature 13: pm10_std1st_der; standard deviation of the first derivative of PM10;
Feature 14: pm2_5_std1st_der; standard deviation of the first derivative of PM2.5;
Feature 15: so2_std1st_der; standard deviation of the first derivative of SO2;
Feature 16: co_max; maximum of CO;
Feature 17: co_quantile; quartile of CO;
Feature 18: so2_co_sum; sum of SO2 plus sum of CO;
Feature 19: so2_max; maximum of SO2;
Feature 20: co_sum; sum of CO;
Feature 21: so2_sum; sum of SO2;
Feature 22: x_acf1_pm10; first autocorrelation coefficient of PM10;
Feature 23: x_acf1_co; first autocorrelation coefficient of CO;
Feature 24: x_acf1_pm2_5; first autocorrelation coefficient of PM2.5;
Feature 25: co_cv; coefficient of variation of CO;
Feature 26: so2_cv; coefficient of variation of SO2;
Feature 27: so2_mean; median of SO2;
Feature 28: co_mean; median of CO;
Feature 29: pm2_5_diff1_acf1; first autocorrelation coefficient of the first-differenced PM2.5 series;
Feature 30: so2_diff1_acf1; first autocorrelation coefficient of the first-differenced SO2 series;
Feature 31: co_diff1_acf1; first autocorrelation coefficient of the first-differenced CO series;
Feature 32: skewness_pm10; skewness of PM10;
Feature 33: skewness_aqi; skewness of AQI;
Feature 34: pm2_5_kurtosis; kurtosis of PM2.5;
Feature 35: so2_kurtosis; kurtosis of SO2;
Feature 36: no2_kurtosis; kurtosis of NO2;
Feature 37: polarization_entities; number of pollution sources of each type around the site, derived from information on pollution sources around the site;
Feature 38: polarization_type; number of pollution sources of each type around the site, derived from the emission inventory;
the data features extracted by the algorithm model comprise: 1. standard deviation of the first derivative of PM2.5; 2. standard deviation of the first derivative of CO; 3. standard deviation of the first derivative of SO2; 4. sum of squares of the first 10 autocorrelation coefficients of the first-differenced PM2.5 series; 5. maximum of CO; 6. primary pollutant; 7. skewness of AQI; 8. first autocorrelation coefficient of PM10; 9. quartile of CO; 10. first autocorrelation coefficient of PM2.5; 11. coefficient of variation of AQI; 12. coefficient of variation of CO; 13. standard deviation of the first derivative of PM10; 14. sum of squares of the first 10 autocorrelation coefficients of the first-differenced CO series; 15. sum of AQI; 16. sum of SO2 plus sum of CO; 17. skewness of PM10; 18. maximum of SO2; 19. sum of SO2; 20. median of CO; 21. sum of squares of the first 10 autocorrelation coefficients of the first-differenced AQI series; 22. kurtosis of NO2; 23. sum of squares of the first 10 autocorrelation coefficients of the first-differenced PM10 series; 24. first autocorrelation coefficient of AQI; 25. first autocorrelation coefficient of the first-differenced CO series; 26. first autocorrelation coefficient of CO; 27. first autocorrelation coefficient of the first-differenced SO2 series; 28. sum of CO; 29. median of SO2; 30. kurtosis of PM2.5; 31. first autocorrelation coefficient of the first-differenced PM2.5 series; 32. sum of squares of the first 10 autocorrelation coefficients of the first-differenced NO2 series; 33. kurtosis of SO2; 34. hour at which AQI reaches its maximum; 35. coefficient of variation of SO2; 36. correlation coefficient between PM10 and CO; 37. correlation coefficient between SO2 and CO; 38. correlation coefficient between NO2 and CO.
6. The machine-learning-based method for automatically identifying pollution source types according to claim 1, wherein: the pollution source type identification algorithm model is a multi-label classification model built from existing cases and cases added later, i.e., it can represent composite pollution formed by several pollution types that may occur at the same place in the same period; and a combination strategy is used to build the pollution source type identification algorithm model.
7. The machine-learning-based method for automatically identifying pollution source types according to claim 6, wherein: the combination strategy is binary association, a classifier chain, or nested stacking; and a cutoff parameter is set in each classifier according to the proportion of each pollution type in the training data, so as to address the imbalance of the training samples and improve prediction accuracy for occasional pollution types.
8. The machine-learning-based method for automatically identifying pollution source types according to claim 7, wherein: the combination strategy is the binary association strategy, in which one binary classifier is built for each label and answers a simple yes/no question, namely whether the sample belongs to that type; the model is divided into several binary classifiers, which are then combined; at prediction time each label is predicted independently without considering dependencies between labels, and the results are combined into a multi-label target; the computational complexity of the binary classifiers is linear in the number of labels, so they are easy to parallelize, i.e., the binary classifiers for all labels are built at the same time, which increases computation speed.
9. The method according to claim 8, wherein the 38 selected features are calculated from the pollution data and other information, and the feature data of each grid cell are labeled with the judged pollution type to serve as training data for the model; the method uses machine learning algorithms, specifically random forests, neural networks, support vector machines, gradient boosting machines, and others, and a combined model that includes sub-models based on curve shape (time-series shape), automatic feature extraction by deep neural networks, and the like, to train the data model, so that the proposed dimensions and feature classes are better understood and the accuracy of pollution type classification is improved.
10. The method according to claim 1, wherein the training set of the algorithm model includes pollution-free time-series data; after the pollution data of a grid cell are obtained, the proposed 38 features are calculated and input into the trained model, which outputs the classification result for the grid cell's pollution type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010846058.3A CN111985567B (en) | 2020-08-21 | 2020-08-21 | Automatic pollution source type identification method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010846058.3A CN111985567B (en) | 2020-08-21 | 2020-08-21 | Automatic pollution source type identification method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985567A true CN111985567A (en) | 2020-11-24 |
CN111985567B CN111985567B (en) | 2022-11-22 |
Family
ID=73443859
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010846058.3A Active CN111985567B (en) | 2020-08-21 | 2020-08-21 | Automatic pollution source type identification method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985567B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112634113A (en) * | 2020-12-22 | 2021-04-09 | 山西大学 | Polluted waste gas correlation analysis method based on dynamic sliding window |
CN112990024A (en) * | 2021-03-18 | 2021-06-18 | 深圳博沃智慧科技有限公司 | Method for monitoring urban raise dust |
CN113295635A (en) * | 2021-05-27 | 2021-08-24 | 河北先河环保科技股份有限公司 | Water pollution alarm method based on dynamic update data set |
CN113688940A (en) * | 2021-09-09 | 2021-11-23 | 浙江大学 | Suspected pollution industrial enterprise identification method based on public data |
CN113706127A (en) * | 2021-10-22 | 2021-11-26 | 长视科技股份有限公司 | Water area analysis report generation method and electronic equipment |
CN114332540A (en) * | 2021-12-31 | 2022-04-12 | 北京建筑大学 | Building automation system data marking method and system based on big data |
CN114693003A (en) * | 2022-05-23 | 2022-07-01 | 成都秦川物联网科技股份有限公司 | Smart city air quality prediction method and system based on Internet of things |
CN115018348A (en) * | 2022-06-20 | 2022-09-06 | 北京北投生态环境有限公司 | Environment analysis method, system, equipment and storage medium based on artificial intelligence |
CN115358718A (en) * | 2022-08-24 | 2022-11-18 | 广东旭诚科技有限公司 | Noise pollution classification and real-time supervision method based on intelligent monitoring front end |
CN115792919A (en) * | 2023-01-19 | 2023-03-14 | 合肥中科光博量子科技有限公司 | Method for identifying pollution hot spot area through horizontal scanning and monitoring of aerosol laser radar |
CN116912069A (en) * | 2023-09-13 | 2023-10-20 | 成都市智慧蓉城研究院有限公司 | Data processing method applied to smart city and electronic equipment |
CN117057819A (en) * | 2023-08-15 | 2023-11-14 | 泰华智慧产业集团股份有限公司 | Rainwater pipe network sewage discharge traceability analysis method and system |
US20230419823A1 (en) * | 2022-06-28 | 2023-12-28 | Chengdu Qinchuan Iot Technology Co., Ltd. | Methods and systems for managing exhaust emission in a smart city based on industrial internet of things |
CN117473398A (en) * | 2023-12-26 | 2024-01-30 | 四川国蓝中天环境科技集团有限公司 | Urban dust pollution source classification method based on slag transport vehicle activity |
CN117633661A (en) * | 2024-01-26 | 2024-03-01 | 西南交通大学 | Slag car high-risk pollution source classification method based on evolution diagram self-supervised learning |
RU2818685C1 (en) * | 2023-06-19 | 2024-05-03 | федеральное государственное автономное образовательное учреждение высшего образования "Национальный исследовательский университет "Высшая школа экономики" | Method of identifying a source of emission of harmful substances into the atmosphere based on artificial intelligence technology |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103234883A (en) * | 2013-04-30 | 2013-08-07 | 中南大学 | Road traffic flow-based method for estimating central city PM2.5 in real time |
CN104899596A (en) * | 2015-03-16 | 2015-09-09 | 景德镇陶瓷学院 | Multi-label classification method and apparatus thereof |
CN106844626A (en) * | 2017-01-20 | 2017-06-13 | 武汉大学 | Method and system for simulating air quality using microblog keywords and location information |
CN107608009A (en) * | 2017-09-15 | 2018-01-19 | 深圳市卡普瑞环境科技有限公司 | Air quality monitoring device, processing terminal and server |
CN108764013A (en) * | 2018-03-28 | 2018-11-06 | 中国科学院软件研究所 | Automatic communication signal recognition method based on end-to-end convolutional neural networks |
CN109740560A (en) * | 2019-01-11 | 2019-05-10 | 济南浪潮高新科技投资发展有限公司 | Automatic identification method and system for human cell proteins based on convolutional neural networks |
CN110006799A (en) * | 2019-02-14 | 2019-07-12 | 北京市环境保护监测中心 | Classification method for hot-spot grid pollution types |
CN110186820A (en) * | 2018-12-19 | 2019-08-30 | 河北中科遥感信息技术有限公司 | Multi-source data fusion and environmental pollution source and pollutant distribution analysis method |
CN110870019A (en) * | 2017-10-16 | 2020-03-06 | 因美纳有限公司 | Semi-supervised learning for training deep convolutional neural network sets |
CN111121862A (en) * | 2019-09-29 | 2020-05-08 | 广西中遥空间信息技术有限公司 | Air-space-ground integrated atmospheric environment monitoring system and method |
CN111461184A (en) * | 2020-03-19 | 2020-07-28 | 南京理工大学 | XGB multi-dimensional operation and maintenance data anomaly detection method based on multivariate feature matrix |
- 2020-08-21: Application CN202010846058.3A (CN); patent CN111985567B; status: Active
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112634113B (en) * | 2020-12-22 | 2023-09-26 | 山西大学 | Pollution waste gas correlation analysis method based on dynamic sliding window |
CN112634113A (en) * | 2020-12-22 | 2021-04-09 | 山西大学 | Polluted waste gas correlation analysis method based on dynamic sliding window |
CN112990024A (en) * | 2021-03-18 | 2021-06-18 | 深圳博沃智慧科技有限公司 | Method for monitoring urban raised dust |
CN112990024B (en) * | 2021-03-18 | 2024-03-26 | 深圳博沃智慧科技有限公司 | Urban dust monitoring method |
CN113295635A (en) * | 2021-05-27 | 2021-08-24 | 河北先河环保科技股份有限公司 | Water pollution alarm method based on dynamic update data set |
CN113688940A (en) * | 2021-09-09 | 2021-11-23 | 浙江大学 | Suspected pollution industrial enterprise identification method based on public data |
CN113706127A (en) * | 2021-10-22 | 2021-11-26 | 长视科技股份有限公司 | Water area analysis report generation method and electronic equipment |
CN114332540A (en) * | 2021-12-31 | 2022-04-12 | 北京建筑大学 | Building automation system data marking method and system based on big data |
CN114332540B (en) * | 2021-12-31 | 2024-10-29 | 北京建筑大学 | Big data-based building automation system data marking method and system |
CN114693003A (en) * | 2022-05-23 | 2022-07-01 | 成都秦川物联网科技股份有限公司 | Smart city air quality prediction method and system based on Internet of things |
US11776081B1 (en) * | 2022-05-23 | 2023-10-03 | Chengdu Qinchuan Iot Technology Co., Ltd. | Methods and systems for predicting air quality in smart cities based on an internet of things |
US20230394611A1 (en) * | 2022-05-23 | 2023-12-07 | Chengdu Qinchuan Iot Technology Co., Ltd. | Method and system for area management in smart city based on internet of things |
US12056782B2 (en) | 2022-05-23 | 2024-08-06 | Chengdu Qinchuan Iot Technology Co., Ltd. | Method and system for area management in smart city based on internet of things |
CN115018348A (en) * | 2022-06-20 | 2022-09-06 | 北京北投生态环境有限公司 | Environment analysis method, system, equipment and storage medium based on artificial intelligence |
US20230419823A1 (en) * | 2022-06-28 | 2023-12-28 | Chengdu Qinchuan Iot Technology Co., Ltd. | Methods and systems for managing exhaust emission in a smart city based on industrial internet of things |
CN115358718A (en) * | 2022-08-24 | 2022-11-18 | 广东旭诚科技有限公司 | Noise pollution classification and real-time supervision method based on intelligent monitoring front end |
CN115792919A (en) * | 2023-01-19 | 2023-03-14 | 合肥中科光博量子科技有限公司 | Method for identifying pollution hot spot area through horizontal scanning and monitoring of aerosol laser radar |
RU2818685C1 (en) * | 2023-06-19 | 2024-05-03 | федеральное государственное автономное образовательное учреждение высшего образования "Национальный исследовательский университет "Высшая школа экономики" | Method of identifying a source of emission of harmful substances into the atmosphere based on artificial intelligence technology |
CN117057819A (en) * | 2023-08-15 | 2023-11-14 | 泰华智慧产业集团股份有限公司 | Rainwater pipe network sewage discharge traceability analysis method and system |
CN116912069A (en) * | 2023-09-13 | 2023-10-20 | 成都市智慧蓉城研究院有限公司 | Data processing method applied to smart city and electronic equipment |
CN116912069B (en) * | 2023-09-13 | 2024-01-02 | 成都市智慧蓉城研究院有限公司 | Data processing method applied to smart city and electronic equipment |
CN117473398B (en) * | 2023-12-26 | 2024-03-19 | 四川国蓝中天环境科技集团有限公司 | Urban dust pollution source classification method based on slag transport vehicle activity |
CN117473398A (en) * | 2023-12-26 | 2024-01-30 | 四川国蓝中天环境科技集团有限公司 | Urban dust pollution source classification method based on slag transport vehicle activity |
CN117633661B (en) * | 2024-01-26 | 2024-04-02 | 西南交通大学 | Slag car high-risk pollution source classification method based on evolution diagram self-supervised learning |
CN117633661A (en) * | 2024-01-26 | 2024-03-01 | 西南交通大学 | Slag car high-risk pollution source classification method based on evolution diagram self-supervised learning |
Also Published As
Publication number | Publication date |
---|---|
CN111985567B (en) | 2022-11-22 |
Similar Documents
Publication | Title |
---|---|
CN111985567B (en) | Automatic pollution source type identification method based on machine learning | |
CN115578015B (en) | Sewage treatment whole process supervision method, system and storage medium based on Internet of things | |
Kleine Deters et al. | Modeling PM2.5 urban pollution using machine learning and selected meteorological parameters | |
CN116186566B (en) | Diffusion prediction method and system based on deep learning | |
CN110288001B (en) | Target recognition method based on target data feature training learning | |
CN108595414B (en) | Soil heavy metal enterprise pollution source identification method based on source-sink space variable reasoning | |
CN112307884A (en) | Forest fire spreading prediction method based on continuous time sequence remote sensing situation data and electronic equipment | |
CN116359218B (en) | Industrial aggregation area atmospheric pollution mobile monitoring system | |
Van et al. | A new model of air quality prediction using lightweight machine learning | |
CN111008337A (en) | Deep attention rumor identification method and device based on ternary characteristics | |
CN115438848A (en) | PM2.5 long-term concentration prediction method based on deep mixed graph neural network | |
KR102564191B1 (en) | Disaster response system that detects and responds to disaster situations in real time | |
Al_Janabi et al. | Pragmatic method based on intelligent big data analytics to prediction air pollution | |
CN113935228A (en) | L-band rough sea surface brightness temperature simulation method based on machine learning | |
CN115761439A (en) | Boiler inner wall sink detection and identification method based on target detection | |
CN115146537A (en) | Atmospheric pollutant emission estimation model construction method and system based on power consumption | |
Kim et al. | Massive scale deep learning for detecting extreme climate events | |
CN109213840B (en) | Hot spot grid identification method based on multidimensional feature deep learning | |
CN113267601B (en) | Industrial production environment remote real-time monitoring cloud platform based on machine vision and data analysis | |
CN114527235A (en) | Real-time quantitative detection method for emission intensity | |
Senior-Williams et al. | The Classification of Tropical Storm Systems in Infrared Geostationary Weather Satellite Images Using Transfer Learning | |
CN110543675A (en) | Power transmission line fault identification method | |
KR20230167856A (en) | Visibility Prediction Method using Tree-based Machine Learning Algorithm and Meteorological Forecasting Data | |
Srijiranon et al. | Investigation of PM10 prediction utilizing data mining techniques: Analyze by topic | |
CN113935394A (en) | Apparatus and method for environmental monitoring |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||