CN111427877A - Environmental protection abnormal data fine screening method based on cluster analysis - Google Patents

Environmental protection abnormal data fine screening method based on cluster analysis

Publication number
CN111427877A
CN111427877A CN202010199335.6A CN202010199335A CN111427877A CN 111427877 A CN111427877 A CN 111427877A CN 202010199335 A CN202010199335 A CN 202010199335A CN 111427877 A CN111427877 A CN 111427877A
Authority
CN
China
Prior art keywords
data
points
screened
database
quantitative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010199335.6A
Other languages
Chinese (zh)
Inventor
王尧
肖彦
柯安
江伟
胡远
杨肃博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Gangli Environmental Protection Co ltd
Original Assignee
Chongqing Gangli Environmental Protection Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Gangli Environmental Protection Co ltd
Priority to CN202010199335.6A
Publication of CN111427877A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The invention provides an environmental protection abnormal data fine screening method based on cluster analysis, which comprises the following steps: S1, acquiring data to be screened and forming a database to be screened from the acquired data; S2, screening quantitative data from the database to be screened and forming a quantitative database from the screened quantitative data; and S3, screening abnormal data from the quantitative database and removing the screened abnormal data from the quantitative database. The method finely screens the data collected at an early stage, before the cooking oil fume data are processed, eliminating abnormal data points from the obtained data and avoiding the interference they cause.

Description

Environmental protection abnormal data fine screening method based on cluster analysis
Technical Field
The invention relates to the technical field of environmental protection data screening, in particular to an environmental protection abnormal data fine screening method based on cluster analysis.
Background
Chinese food culture is a distinctive feature of Chinese civilization and has a long history. A Chinese cuisine is a regional style of cooking with its own flavor and a complete, self-contained system of cooking techniques, formed over a long historical evolution in a particular area as a result of differences in climate, geography, history, local produce, and dietary customs. The eight most influential and representative cuisines recognized today are Sichuan (Chuan), Shandong (Lu), Guangdong (Yue), Jiangsu (Su), Zhejiang (Zhe), Fujian (Min), Anhui (Hui), and Hunan (Xiang). Although the eight cuisines differ in style, they all include frying, which is the cooking method most frequently used by Chinese catering service units and also the main source of cooking oil fume.
Atmospheric pollutants generated by the catering industry are discharged into the environment as oil fume and can be divided, according to their form, into particulate matter and gaseous matter. The particulate matter, generally called oil smoke, comes mainly from the volatilization and condensation of oils and fats and from the decomposition and cracking of oily food materials during cooking. The gaseous matter consists mainly of volatile organic compounds (VOCs), which can promote the formation of atmospheric OH radicals, O3, and secondary organic aerosols, leading to photochemical smog pollution events. Moreover, the VOC concentration at a catering oil fume outlet can reach 2 to 9 times the ambient background value, so catering oil fume seriously affects the surrounding environment. The VOCs it carries are also the main cause of its strong pungent odor, which directly disturbs nearby residents.
As catering oil fume pollution worsens, the contribution of catering sources to VOCs in the urban atmosphere has drawn attention. Studies show that catering oil fume accounts for 1.07% of total anthropogenic VOC emissions in Hong Kong; a study of the anthropogenic VOC emission inventory of the metropolis found that catering sources contribute 0.94%; and the contribution of catering sources to anthropogenic VOCs in Jiangsu Province has been reported as 3.19%. In the Chongqing area, Sichuan cuisine predominates and the main cooking methods are quick-frying and stir-frying, so cooking produces noticeably more oil fume than elsewhere in China. Data show that the contribution of domestic pollution sources to PM2.5 rose from 10% to 14% between 2012 and 2017, and that catering oil fume in the main urban area of Chongqing accounted for 5.5% of PM2.5 in 2017. As the Blue Sky Action in our city advances, the original industrial emission sources have been relocated to the suburbs or upgraded, and motor vehicle exhaust pollution has gradually fallen with the introduction of electric vehicles and rail transit, so catering oil fume emissions have become increasingly prominent. The data show that in recent years catering oil fume pollution has gradually become a main environmental protection problem and a focus of complaints in urban areas.
Among the environmental complaints reported to and accepted by the Chongqing environmental protection bureau through the 12369 hotline from 2012 to 2017, the proportion of catering oil fume complaints in total air pollution complaints rose from 20% to 27.4%. In Yuzhong District, where the tertiary industry dominates, the share of catering oil fume complaints in total complaints rose from 12.3% to 21.1% between 2013 and 2015. The state and Chongqing have therefore adopted a series of policies and requirements to prevent and control oil fume pollution in the catering industry.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly creatively provides an environmental protection abnormal data fine screening method based on cluster analysis.
In order to achieve the above purpose, the invention provides an environmental protection abnormal data fine screening method based on cluster analysis, which comprises the following steps:
S1, acquiring data to be screened, and forming a database to be screened from the acquired data;
S2, screening quantitative data from the database to be screened, and forming a quantitative database from the screened quantitative data;
S3, screening abnormal data from the quantitative database, and removing the screened abnormal data from the quantitative database.
In a preferred embodiment of the present invention, in step S1, the data to be screened include the attributes of each catering unit. These attributes include one or any combination of: registered capital (unit: ten thousand yuan), operating area (unit: square meters), annual tax amount (unit: ten thousand yuan), number of fixed stoves, number of mobile stoves, average business duration (unit: hours), average daily number of diners (unit: persons), average consumption per diner (unit: yuan), catering type, main processing method, fuel type, number of oil fume purification devices, type of oil fume purification device, month, monthly water use, monthly fuel use, and monthly oil use.
In a preferred embodiment of the present invention, in step S2, the quantitative data include one or any combination of: operating area (unit: square meters), average business duration (unit: hours), average daily number of diners (unit: persons), monthly water use, monthly fuel use, and monthly oil use.
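As an illustration of step S2, the following minimal sketch keeps only the quantitative attributes of each record. The field names, the dictionary record layout, and the choice to skip records with missing or non-numeric values are all assumptions made for this example; the patent lists the attributes but does not specify a schema or a policy for incomplete records.

```python
# Hypothetical field names for the quantitative attributes named in step S2.
QUANTITATIVE_FIELDS = (
    "operating_area_m2",
    "avg_business_hours",
    "avg_daily_diners",
    "monthly_water",
    "monthly_fuel",
    "monthly_oil",
)


def build_quantitative_database(records, fields=QUANTITATIVE_FIELDS):
    """Return one tuple of floats per record, restricted to the given fields.

    Records with a missing or non-numeric value are skipped; this is an
    assumption of the sketch, not something the patent prescribes.
    """
    rows = []
    for record in records:
        try:
            rows.append(tuple(float(record[name]) for name in fields))
        except (KeyError, TypeError, ValueError):
            continue
    return rows
```

Each returned tuple can then be treated as one point in the distance calculations of step S3.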
In a preferred embodiment of the present invention, in step S3, the method for fine screening abnormal data from the quantitative database comprises:
S31, parsing the data files in the quantitative database;
S32, calculating the Euclidean distance between each point and all other points, and calculating the k-distance value of each point;
S33, sorting the set of k-distance values of all points in ascending order, and outputting the sorted k-distance values;
S34, displaying the k-distance trend of all points with a scatter chart in Excel;
S35, determining the value of the radius Eps from the scatter chart;
S36, calculating all core points from the given minimum number of points MinPts and the value of the radius Eps, and establishing a mapping between each core point and the points whose distance to that core point is less than the radius Eps;
S37, calculating the connectable core points from the obtained core point set and the value of the radius Eps to obtain the noise points; the noise points are the abnormal data; the noise points are removed from the quantitative database.
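Steps S32 and S33 can be sketched as follows. This is a minimal illustration, assuming each record is a tuple of numbers and that the k-distance of a point is the Euclidean distance to its k-th nearest other point; the function name and the value of k are hypothetical, since the patent does not fix them.

```python
import math


def k_distances(points, k):
    """Return the k-distance of every point, sorted in ascending order.

    The k-distance of a point is taken here as the Euclidean distance to
    its k-th nearest neighbour among all other points.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    values = []
    for i, p in enumerate(points):
        # S32: Euclidean distance from this point to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        values.append(dists[k - 1])
    # S33: ascending order, ready to be plotted as a scatter chart.
    return sorted(values)
```

The sorted values could then be written to a file and plotted, and the radius Eps read off where the curve bends sharply upward, which corresponds to steps S34 and S35.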
In conclusion, with the above technical scheme, the data collected at an early stage can be finely screened before the cooking oil fume data are processed, abnormal data points in the obtained data are eliminated, and the interference caused by abnormal data is avoided.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of abnormal data points far from the center of a cluster according to the present invention.
Figure 2 is a schematic representation of a cluster of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The invention provides an environmental protection abnormal data fine screening method based on cluster analysis, which comprises the following steps:
S1, acquiring data to be screened, and forming a database to be screened from the acquired data. In this embodiment, the data to be screened include the attributes of each catering unit, and these attributes include one or any combination of: registered capital (unit: ten thousand yuan), operating area (unit: square meters), annual tax amount (unit: ten thousand yuan), number of fixed stoves, number of mobile stoves, average business duration (unit: hours), average daily number of diners (unit: persons), average consumption per diner (unit: yuan), catering type, main processing method, fuel type, number of oil fume purification devices, type of oil fume purification device, monthly water use, monthly fuel use, and monthly oil use.
S2, screening quantitative data from the database to be screened, and forming a quantitative database from the screened quantitative data. In this embodiment, the quantitative data include one or any combination of: operating area (unit: square meters), average business duration (unit: hours), average daily number of diners (unit: persons), monthly water use, monthly fuel use, and monthly oil use.
S3, screening abnormal data from the quantitative database, and removing the screened abnormal data from the quantitative database. In this embodiment, the method for finely screening abnormal data from the quantitative database comprises:
S31, parsing the data files in the quantitative database;
S32, calculating the Euclidean distance between each point and all other points, and calculating the k-distance value of each point;
S33, sorting the set of k-distance values of all points in ascending order, and outputting the sorted k-distance values;
S34, displaying the k-distance trend of all points with a scatter chart in Excel;
S35, determining the value of the radius Eps from the scatter chart;
S36, calculating all core points from the given minimum number of points MinPts and the value of the radius Eps, and establishing a mapping between each core point and the points whose distance to that core point is less than the radius Eps;
S37, calculating the connectable core points from the obtained core point set and the value of the radius Eps to obtain the noise points; the noise points are the abnormal data; the noise points are removed from the quantitative database. Fig. 1 is a two-dimensional diagram of abnormal data points far from the center of a cluster.
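Steps S36 and S37 can be sketched as below. This is a minimal illustration under stated assumptions: points are tuples of numbers, the neighbourhood test is strict (distance less than Eps, as the text says), and MinPts counts the point itself, which is a common DBSCAN convention but is not stated in the patent. The function name is hypothetical.

```python
import math


def find_noise_points(points, eps, min_pts):
    """Return (core, noise): the set of core-point indices and the list of
    noise-point indices for the given radius eps and threshold min_pts."""
    n = len(points)
    # S36: map every point to the points lying strictly within eps of it.
    neighbours = {
        i: [j for j in range(n)
            if j != i and math.dist(points[i], points[j]) < eps]
        for i in range(n)
    }
    # Assumption: the point itself counts toward min_pts.
    core = {i for i in range(n) if len(neighbours[i]) + 1 >= min_pts}
    # S37: noise = not a core point and not within eps of any core point.
    noise = [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    ]
    return core, noise
```

Removing the abnormal data then amounts to keeping every point whose index is not in `noise`, for example `[p for i, p in enumerate(points) if i not in set(noise)]`.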
The method further includes step S38: each set of connectable core points, together with the points whose distance to a core point is less than the radius Eps, is gathered to form a cluster. As shown in Fig. 2, if the area of radius d around a data point contains at least a certain number of other data points, i.e. exceeds a certain density (four points in Fig. 2), that data point is called a center point (point A in Fig. 2). Points adjacent to a center point, such as points B and C in Fig. 2, are called reachable points. Data points that are not adjacent to any center point are called noise points, i.e. abnormal data points (point N in Fig. 2).
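The cluster formation of step S38 can be sketched as a breadth-first expansion from each unvisited core point. This is an illustrative sketch in the style of DBSCAN, not the applicant's implementation; the function name, the label convention (-1 for noise), and counting the point itself toward MinPts are assumptions.

```python
import math
from collections import deque


def dbscan_clusters(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) < eps]
        for i in range(n)
    ]
    # Assumption: the point itself counts toward min_pts.
    is_core = [len(nb) + 1 >= min_pts for nb in neighbours]
    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        # Grow a new cluster from this core point.
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for j in neighbours[current]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    # Only core points extend the cluster further;
                    # reachable (border) points are labelled but not expanded.
                    if is_core[j]:
                        queue.append(j)
        cluster_id += 1
    return labels
```

Points that keep the label -1 after the loop are the noise points, so a small group that never reaches the density threshold is treated as abnormal data rather than as a cluster.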
In a preferred embodiment of the present invention, the method further comprises steps S4, S5 and S6. S4, dividing the data in the quantitative database into M training sets and N test sets, where M and N are positive integers greater than or equal to 1, and loading the data in the M training sets in sequence into the original learning model for training to obtain a target learning model. In this embodiment, the M training sets are the 1st, 2nd, 3rd, ..., M-th training sets and the N test sets are the 1st, 2nd, 3rd, ..., N-th test sets, with $\{A_1, A_2, A_3, \ldots, A_M\} \cup \{B_1, B_2, B_3, \ldots, B_N\} = C$, where $A_i$ denotes the i-th training set (i is a positive integer less than or equal to M), $B_j$ denotes the j-th test set (j is a positive integer less than or equal to N), and C denotes the processing database.
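One possible way to carry out the division in step S4 is sketched below. The patent only requires that the training sets and test sets together make up the processing database C; the random shuffle, the fixed seed, the disjoint parts, and the roughly equal part sizes are assumptions of this example, and the function name is hypothetical.

```python
import random


def split_into_sets(rows, m, n, seed=0):
    """Split rows into m training sets and n test sets covering all rows."""
    if m < 1 or n < 1:
        raise ValueError("M and N must be positive integers")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled rows round-robin into m + n disjoint parts.
    parts = [shuffled[i::m + n] for i in range(m + n)]
    return parts[:m], parts[m:]
```

The training sets returned first would then be fed to the learning model one after another, and the remaining sets held back for testing.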
S5, inputting the collected data into the target learning model to obtain a predicted oil fume value.
S6, if the predicted oil fume value obtained in step S5 is greater than or equal to the preset oil fume threshold, the cloud server sends the alarm area to the intelligent terminal.
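The threshold check in step S6 can be illustrated with a short sketch. It assumes the predictions are available as a mapping from an area name to its predicted oil fume value; that layout, the function name, and the sample values in any usage are hypothetical, and the sending of the alarm by the cloud server is outside the scope of the example.

```python
def areas_to_alert(predictions, threshold):
    """Return the areas whose predicted oil fume value is greater than or
    equal to the preset threshold, in the order they appear in predictions."""
    return [area for area, value in predictions.items() if value >= threshold]
```

Note that the comparison is inclusive, matching the "greater than or equal to" wording of step S6.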
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (4)

1. An environmental protection abnormal data fine screening method based on cluster analysis is characterized by comprising the following steps:
S1, acquiring data to be screened, and forming a database to be screened from the acquired data;
S2, screening quantitative data from the database to be screened, and forming a quantitative database from the screened quantitative data;
S3, screening abnormal data from the quantitative database, and removing the screened abnormal data from the quantitative database.
2. The environmental protection abnormal data fine screening method based on cluster analysis according to claim 1, wherein in step S1 the data to be screened include the attributes of each catering unit, and these attributes include one or any combination of: registered capital (unit: ten thousand yuan), operating area (unit: square meters), annual tax amount, number of fixed stoves, number of mobile stoves, average business duration (unit: hours), average daily number of diners (unit: persons), average consumption per diner, catering type, main processing method, fuel type, number of oil fume purification devices, type of oil fume purification device, monthly water use, monthly fuel use, and monthly oil use.
3. The environmental protection abnormal data fine screening method based on cluster analysis according to claim 1, wherein in step S2 the quantitative data include one or any combination of: operating area (unit: square meters), average business duration (unit: hours), average daily number of diners (unit: persons), monthly water use, monthly fuel use, and monthly oil use.
4. The environmental protection abnormal data fine screening method based on cluster analysis according to claim 1, wherein in step S3, the method for fine screening abnormal data from the quantitative database is:
S31, parsing the data files in the quantitative database;
S32, calculating the Euclidean distance between each point and all other points, and calculating the k-distance value of each point;
S33, sorting the set of k-distance values of all points in ascending order, and outputting the sorted k-distance values;
S34, displaying the k-distance trend of all points with a scatter chart in Excel;
S35, determining the value of the radius Eps from the scatter chart;
S36, calculating all core points from the given minimum number of points MinPts and the value of the radius Eps, and establishing a mapping between each core point and the points whose distance to that core point is less than the radius Eps;
S37, calculating the connectable core points from the obtained core point set and the value of the radius Eps to obtain the noise points; the noise points are the abnormal data; the noise points are removed from the quantitative database.
CN202010199335.6A 2020-03-20 2020-03-20 Environmental protection abnormal data fine screening method based on cluster analysis Pending CN111427877A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199335.6A CN111427877A (en) 2020-03-20 2020-03-20 Environmental protection abnormal data fine screening method based on cluster analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199335.6A CN111427877A (en) 2020-03-20 2020-03-20 Environmental protection abnormal data fine screening method based on cluster analysis

Publications (1)

Publication Number Publication Date
CN111427877A true CN111427877A (en) 2020-07-17

Family

ID=71548204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199335.6A Pending CN111427877A (en) 2020-03-20 2020-03-20 Environmental protection abnormal data fine screening method based on cluster analysis

Country Status (1)

Country Link
CN (1) CN111427877A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106912015A (en) * 2017-01-10 2017-06-30 上海云砥信息科技有限公司 A kind of personnel's Trip chain recognition methods based on mobile network data
US20180060694A1 (en) * 2015-09-14 2018-03-01 International Business Machines Corporation System, method, and recording medium for efficient cohesive subgraph identifiation in entity collections for inlier and outlier detection
CN110533314A (en) * 2019-08-23 2019-12-03 西安交通大学 A kind of wind power plant exception unit recognition methods based on probability density distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200717