CN110245880A - Method for identifying fraud in pollution-source online monitoring data - Google Patents
Method for identifying fraud in pollution-source online monitoring data
- Publication number: CN110245880A (application CN201910591968.9A)
- Authority: CN (China)
- Prior art keywords: data, enterprise, exceeded, monitoring, value
- Legal status: Pending (as listed by Google, which notes that the status is an assumption, not a legal conclusion)
Classifications
- G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
- G06Q50/26: Government or public services
- G07C9/00563: Electronically operated locks; circuits therefor; non-mechanical keys therefor, using personal physical data of the operator, e.g. fingerprints, retinal images, voice patterns
Abstract
The invention discloses a method for identifying fraud in the online monitoring data of pollution sources. The method comprises data preprocessing, fixed-rule screening, video and access control, on-site inspection, and machine-learning-based rule optimization. Fixed-rule screening comprises enterprise-fraud rule screening, enterprise instrument-failure screening and screening for anomalies of the operation and maintenance (O&M) unit. Video and access control is a tool for checking whether an enterprise is cheating; video and access-control alarms can be displayed in the system. On-site inspection verifies in the field the results of fixed-rule screening and of video and access control, and establishes whether the enterprise is cheating, whether an instrument has failed, and whether the O&M unit has falsified its O&M records. Machine learning optimizes the rules using feedback from on-site inspection, so that the fixed-rule screening results become more reliable. The proposed method addresses problems such as enterprises covertly discharging wastewater and waste gas and non-standard O&M of online monitoring, and can also support users' decision analysis.
Description
Technical field
The present invention relates to the field of anti-fraud for online monitoring, and in particular to a method for identifying fraud in pollution-source online monitoring data.
Background art
Environmental quality is a focus of public attention, and making better use of existing data to manage polluting enterprises has become a problem for the relevant agencies. The current state of anti-fraud for pollution sources can be summarized in three aspects. First, detection relies on video monitoring, and staff judge by inspecting the data, for example whether a measured value is too large or too small. Second, fraudulent data can currently be found only through manual review and experience-based auditing; more often, a complaint from the public is received and the government department then supervises and inspects according to procedure, which has little effect. Third, for massive data the labor cost is very high: each polluting enterprise generates hundreds of monitoring records a day, so manual review is inefficient, and remote real-time monitoring by machine cannot guarantee the reliability of the video monitoring. Manually manipulated "abnormal" data, however, often show some regularity. Finding the patterns among such data and providing corresponding decision references to the inspection agencies can greatly improve the supervision of non-compliant enterprises and strengthen environmental protection.
Summary of the invention
In view of the deficiencies of the prior art, the present invention aims to provide a method for identifying fraud in pollution-source online monitoring data that addresses problems such as enterprises covertly discharging wastewater and waste gas and non-standard O&M of online monitoring, while also supporting users' decision analysis.
The object of the present invention is achieved through the following technical solution: a method for identifying fraud in pollution-source online monitoring data, comprising:
1) Data preprocessing: the online monitoring data are preprocessed, hourly values are selected as the base data for fixed-rule screening, and invalid data are handled. The handling rules are: if the flow meter reads zero for a certain period, the monitoring-index data of all monitoring instruments for that period are rejected; for each monitoring index, the percentage of zero values is determined, and if it is below a threshold the data are filled by interpolation, otherwise the data for this monitoring index are rejected;
2) Fixed-rule screening: comprises an enterprise fraud rule base, an enterprise instrument-failure rule base and an O&M unit anomaly rule base;
The enterprise fraud rule base is used to judge whether an enterprise is cheating and marks suspected cheating enterprises. It comprises dilution, same-industry discharge analysis, monitoring-index co-analysis, exceedance sudden drop, interrupted exceedance sudden drop, combination analysis, and the Grubbs criterion;
The enterprise instrument-failure rule base is used to judge whether an enterprise's instruments have failed and marks the faulty instruments and the enterprises that own them; it comprises the zero-value and constant-value rules;
The O&M unit anomaly rule base is used to judge whether the O&M unit's quality-control records have been falsified, and marks enterprises whose O&M unit is abnormal;
The results of fixed-rule screening are presented visually;
3) Video and access control: comprises video monitoring and access-control records. Video monitoring covers the enterprise's pollutant discharge outlets and the monitoring station room and is used to detect unlawful behavior by enterprise personnel; access-control records log personnel entering and leaving the station. Video and access control is applied in two ways. In the first, it is combined with the fixed-rule screening results to further confirm whether the enterprise is cheating, whether an instrument has failed, or whether O&M is abnormal. The second is early warning: the video of the discharge outlet and the station is checked, and warning information is output if the outlet water becomes turbid, someone approaches the outlet, or an unauthorized person enters the station;
4) On-site inspection: on-site inspectors carry out inspection based on the fixed-rule screening results and the video and access-control information, together with enterprise information such as the enterprise's online monitoring data and O&M records. On-site inspection produces three kinds of results: whether the enterprise's online monitoring data are fraudulent, whether an instrument has failed, and whether the O&M unit's quality-control records have been falsified. These results are output and also used as label information to tune the machine-learning parameters and optimize the fixed-rule screening method, achieving higher accuracy;
5) Rule optimization based on machine learning: according to the feedback from video and access control and from on-site inspection, the fixed-rule screening is continuously optimized by machine learning to form more reliable screening rules. The machine learning is either a combination of unsupervised and semi-supervised learning, or time series analysis (TSA); during machine learning, thresholds suited to specific requirements are set according to the actual situation.
Further, the rule for invalid data is specifically: if fewer than 10% of a monitoring index's values are zero, the data are filled by interpolation; if more than 10% are zero, that index's data are rejected.
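A minimal Python sketch of the preprocessing rules, under stated assumptions: each hourly record is a dict keyed by index name, the flow key is called `flow`, and the function names are hypothetical. Linear interpolation between the nearest non-zero neighbors is used for simplicity in place of the Newton interpolation mentioned in the embodiment, and a zero share of exactly 10% is treated as a rejection because the text leaves that boundary open.

```python
from typing import Dict, List, Optional

ZERO_RATIO_THRESHOLD = 0.10  # the 10% threshold stated in the text


def drop_zero_flow_hours(rows: List[Dict[str, float]], flow_key: str = "flow") -> List[Dict[str, float]]:
    """Discard every hour whose flow reading is zero (all indices of that hour go with it)."""
    return [row for row in rows if row.get(flow_key, 0.0) != 0.0]


def fill_or_reject(values: List[float], threshold: float = ZERO_RATIO_THRESHOLD) -> Optional[List[float]]:
    """Interpolate zero readings if their share is below `threshold`; otherwise return None (reject)."""
    n = len(values)
    zero_idx = [i for i, v in enumerate(values) if v == 0.0]
    if n == 0 or len(zero_idx) / n >= threshold:
        return None
    valid = [i for i in range(n) if values[i] != 0.0]
    filled = list(values)
    for i in zero_idx:
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:  # leading zero: copy the next valid value
            filled[i] = values[right]
        elif right is None:  # trailing zero: copy the previous valid value
            filled[i] = values[left]
        else:  # straight-line interpolation between the two neighbors
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled
```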
Further, in the enterprise fraud rule base, the three rules exceedance sudden drop, interrupted exceedance sudden drop and combination analysis do not check whether the monitoring-index data are zero; they only check whether the flow-meter data are zero.
Further, the dilution rule in the enterprise fraud rule base is specifically:
1) Dilution fraud includes installing an additional discharge pipe, diluted discharge and diluting the sample. Against these means, the online monitoring data of enterprises with two or more monitoring indices are analyzed synchronously; if two or more monitoring indices increase or decrease in proportion, the enterprise is marked as a suspected cheating enterprise. The steps are:
(1) The dilution rule is applied to the online monitoring data of enterprises that have two or more monitoring indices besides pH and flow.
(2) Rule: suppose an enterprise monitors three factors a, b and c, whose values at hour N are $A_n$, $B_n$ and $C_n$:
| Time | a | b | c |
|---|---|---|---|
| Hour N | $A_n$ | $B_n$ | $C_n$ |
| Hour N+1 | $A_{n+1}$ | $B_{n+1}$ | $C_{n+1}$ |
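If the proportional relationship between the two hours holds (its formula is not reproduced in this text) and, at the same time, one of $A_n$, $B_n$, $C_n$ exceeds 70% of the discharge standard, the rule is met.
(3) Mark the result as a suspected cheating enterprise.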
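Because the proportional relationship itself is missing from this text, the following sketch substitutes an assumed criterion and should not be read as the patent's formula: two indices are taken to change "in proportion" from hour N to N+1 when their ratios $A_{n+1}/A_n$ agree within a hypothetical relative tolerance `ratio_tol` and differ from 1 by at least a hypothetical `min_change`. Only the 70% share of the discharge standard comes from the text; the function name is illustrative.

```python
from itertools import combinations
from typing import Dict


def dilution_suspect(
    now: Dict[str, float],
    nxt: Dict[str, float],
    limits: Dict[str, float],
    ratio_tol: float = 0.05,    # HYPOTHETICAL: how close two hour-to-hour ratios must be
    min_change: float = 0.2,    # HYPOTHETICAL: ignore hours where values barely change
    limit_share: float = 0.70,  # from the text: 70% of the discharge standard
) -> bool:
    """Flag hour N if two indices change in about the same proportion and one is near its limit."""
    ratios = {k: nxt[k] / now[k] for k in now if k in nxt and now[k] > 0}
    proportional = any(
        abs(ratios[a] - ratios[b]) <= ratio_tol * max(ratios[a], ratios[b])
        and abs(ratios[a] - 1.0) >= min_change
        for a, b in combinations(ratios, 2)
    )
    near_limit = any(now[k] > limit_share * limits[k] for k in now if k in limits)
    return proportional and near_limit
```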
2) Same-industry discharge analysis is specifically:
The industry of the discharging enterprise, the applicable discharge standard limits, the enterprise's production output and its wastewater treatment process are used as screening conditions. Enterprises in the same industry, under the same discharge standard, with the same output and similar wastewater treatment processes are grouped together and regarded as similar enterprises. By comparing the wastewater discharge volumes and monitored concentrations of similar enterprises, enterprises that deviate abnormally from the average level are found and marked as suspected cheating enterprises.
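One way to express the peer comparison, assuming each enterprise is a dict carrying the four screening attributes plus a numeric metric such as daily wastewater discharge or a monitored concentration. The field names, the exact-match grouping (the text only asks for "similar" treatment processes) and the 50% deviation threshold `rel_dev` are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

PEER_KEYS = ("industry", "standard", "output", "process")  # screening conditions from the text


def peer_outliers(enterprises: List[Dict], metric: str, rel_dev: float = 0.5) -> List[str]:
    """Return names of enterprises whose `metric` deviates from their peer-group mean by more than `rel_dev`."""
    groups = defaultdict(list)
    for e in enterprises:
        groups[tuple(e[k] for k in PEER_KEYS)].append(e)
    flagged = []
    for members in groups.values():
        if len(members) < 2:  # no peers to compare against
            continue
        avg = mean(m[metric] for m in members)
        if avg <= 0:
            continue
        flagged += [m["name"] for m in members if abs(m[metric] - avg) / avg > rel_dev]
    return flagged
```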
3) Monitoring-index co-analysis comprises co-analysis of total nitrogen and ammonia nitrogen, and of chemical oxygen demand (COD) and total organic carbon (TOC), in the online monitoring data.
(a) Total nitrogen and ammonia nitrogen co-analysis: where the online monitoring data contain both total-nitrogen and ammonia-nitrogen concentrations, cases in which the ammonia-nitrogen concentration exceeds the total-nitrogen concentration are analyzed further (ammonia nitrogen is a component of total nitrogen, so its concentration should not exceed it).
(1) Data rejection: if either the ammonia-nitrogen or the total-nitrogen value is zero, the record is rejected.
(2) Rule: suppose an enterprise monitors a (ammonia nitrogen) and b (total nitrogen):
| Time | a (ammonia nitrogen) | b (total nitrogen) |
|---|---|---|
| Hour N | A1 | B1 |
Here A1 and B1 are the ammonia-nitrogen and total-nitrogen concentrations at hour N; the rule is met if B1/A1 < 70%.
(3) Mark the result as a suspected cheating enterprise.
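The nitrogen rule reduces to a ratio test. A minimal sketch, with a hypothetical function name and `None` standing for a record rejected under step (1):

```python
from typing import Optional


def nitrogen_suspect(ammonia: float, total_n: float, ratio_limit: float = 0.70) -> Optional[bool]:
    """Return None if the record is rejected (either value zero), else whether TN / NH3-N < 70%."""
    if ammonia == 0 or total_n == 0:
        return None
    return total_n / ammonia < ratio_limit
```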
(b) COD and TOC co-analysis: COD and total organic carbon are strongly correlated.
Rule: the COD and TOC of an enterprise's wastewater follow the linear regression $y = Px + Q$, where x is TOC, y is COD, and P and Q can be obtained from the analysis instrument for the corresponding online monitoring factor.
Given either value, the relation yields the other. Suppose that at some time point the instruments report TOC $x_1$ and COD $y_1$. The calculated COD $y_2$ is obtained from $x_1$. According to the standard, an error within 10% between the two values in an enterprise's discharge is allowed, i.e. $|y_1 - y_2| / y_2 \le 10\%$; beyond 10% the data are considered abnormal and the enterprise is marked as a suspected cheating enterprise.
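A sketch of the COD/TOC consistency check. P and Q are instrument-specific and not given in the text, so the values used in the check below are made up for illustration; the function name is hypothetical.

```python
def cod_toc_suspect(toc: float, cod: float, p: float, q: float, tol: float = 0.10) -> bool:
    """Compare measured COD y1 with y2 = P * x1 + Q and flag when |y1 - y2| / y2 exceeds 10%."""
    predicted = p * toc + q
    if predicted <= 0:
        raise ValueError("calculated COD must be positive")
    return abs(cod - predicted) / predicted > tol
```

With P = 2.5 and Q = 5.0, a TOC of 10 gives a calculated COD of 30, so a measured COD of 32 passes and 40 is flagged.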
4) Exceedance sudden drop and interrupted exceedance sudden drop. Exceedance sudden drop comprises exceedance analysis and near-exceedance analysis, which cover cases in which a monitoring index of an enterprise's online monitoring exceeds, or is about to exceed, its limit; if the index value then drops suddenly, the index is analyzed further. The steps are:
Rule: suppose an enterprise monitors one factor a, whose value at hour N is $A_n$.
The first case is near-exceedance analysis: $A_n$ lies between 80% and 100% of the limit, and if either of the two sudden-drop conditions holds (their formulas are not reproduced in this text), $A_n$, $A_{n+1}$ and $A_{n+2}$ are considered abnormal. The second case is exceedance analysis: $A_n$ is above the limit, and if either sudden-drop condition holds, the exceedance-sudden-drop rule is met and $A_n$, $A_{n+1}$, $A_{n+2}$ and $A_{n+3}$ are likewise abnormal. Data meeting the first or the second case are marked as exceedance sudden drop.
Interrupted exceedance sudden drop analyzes the second case further: the data from N+2 to N+7 of the exceedance analysis are screened for exceedance, and if any exceedance exists, the data are marked as interrupted exceedance sudden drop.
All of the above results are marked as suspected cheating enterprises.
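The exact drop conditions are missing from this text, so the sketch below uses a hypothetical stand-in: a sudden drop means that $A_{n+1}$ or $A_{n+2}$ falls to at most `drop_ratio` times $A_n$. The 80%-100% band, the above-limit case and the N+2 to N+7 window for the interrupted case follow the text; labels and names are illustrative.

```python
from typing import List, Tuple


def exceedance_drops(series: List[float], limit: float, drop_ratio: float = 0.5) -> List[Tuple[int, str]]:
    """Return (hour index, label) pairs for the near/over-limit drop cases and the interrupted case.

    drop_ratio is a HYPOTHETICAL stand-in for the drop conditions not reproduced in the text.
    """
    hits = []
    for n in range(len(series) - 2):
        a = series[n]
        dropped = series[n + 1] <= drop_ratio * a or series[n + 2] <= drop_ratio * a
        if not dropped:
            continue
        if 0.8 * limit <= a <= limit:  # case 1: near exceedance (80%-100% of the limit)
            hits.append((n, "near_exceedance_drop"))
        elif a > limit:  # case 2: exceedance followed by a sudden drop
            hits.append((n, "exceedance_drop"))
            # interrupted case: screen hours N+2 .. N+7 for a renewed exceedance
            if any(v > limit for v in series[n + 2 : n + 8]):
                hits.append((n, "interrupted_exceedance_drop"))
    return hits
```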
5) Combination analysis combines several analysis rules:
A) Exceedance sudden drop + monitoring-index co-analysis
If data found to be an exceedance sudden drop also show ammonia nitrogen greater than total nitrogen, or COD and TOC that do not fit their correlation, the data carry two fraud features at once; the suspicion increases and the enterprise is marked as a suspected cheating enterprise.
B) Exceedance sudden drop + constant
If data found by fixed-rule screening to be an exceedance sudden drop also show the constant-value feature of the enterprise instrument-failure rule base, two fraud features are met at once; the suspicion increases and the enterprise is marked as a suspected cheating enterprise.
6) The Grubbs criterion is a method for analyzing outliers: a value in a group of data that deviates far from the mean is judged suspicious, and the Grubbs test is used to identify outliers in a monitoring index.
Rule: sort the data in descending order; outliers, if any, tend to be the maximum or the minimum.
(i) For data containing n hourly values (one series per monitoring factor), compute the statistic G for each hourly value; the statistic $G_i$ of the i-th hourly value is
$$G_i = \frac{|x_i - \bar{x}|}{s}$$
where $i \in \{1, 2, 3, \dots, n\}$, $\bar{x}$ is the mean of the n hourly values, s is their standard deviation and $x_i$ is the hourly value;
(ii) Look up the Grubbs coefficient: find the critical value in the Grubbs coefficient table for the given n and significance level;
(iii) Find outliers: when the statistic G of the maximum or the minimum of $x_i$ exceeds the critical value, that maximum or minimum is considered a suspected outlier;
(iv) Mark the result as a suspected cheating enterprise.
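A single-pass Grubbs check following steps (i) to (iii). The significance level is not stated in the text; this sketch assumes the commonly tabulated one-sided critical values at alpha = 0.05 for n = 3 to 10, which would need to be extended from a published table for daily series of 24 hourly values. In practice the Grubbs test is often repeated after removing a detected outlier; the text describes only one pass. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence

# One-sided Grubbs critical values at alpha = 0.05 (standard published table, n = 3..10).
GRUBBS_CRIT_05 = {3: 1.153, 4: 1.463, 5: 1.672, 6: 1.822, 7: 1.938, 8: 2.032, 9: 2.110, 10: 2.176}


def grubbs_suspects(values: Sequence[float]) -> List[float]:
    """Return the maximum and/or minimum whose G statistic exceeds the critical value."""
    n = len(values)
    if n not in GRUBBS_CRIT_05:
        raise ValueError(f"no tabulated critical value for n={n} in this sketch")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []
    crit = GRUBBS_CRIT_05[n]
    suspects = []
    for x in (max(values), min(values)):
        g = abs(x - x_bar) / s  # G_i = |x_i - mean| / s
        if g > crit and x not in suspects:
            suspects.append(x)
    return suspects
```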
Further, the zero and constant rules of the enterprise instrument-failure rule base are specifically:
A) Zero
If a monitoring index of the online monitoring is zero throughout 24 consecutive hours for which flow data exist, it is marked as a suspected instrument failure.
B) Constant
If a monitoring index of the online monitoring remains unchanged throughout 24 consecutive hours for which flow data exist, it is marked as a suspected instrument failure.
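A sketch of the zero and constant rules as a sliding 24-hour window, assuming `hourly` holds consecutive hourly values of one monitoring index during hours for which flow data exist; the function name and labels are hypothetical.

```python
from typing import List, Set


def instrument_fault(hourly: List[float], window: int = 24) -> Set[str]:
    """Return {'zero'} and/or {'constant'} if any 24-hour window is all zero or all identical."""
    labels = set()
    for start in range(len(hourly) - window + 1):
        chunk = hourly[start : start + window]
        if all(v == 0 for v in chunk):
            labels.add("zero")
        elif len(set(chunk)) == 1:  # unchanged but non-zero
            labels.add("constant")
    return labels
```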
Further, the O&M unit anomaly rule concerns cases in which the quality-control records kept by the O&M unit are inconsistent with the data uploaded by the data acquisition instrument; such cases are analyzed further according to the following rule.
Rule: let M denote the quality-control sample value in the O&M record and N the historical data collected by the data acquisition instrument; if $|M - N| / N \ge 30\%$, the case is marked as falsified O&M unit quality-control records.
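The quality-control rule is a relative-difference test. A minimal sketch with a hypothetical function name, assuming the acquisition-instrument value N is positive:

```python
def qc_falsified(m: float, n: float, tol: float = 0.30) -> bool:
    """Flag when the QC sample value M differs from the acquired value N by at least 30% of N."""
    if n <= 0:
        raise ValueError("acquired value N must be positive")
    return abs(m - n) / n >= tol
```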
Further, video and access control are accessed and analyzed together with the online monitoring data of enterprises marked as suspected cheaters: based on the time point of the abnormal data, the video for the corresponding time range can be retrieved to analyze possible cheating.
Further, for suspected cheating enterprises marked by the combined analysis of fixed-rule screening and video and access control, the relevant personnel go to the enterprise to verify the situation on site and collect evidence. For the abnormal data found by fixed-rule screening, whether the enterprise is cheating is determined by verifying on site whether the instrument operates normally, the historical data stored in the instrument, and the quality of the discharging unit's wastewater.
Further, the machine-learning-based data-fraud identification that combines unsupervised and semi-supervised learning works as follows: in the initial state, without label information, only unsupervised clustering can be used to separate outliers; once a small number of reliable manual inspection results are obtained, these reliable results can be fully exploited to obtain better results. A clustering method is combined with ADOA (Anomaly Detection with Partially Observed Anomalies); the specific steps are:
(1) In the initial, unlabeled stage, unsupervised clustering is used; the method may be distance-based k-means or density-based DBSCAN.
(2) Once some label information is obtained, the ADOA algorithm is used. ADOA targets settings with a large number of unlabeled samples and only a few labeled anomalous samples, and assumes that anomalies are not of a single type but of many types. ADOA has two stages:
Stage one: the observed anomalous samples are first clustered into k clusters; then, based on the isolation score and the similarity score, the unlabeled samples are divided into potential anomalies and reliable normal samples, where:
(a) Isolation score: an isolation forest is first built on the samples; in an isolation forest, samples closer to the root are more likely to be anomalies. IS(x) describes how likely sample x is to be an anomaly. Let h(x) be the path length of sample x in an isolation tree and E(h(x)) the mean of h(x) over all trees of the forest. For n samples, the average path length of an unsuccessful search in a binary search tree is
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where $H(i) \approx \ln(i) + 0.5772156649$ (Euler's constant) is the harmonic number. The isolation score is then
$$IS(x) = 2^{-\frac{E(h(x))}{c(n)}}$$
The closer IS(x) is to 1, the more likely the sample is an anomaly;
(b) Similarity score:
A sample closer to an anomaly concept center (the k anomaly centers obtained by clustering the anomalous samples) is more likely to be a potential anomaly, so the similarity score SS(x) is defined from the distances between x and these centers (the formula is not reproduced in this text), where $\mu_i$ is the i-th anomaly concept center and k is the number of anomaly concept centers;
(c) Total score: to screen out potential anomalies and reliable normal samples, both the isolation score and the similarity score are considered; the total score TS(x) is
$$TS(x) = \theta\, IS(x) + (1 - \theta)\, SS(x), \quad \theta \in [0, 1]$$
If TS(x) ≥ α, the sample is judged a potential anomaly;
if TS(x) ≤ β, the sample is judged a reliable normal sample.
The thresholds α and β can be set according to the required sensitivity.
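The stage-one scores can be computed as below. The mean path length E(h(x)) is taken as an input (for example from an isolation-forest implementation), and c(n) follows the standard isolation-forest definition, including the conventional values for n ≤ 2. Because the similarity formula is not reproduced in this text, `similarity_score` uses an assumed Gaussian-style form, the maximum over anomaly centers of $\exp(-\lVert x - \mu_i \rVert^2)$; treat it as an assumption, not the patent's formula. θ, α and β are user-chosen, and all function names are hypothetical.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def isolation_score(mean_path_length: float, n: int) -> float:
    """IS(x) = 2 ** (-E(h(x)) / c(n)); values near 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / c(n))


def similarity_score(x: Sequence[float], centers: Sequence[Sequence[float]]) -> float:
    """ASSUMED form: max over anomaly centers of exp(-squared Euclidean distance)."""
    return max(math.exp(-sum((a - b) ** 2 for a, b in zip(x, mu))) for mu in centers)


def total_score(is_x: float, ss_x: float, theta: float = 0.5) -> float:
    """TS(x) = theta * IS(x) + (1 - theta) * SS(x)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * is_x + (1.0 - theta) * ss_x


def partition(ts: float, alpha: float, beta: float) -> str:
    """Split unlabeled samples by the alpha / beta thresholds on TS(x)."""
    if ts >= alpha:
        return "potential_anomaly"
    if ts <= beta:
        return "reliable_normal"
    return "undecided"
```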
Stage two:
Each sample is first given a weight; in particular, the weight of manually labeled anomalous samples is set to 1. Unlabeled samples fall into two classes: for potential anomalies, the higher TS(x), the larger the weight ω(x); for reliable normal samples, the lower TS(x), the larger the weight (the weight formulas are not reproduced in this text).
The problem is turned into a (k+1)-class classification problem whose objective to minimize is
$$\min_{f} \sum_{i} w_i\, l\big(y_i, f(x_i)\big) + \lambda R(w)$$
where $w_i$ is the weight of sample $x_i$, $l(y_i, f(x_i))$ is the loss of sample $x_i$, R(w) is the regularization term and λ is the regularization coefficient. This problem can be solved with a multi-class SVM.
Further, the time series analysis (TSA) method rests on the observation that each monitoring series shows some periodicity over time: the pattern of earlier data can be used to predict each monitoring value at subsequent time points and thereby judge whether data are abnormal. The specific steps are:
An autoregressive integrated moving average (ARIMA) model is used, with the AIC (Akaike Information Criterion) as the evaluation criterion, to find the model that best explains the data with the fewest free parameters (determined by the parameters p, d, q).
The model is
$$A_t = \varphi_1 A_{t-1} + \varphi_2 A_{t-2} + \cdots + \varphi_p A_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}$$
where $A_t$ is the value of the (differenced) series at time t, $\varphi_i$ are the autoregressive coefficients, δ is a constant offset, $u_t$ is the error term, $\theta_i$ are the error (moving-average) coefficients, t denotes time, p is the number of lags of the series itself used in the prediction model, and q is the number of lags of the prediction error;
Stage one:
From the data, compute and plot the ACF (autocorrelation function) and the PACF (partial autocorrelation function); from the ACF and PACF plots, check whether the series needs differencing and whether it is periodic. If the series is non-stationary, difference it as needed to obtain a stationary series.
Stage two:
With AIC as the evaluation criterion, grid search is used to find the optimal model parameters p, d, q, where:
p: the number of lags of the series itself used in the prediction model, also called the autoregressive term;
d: the number of differences required, also called the differencing term;
q: the number of lags of the prediction error, also called the moving-average term;
The model is then trained on the data to obtain its parameters, i.e. $\varphi_i$, $\theta_i$ and δ.
Stage three:
The trained ARIMA model predicts the index value at subsequent time points, and the prediction is compared with the monitored value: the Euclidean distance between the monitored and predicted values is computed and compared with a manually set threshold to determine whether the point is abnormal.
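Stages one to three can be approximated with NumPy alone. The sketch below is a simplification, not a full ARIMA fit: it takes the differencing order d (0 or 1) as already chosen in stage one, sets q = 0 so the model reduces to ARIMA(p, d, 0), fits the autoregressive part by ordinary least squares, selects p by a least-squares AIC computed on a common sample, and flags a point when its distance to the one-step prediction exceeds a hand-set threshold. Function names, `max_p` and the threshold are illustrative assumptions; a production version would more likely use a library ARIMA implementation with maximum-likelihood estimation and a search over q as well.

```python
import numpy as np


def fit_ar(z: np.ndarray, p: int, start: int):
    """Least-squares AR(p) with intercept on rows z[start:], returning (beta, aic).

    beta[0] is the constant delta and beta[1:] are phi_1..phi_p. A common `start`
    keeps the sample size equal across p so the AIC values are comparable.
    """
    rows = np.arange(start, len(z))
    X = np.column_stack([np.ones(len(rows))] + [z[rows - lag] for lag in range(1, p + 1)])
    target = z[rows]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = max(float(np.sum((target - X @ beta) ** 2)), 1e-12)  # guard against log(0)
    n = len(rows)
    aic = n * np.log(rss / n) + 2 * (p + 1)  # Gaussian least-squares AIC up to a constant
    return beta, aic


def predict_next(history, d: int = 0, max_p: int = 5):
    """Choose p by AIC on the d-times differenced series, then forecast one step ahead."""
    if d not in (0, 1):
        raise ValueError("this sketch supports d = 0 or d = 1 only")
    y = np.asarray(history, dtype=float)
    z = np.diff(y) if d == 1 else y
    best = None
    for p in range(1, max_p + 1):
        beta, aic = fit_ar(z, p, start=max_p)
        if best is None or aic < best[2]:
            best = (p, beta, aic)
    p, beta, _ = best
    lags = z[::-1][:p]  # z[t-1], z[t-2], ..., z[t-p]
    z_next = beta[0] + float(np.dot(beta[1:], lags))
    y_next = y[-1] + z_next if d == 1 else z_next  # undo the single difference
    return y_next, p


def is_abnormal(observed: float, predicted: float, threshold: float) -> bool:
    """Stage three: for a scalar reading the Euclidean distance is |observed - predicted|."""
    return abs(observed - predicted) > threshold
```

For a strongly periodic hourly series, a second-order autoregression can already reproduce a single sinusoid, so small p values are often selected; the threshold then controls how large a deviation from the forecast counts as abnormal.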
Beneficial effects of the present invention: the online monitoring business is further improved and optimized, and the discharge of enterprise wastewater and waste gas is supervised effectively. Building on analysis of the existing online-monitoring big data, the online monitoring lines and detection devices are monitored, analyzed and processed, enabling applications such as fraud early warning and decision-support analysis on the online monitoring data of environmental-protection information. This greatly strengthens the environmental-protection administration's supervision of online monitoring, toward smart environmental protection. In line with the actual conditions of environmental monitoring, an unsupervised clustering algorithm is used first as a cold start, and once some anomaly labels are available, semi-supervised learning is used to tune the model's accuracy, so that abnormal data in the monitoring data are found more accurately. The invention also uses the ARIMA algorithm to mine the periodicity of the data, so that the system can, to some extent, detect artificially fabricated data.
Description of the drawings
Fig. 1 is a flow chart of the invention.
Detailed description of the embodiments
The embodiments of the invention are described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the method for identifying fraud in pollution-source online monitoring data provided by the invention comprises:
1) Data preprocessing: preprocess the online monitoring data, selecting hourly-value data as the basic data for fixed-rule screening, and handle invalid data. The handling rule is: if the flow meter reads zero for a certain period, reject all monitoring index data for that period. Then determine the percentage of zero values in each monitoring index's data; if it is below the threshold, fill the data by interpolation (for example Newton interpolation), otherwise reject that monitoring index's data;
2) Fixed-rule screening: comprises an enterprise fraud rule base, an enterprise instrument fault rule base, and an O&M (operation and maintenance) unit anomaly rule base;
The enterprise fraud rule base is used to judge whether an enterprise is committing fraud and to mark suspected fraudulent enterprises. It includes dilution, same-industry discharge analysis, joint analysis of monitoring indices, exceedance plunge, intermittent exceedance plunge, combined analysis, and outlier analysis;
The enterprise instrument fault rule base is used to judge whether an enterprise's instruments have failed and to mark faulty instruments and the enterprises owning them. It includes the zero-value and constant-value rules;
The O&M unit anomaly rule base is used to judge whether an O&M unit's quality-control (QC) records have been falsified and to mark enterprises with O&M unit anomalies;
The results of fixed-rule screening are presented visually;
3) Video and access control: comprises video monitoring and access-control records. Video monitoring covers the enterprise's pollutant discharge outlet and the monitoring station room and is used to detect misconduct by enterprise personnel; access-control records are the records of personnel entering and leaving the station. Video and access-control data are used in two ways. The first combines them with the fixed-rule screening results to further confirm whether the enterprise is committing fraud, has an instrument fault, or has an O&M anomaly. The second is video and access-control early warning: the video feeds of the discharge outlet and the station are checked, and if they show turbid water at the outlet, a person approaching the outlet, or an unauthorized person entering the station, warning information is output;
4) On-site inspection: on-site inspectors carry out inspections based on the fixed-rule screening results and the video and access-control information, together with enterprise information such as the enterprise's online monitoring data and O&M records. An on-site inspection produces three kinds of result: whether the enterprise's online monitoring data have been falsified, whether an instrument has failed, and whether the O&M unit's QC records have been falsified. The results also serve as label information for correcting machine-learning parameters and optimizing the fixed-rule screening method, so as to achieve higher accuracy;
5) Machine-learning-based rule optimization: based on the feedback from video and access control and from on-site inspection, the fixed-rule screening is continuously optimized by machine learning to form screening rules with higher confidence. The machine-learning approach is either a combination of unsupervised and semi-supervised learning, or time series analysis (TSA); during machine learning, thresholds suited to specific requirements are set according to actual conditions.
Further, the invalid-data handling rule is specifically: if less than 10% of a monitoring index's data are zero, the data are filled by interpolation; if more than 10% are zero, that index's data are rejected.
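A minimal Python sketch of the invalid-data rule, assuming hourly values are held in plain lists aligned by hour. The names `preprocess` and `interpolate_zeros` are hypothetical. For brevity it uses first-order Newton interpolation (linear interpolation between the two nearest non-zero neighbours) rather than a higher-order Newton polynomial, and it treats a zero share exactly equal to the threshold as a rejection.

```python
def interpolate_zeros(values):
    """Replace zero readings using first-order (linear) Newton interpolation
    between the nearest non-zero neighbours; zeros at either end copy the
    nearest non-zero value."""
    good = [i for i, v in enumerate(values) if v != 0]
    out = list(values)
    for i, v in enumerate(values):
        if v != 0:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            # Newton form of degree 1: f(x0) + f[x0, x1] * (x - x0)
            slope = (values[right] - values[left]) / (right - left)
            out[i] = values[left] + slope * (i - left)
    return out


def preprocess(flow, indices, zero_ratio_threshold=0.10):
    """Apply the invalid-data rule to hourly data.

    flow    -- hourly flow-meter readings
    indices -- dict: index name -> hourly values aligned with `flow`
    Returns the kept hour positions and a dict of cleaned index series;
    indices whose zero share is not below the threshold are left out.
    """
    keep = [h for h, f in enumerate(flow) if f != 0]  # drop zero-flow hours
    cleaned = {}
    for name, series in indices.items():
        kept = [series[h] for h in keep]
        if not kept:
            continue
        zero_ratio = sum(1 for v in kept if v == 0) / len(kept)
        if zero_ratio < zero_ratio_threshold:
            cleaned[name] = interpolate_zeros(kept)
    return keep, cleaned
```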
Further, in the enterprise fraud rule base, the three rules exceedance plunge, intermittent exceedance plunge and combined analysis do not check whether the monitoring index data are zero; they check only whether the flow-meter data are zero.
Further, the dilution rule and the other rules in the enterprise fraud rule base are specifically:
1) Dilution fraud methods include installing an additional discharge pipe, diluted discharge, and diluting samples. To counter these methods, a synchronous analysis of online monitoring data is performed for enterprises with two or more monitoring indices; if two or more indices increase or decrease in proportion, the enterprise is marked as a suspected fraudulent enterprise. The steps are:
(1) Apply the dilution rule screening to the online monitoring data of enterprises that have two or more monitoring indices in addition to pH and flow.
(2) Rule description: assume an enterprise monitors three factors a, b and c, whose monitoring index values at hour N are A_n, B_n and C_n respectively;
Time | a | b | c |
Hour N | A_n | B_n | C_n |
Hour N+1 | A_{n+1} | B_{n+1} | C_{n+1} |
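If both of the following hold: the indices change in proportion between hour N and hour N+1 (relationship formula not reproduced), and at least one of A_n, B_n and C_n exceeds 70% of the discharge standard, then
(3) mark the result as a suspected fraudulent enterprise.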
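The dilution check for one pair of consecutive hours can be sketched as follows. Because the proportional-change formula is not reproduced in the text, the sketch assumes that "changing in proportion" means two or more indices (pH and flow excluded) share the same hour-to-hour ratio within a hypothetical relative tolerance `rel_tol`, with that ratio different from 1. `dilution_suspect` and its parameters are illustrative names, not part of the source.

```python
import math
from itertools import combinations


def dilution_suspect(prev, curr, standards, load_fraction=0.70, rel_tol=0.02):
    """Sketch of the dilution rule for hours N (prev) and N+1 (curr).

    prev, curr -- dict: index name -> value at hour N / hour N+1
    standards  -- dict: index name -> discharge standard limit
    ASSUMPTION: "changing in proportion" = two or more indices with the same
    N -> N+1 ratio within rel_tol, and that ratio is not ~1 (values changed).
    """
    names = [k for k in prev if k not in ("pH", "flow")]  # pH and flow excluded
    if len(names) < 2:
        return False
    # at least one index at hour N above 70% of its discharge standard
    if not any(prev[k] > load_fraction * standards[k] for k in names):
        return False
    ratios = [curr[k] / prev[k] for k in names if prev[k] != 0]
    for r1, r2 in combinations(ratios, 2):
        changed = not math.isclose(r1, 1.0, rel_tol=rel_tol)
        if changed and math.isclose(r1, r2, rel_tol=rel_tol):
            return True
    return False
```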
2) Same-industry discharge analysis is specifically: the industry of the discharging enterprise, its discharge standard limits, its production output and its wastewater treatment process are used as screening conditions. Enterprises in the same industry, under the same discharge standard, with the same output and similar wastewater treatment processes are grouped together and regarded as similar enterprises. By comparing the wastewater discharge volumes and the monitored concentrations of similar enterprises, enterprises that deviate abnormally from the average level are found and marked as suspected fraudulent enterprises.
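A sketch of the same-industry comparison, under stated assumptions: the grouping key is (industry, discharge standard, output, treatment process) as described above; the "average level" is taken as the group median so that one outlying plant does not distort its own reference; and both the deviation threshold `rel_tol` and the minimum group size of three are hypothetical choices. The function and field names are illustrative.

```python
from collections import defaultdict
from statistics import median


def same_trade_suspects(enterprises, rel_tol=0.5):
    """enterprises -- list of dicts with keys 'id', 'industry', 'standard',
    'output', 'process', 'discharge' (wastewater volume) and 'concentration'.
    Returns ids whose discharge or concentration deviates from the group
    median by more than rel_tol (a hypothetical threshold)."""
    groups = defaultdict(list)
    for e in enterprises:
        key = (e["industry"], e["standard"], e["output"], e["process"])
        groups[key].append(e)
    suspects = set()
    for members in groups.values():
        if len(members) < 3:  # hypothetical: too few peers to compare
            continue
        for metric in ("discharge", "concentration"):
            ref = median(m[metric] for m in members)
            for m in members:
                if ref > 0 and abs(m[metric] - ref) / ref > rel_tol:
                    suspects.add(m["id"])
    return suspects
```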
3) Joint analysis of monitoring indices comprises joint analysis of total nitrogen and ammonia nitrogen in the online monitoring data, and joint analysis of chemical oxygen demand (COD) and total organic carbon (TOC).
(a) Joint analysis of total nitrogen and ammonia nitrogen: where the online monitoring data include both total nitrogen and ammonia nitrogen concentrations, cases in which the ammonia nitrogen concentration exceeds the total nitrogen concentration are analysed further (ammonia nitrogen is a component of total nitrogen, so in valid data it cannot exceed it).
(1) Data rejection: if either the ammonia nitrogen or the total nitrogen value is zero, reject the record.
(2) Rule description: assume an enterprise monitors a (ammonia nitrogen) and b (total nitrogen):
Time | a | b |
Hour N | A1 | B1 |
where a is ammonia nitrogen, b is total nitrogen, and A1 and B1 are their concentrations at hour N. If B1/A1 < 70%, then
(3) mark the result as a suspected fraudulent enterprise.
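The total-nitrogen/ammonia-nitrogen rule reduces to a per-record check. A minimal sketch (the function name is hypothetical) that returns `None` for records discarded by the rejection step:

```python
def nitrogen_suspect(ammonia, total_n, ratio=0.70):
    """Return None if the record is rejected (either value is zero);
    otherwise True when B1/A1 < 70%, i.e. total nitrogen is implausibly
    low compared with ammonia nitrogen."""
    if ammonia == 0 or total_n == 0:
        return None
    return total_n / ammonia < ratio
```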
(b) Joint analysis of COD and TOC: COD and TOC are strongly correlated.
Rule: COD and TOC in an enterprise's wastewater follow a linear regression relationship y = Px + Q, where x is TOC, y is COD, and P and Q can be obtained from the analysis instrument corresponding to the online monitoring factor. Either value can therefore be derived from the other through this relationship. Suppose that at some time the instruments report TOC x1 and COD y1, and compute the predicted COD y2 = P·x1 + Q. According to the standard, a deviation of up to 10% between the two values in the enterprise's discharge is acceptable, i.e. |y1 − y2|/y2 ≤ 10%; a deviation beyond 10% is considered abnormal, and the enterprise is marked as a suspected fraudulent enterprise.
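The COD/TOC consistency check takes only a few lines. In this sketch P and Q are instrument-specific regression coefficients passed in by the caller, and the function name is hypothetical.

```python
def cod_toc_suspect(toc, cod_measured, p, q, tol=0.10):
    """Predict COD from TOC with y2 = P*x1 + Q and flag the record when
    the relative deviation |y1 - y2| / y2 exceeds tol (10% per the rule)."""
    cod_calc = p * toc + q
    return abs(cod_measured - cod_calc) / cod_calc > tol
```

For example, with hypothetical coefficients P = 2.5 and Q = 5, a TOC of 20 gives a predicted COD of 55. A measured COD of 58 is within 10% of that value, while a measured COD of 70 is not.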
4) Exceedance plunge and intermittent exceedance plunge. The exceedance plunge rule covers exceedance analysis and near-exceedance analysis, which address cases where a monitoring index of the enterprise's online monitoring exceeds, or is about to exceed, its limit. If the index value then drops suddenly, the index is analysed further, as follows:
Rule description: assume an enterprise monitors a factor a whose monitoring index value at hour N is A_n;
Time | a |
Hour N | A_n |
Hour N+1 | A_{n+1} |
Hour N+2 | A_{n+2} |
Hour N+3 | A_{n+3} |
The first case is near-exceedance analysis: A_n lies between 80% and 100% of the limit. If either of the two sudden-drop conditions holds (formulas not reproduced), A_n, A_{n+1} and A_{n+2} are considered abnormal data. The second case is exceedance analysis: A_n exceeds the limit. If either of the two sudden-drop conditions holds (formulas not reproduced), the exceedance plunge rule is met and A_n, A_{n+1}, A_{n+2} and A_{n+3} are likewise abnormal data. If the first or the second case is met, the data are marked as an exceedance plunge.
The intermittent exceedance plunge rule further analyses the second case of the exceedance plunge: the data from hour N+2 to hour N+7 of the exceedance analysis are screened for exceedances, and if any exceedance is found the data are marked as an intermittent exceedance plunge.
All of the above results are marked as suspected fraudulent enterprises.
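The exceedance plunge scan, including the intermittent check over hours N+2 to N+7, might look like the sketch below. The source's sudden-drop formulas are not reproduced, so the drop test `A_{n+1} < drop_ratio * A_n` and the default `drop_ratio=0.5` are hypothetical stand-ins. The 80% to 100% band and the N+2 to N+7 window follow the text. The function name is illustrative.

```python
def plunge_flags(series, limit, drop_ratio=0.5):
    """Scan hourly values of one index and return (hour, label) pairs.

    HYPOTHETICAL: a "sudden drop" is taken to mean A_{n+1} < drop_ratio * A_n,
    because the source's drop formulas are not reproduced.
    """
    flags = []
    for n in range(len(series) - 1):
        a = series[n]
        if not series[n + 1] < drop_ratio * a:
            continue  # no sudden drop after hour n
        if 0.8 * limit <= a <= limit:
            flags.append((n, "near-exceedance plunge"))   # A_n..A_{n+2} abnormal
        elif a > limit:
            flags.append((n, "exceedance plunge"))        # A_n..A_{n+3} abnormal
            # intermittent check: any exceedance again in hours N+2 .. N+7?
            if any(v > limit for v in series[n + 2:n + 8]):
                flags.append((n, "intermittent exceedance plunge"))
    return flags
```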
5) Combined analysis combines several analysis rules, comprising:
A) Exceedance plunge + joint analysis of monitoring indices
For exceedance plunge data that also show ammonia nitrogen greater than total nitrogen, or COD and TOC not matching their correlation, the data meet two fraud characteristics at once. The suspicion of fraud is therefore higher, and the enterprise is marked as a suspected fraudulent enterprise.
B) Exceedance plunge + constant value
If data found as an exceedance plunge by fixed-rule screening also show the constant-value feature of the enterprise instrument fault rule base, they meet two fraud characteristics at once. The suspicion of fraud is therefore higher, and the enterprise is marked as a suspected fraudulent enterprise.
6) Outlier analysis is a method of analysing abnormal values: a value that deviates far from the mean of a group of data is judged suspicious. Outliers in monitoring indices can be identified with the Grubbs test.
Rule: when the data are sorted in descending order, outliers usually appear among the maximum or minimum values.
(i) For a data set of n hourly values (one data set per monitoring factor), compute the statistic G for each hourly value. The statistic G_i of the i-th hourly value is
$$G_i = \frac{|x_i - \bar{x}|}{s}$$
where i ∈ {1, 2, 3, …, n}, $\bar{x}$ is the mean of the n hourly values, s is their standard deviation, and x_i is the i-th hourly value;
(ii) Look up the Grubbs coefficient
Find the critical value in the Grubbs coefficient table that corresponds to the sample size n and the chosen significance level, for comparison with the statistic G;
(iii) Identify outliers
If the statistic G of the maximum or minimum of the x_i exceeds the critical value, that maximum or minimum is regarded as a suspected outlier;
(iv) Mark the result as a suspected fraudulent enterprise.
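A sketch of the Grubbs screening step using Python's standard `statistics` module. As in step (ii), the critical value is supplied from a Grubbs table for the given n and significance level rather than computed here. `grubbs_outliers` is a hypothetical name, and the sample standard deviation is assumed for s.

```python
from statistics import mean, stdev


def grubbs_outliers(values, g_critical):
    """Return the extreme hourly values whose Grubbs statistic
    G = |x - mean| / s exceeds g_critical.

    g_critical must be taken from a Grubbs critical-value table for the
    sample size n and the chosen significance level; it is not computed here.
    """
    xbar = mean(values)
    s = stdev(values)  # sample standard deviation
    outliers = []
    for x in (max(values), min(values)):  # only the extremes are tested
        if abs(x - xbar) / s > g_critical and x not in outliers:
            outliers.append(x)
    return outliers
```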
Further, the zero-value and constant-value rules of the enterprise instrument fault rule base are specifically:
A) Zero value
If a monitoring index of the online monitoring is zero throughout 24 consecutive hours during which the flow meter reports data, mark it as a suspected instrument fault.
B) Constant value
If a monitoring index of the online monitoring stays unchanged throughout 24 consecutive hours during which the flow meter reports data, mark it as a suspected instrument fault.
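Both instrument-fault rules can be checked with a sliding 24-hour window. The sketch assumes that "while the flow meter reports data" means every hour in the window has non-zero flow. The function name is hypothetical.

```python
def instrument_fault(flow, index, window=24):
    """Return "zero", "constant" or None for one monitoring index.

    ASSUMPTION: the rule applies only to windows of `window` consecutive
    hours in which the flow meter reports non-zero flow.
    """
    for start in range(len(index) - window + 1):
        if any(f == 0 for f in flow[start:start + window]):
            continue  # flow gap: the rule does not apply to this window
        run = index[start:start + window]
        if all(v == 0 for v in run):
            return "zero"
        if len(set(run)) == 1:
            return "constant"
    return None
```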
Further, the O&M unit anomaly rule applies when the QC records kept by the O&M unit are inconsistent with the data uploaded by the data acquisition instrument. Such cases are analysed further according to the following rule:
Rule: let M be the QC sample value in the O&M record and N the historical value collected by the data acquisition instrument. If |M − N|/N ≥ 30%, mark the O&M unit's QC record as falsified.
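A sketch of the QC-record comparison. It assumes that QC records and uploaded values can be matched by timestamp; the matching key is an assumption, because the source does not say how records are paired. The function name is hypothetical.

```python
def falsified_qc_records(qc_values, uploaded, tol=0.30):
    """qc_values -- dict: timestamp -> QC sample value M from the O&M record
    uploaded  -- dict: timestamp -> value N collected by the acquisition instrument
    Returns the timestamps where |M - N| / N >= tol (30% per the rule)."""
    flagged = []
    for ts in sorted(qc_values):
        n = uploaded.get(ts)
        if not n:  # no matching upload, or N == 0 (ratio undefined)
            continue
        if abs(qc_values[ts] - n) / n >= tol:
            flagged.append(ts)
    return flagged
```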
Further, video and access-control data are connected and analysed together with the online monitoring data of enterprises marked as suspected fraudulent. Based on the time of the abnormal data, the video data within the corresponding time range are retrieved and examined for fraudulent behaviour.
Further, for suspected fraudulent enterprises marked by fixed-rule screening combined with video and access-control analysis, the relevant personnel go to the enterprise to verify the situation on site and collect evidence. Starting from the abnormal data found by fixed-rule screening, inspectors determine whether the enterprise has committed fraud by checking on site whether the instruments operate normally, the historical data stored in the instruments, and the wastewater quality of the discharging unit.
Further, the machine-learning-based data fraud identification approach that combines unsupervised and semi-supervised learning works as follows. In the initial state, with no label information, only unsupervised clustering can be used to identify points far from the bulk of the data. Once a small number of reliable manual inspection results are available, these reliable values can be exploited for better results by combining clustering with ADOA (Anomaly Detection with Partially Observed Anomalies). The specific steps are:
(1) In the initial unlabelled stage, an unsupervised clustering method is used, for example the distance-based k-means algorithm or the density-based DBSCAN.
(2) Once some label information is available, the ADOA algorithm is used. ADOA targets the setting with many unlabelled samples and only a few labelled anomalies, and it assumes that anomalies are not of a single type but of several types. ADOA has two stages:
Stage one: first cluster the observed anomalous samples into k clusters. Then, based on the isolation score and the similarity score, divide the unlabelled samples into potential anomalies and reliable normal samples, where:
(a) Isolation score: based on isolation forest, first build an isolation forest over the samples; in an isolation forest, samples closer to the root node are more likely to be anomalies. The isolation score IS(x) describes how likely sample x is to be an anomaly. Let h(x) be the path length of sample x in an isolation tree and E(h(x)) the mean of h(x) over all trees in the forest. For n samples, the average path length of an unsuccessful search in a binary search tree is
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where H(i) ≈ ln(i) + 0.5772156649 (Euler's constant) approximates the harmonic number. The isolation score is then
$$IS(x) = 2^{-E(h(x))/c(n)}$$
The closer IS(x) is to 1, the more likely the sample is an anomaly;
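The isolation score can be computed directly from these formulas once the path lengths h(x) of a sample in each tree are known. This sketch takes those path lengths as input instead of building the forest. It also follows the original isolation-forest convention c(2) = 1 and c(n) = 0 for n < 2, which is an assumption because the text gives only the general formula. Function names are illustrative.

```python
import math
from statistics import mean

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful search in a binary search tree
    built from n samples: c(n) = 2H(n-1) - 2(n-1)/n, with H(i) ~ ln(i) + gamma.
    ASSUMPTION: c(2) = 1 and c(n) = 0 for n < 2, as in the isolation-forest paper."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def isolation_score(path_lengths, n):
    """IS(x) = 2 ** (-E(h(x)) / c(n)); path_lengths holds h(x) from each tree."""
    return 2.0 ** (-mean(path_lengths) / c(n))
```

A sample whose mean path length equals c(n) scores exactly 0.5. Shorter paths push the score towards 1.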
(b) Similarity score:
A sample closer to an anomaly concept centre (one of the k anomaly centres obtained by clustering the observed anomalies) is more likely to be a potential anomaly, so the similarity score SS(x) can be expressed as
$$SS(x) = \max_{i=1,\dots,k} \exp\left(-\lVert x - \mu_i \rVert^2\right)$$
where μ_i is the i-th anomaly concept centre and k is the number of anomaly concept centres;
(c) Total score: to filter out potential anomalies and reliable normal points, the isolation score and the similarity score are considered together, giving the total score
$$TS(x) = \theta \cdot IS(x) + (1-\theta) \cdot SS(x), \quad \theta \in [0, 1]$$
If TS(x) ≥ α, the sample is judged a potential anomaly;
if TS(x) ≤ β, the sample is judged a reliable normal point.
The thresholds α and β can be set according to the required sensitivity.
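Stage one's split of the unlabelled samples can be sketched as below. The sketch assumes the isolation scores are already computed and the anomaly centres μ_i come from clustering the labelled anomalies. The similarity score uses the form SS(x) = max_i exp(−‖x − μ_i‖²). The parameters θ, α and β are user-set, and the names and defaults here are illustrative.

```python
import math


def similarity_score(x, centers):
    """SS(x) = max_i exp(-||x - mu_i||^2) over the k anomaly centres."""
    return max(math.exp(-sum((a - b) ** 2 for a, b in zip(x, mu)))
               for mu in centers)


def split_unlabelled(samples, centers, alpha, beta, theta=0.5):
    """samples -- list of (sample_id, IS(x), feature vector x)
    Returns (potential anomalies, reliable normals); samples with
    beta < TS(x) < alpha are left unassigned."""
    potential, normal = [], []
    for sid, is_x, x in samples:
        ts = theta * is_x + (1 - theta) * similarity_score(x, centers)
        if ts >= alpha:
            potential.append(sid)
        elif ts <= beta:
            normal.append(sid)
    return potential, normal
```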
Stage two:
First assign a weight to each sample. In particular, manually obtained labelled anomalies are given weight 1. The unlabelled samples fall into two classes: for potential anomalies, the higher TS(x), the larger the weight ω(x);
for reliable normal points, the lower TS(x), the larger the weight.
The problem thus becomes a (k+1)-class classification problem, whose optimization objective to minimize is
$$\min \sum_{i=1}^{n} w_i \, l(y_i, f(x_i)) + \lambda R(w)$$
where w_i is the weight of sample x_i, l(y_i, f(x_i)) is the loss on sample x_i, R(w) is the regularization term, and λ is the regularization coefficient. This problem can be solved with a multi-class SVM.
Further, the time series analysis (TSA) method relies on each monitoring data series having some periodicity over time. The development pattern of earlier data can be used to predict the monitoring data at subsequent time points and thereby judge whether the data are abnormal. The specific steps are:
An autoregressive integrated moving average model (ARIMA) is used, with AIC (Akaike Information Criterion) as the evaluation criterion, to find the model that best explains the data with the fewest free parameters (determined by the parameters p, d and q).
The model is:
$$A_t = \phi_1 A_{t-1} + \phi_2 A_{t-2} + \cdots + \phi_p A_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}$$
where A_t is the value of the series at time t, φ_i are the autoregressive coefficients, δ is a constant offset, u_t is the error term, θ_i are the error (moving-average) coefficients, t denotes the time, p is the number of lags of the series itself used in the prediction model, and q is the number of lags of the prediction error;
Stage one:
Compute the ACF (autocorrelation function) and PACF (partial autocorrelation function) of the data and plot them. Use the ACF and PACF plots to check whether the series needs differencing and whether it is periodic. If the series is non-stationary, difference it as many times as needed to obtain a stationary series.
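Stage one can be supported by a plain sample-ACF computation and a differencing helper. This is a minimal standard-library sketch with hypothetical function names. PACF and plotting are omitted.

```python
from statistics import mean


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    m = mean(series)
    denom = sum((v - m) ** 2 for v in series)
    return [
        sum((series[t] - m) * (series[t + k] - m)
            for t in range(len(series) - k)) / denom
        for k in range(max_lag + 1)
    ]


def difference(series, d=1):
    """Apply d rounds of first differencing: y_t = x_t - x_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series
```

For a series that repeats every 4 hours, such as [0, 1, 0, -1] repeated, the ACF is strongly negative at lag 2 and peaks again at lag 4. This is the kind of periodic pattern the ACF plot is meant to reveal.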
Stage two:
Using AIC as the evaluation criterion, grid-search for the optimal model parameters p, d and q, where:
p: the number of lags of the series itself used in the prediction model, also called the autoregressive (AR) order;
d: the number of differencing operations required, also called the degree of differencing;
q: the number of lags of the prediction error, also called the moving-average (MA) order.
Then train the model on the data to obtain its parameters, i.e. the coefficients φ_i, θ_i and δ in the model.
Stage three:
Use the trained ARIMA model to predict the index value at subsequent time points and compare the prediction with the monitored value: compute the Euclidean distance between the monitored value and the predicted value, and compare it with a manually set threshold to decide whether the point is an anomaly.
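Stage three can be sketched with the model equation given earlier. The sketch assumes the coefficients φ_i, θ_i and δ have already been estimated in stage two, and that the series passed in is the one the ARMA part is fitted on (i.e. already differenced d times). Residuals are rebuilt recursively, with pre-sample errors set to zero (an assumption). For one-dimensional values the Euclidean distance is simply the absolute difference between the monitored and predicted values. Function names are hypothetical.

```python
def arma_predict_next(history, errors, phi, theta, delta):
    """One-step forecast with the unknown current error u_t set to 0:
    A_t = sum_i phi_i A_{t-i} + delta + sum_j theta_j u_{t-j}."""
    ar = sum(p * history[-i] for i, p in enumerate(phi, start=1))
    ma = sum(th * errors[-j] for j, th in enumerate(theta, start=1))
    return ar + delta + ma


def flag_anomalies(observed, phi, theta, delta, threshold):
    """Return the indices t where |observed_t - predicted_t| > threshold."""
    start = max(len(phi), len(theta))
    errors = [0.0] * start  # ASSUMPTION: pre-sample errors taken as 0
    anomalies = []
    for t in range(start, len(observed)):
        pred = arma_predict_next(observed[:t], errors, phi, theta, delta)
        err = observed[t] - pred
        if abs(err) > threshold:  # 1-D Euclidean distance
            anomalies.append(t)
        errors.append(err)
    return anomalies
```

Each forecast uses the observed value of the previous hour, so a single spike can also flag the hour after it. For example, with φ = [0.5], δ = 1 and the series [2, 2, 2, 8, 2, 2], both index 3 and index 4 exceed a threshold of 1.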
The above embodiments illustrate the invention rather than limit it. Any modifications and changes made to the invention within its spirit and within the scope of protection of the claims fall within the protection scope of the invention.
Claims (10)
- 1. A method for identifying fraud in pollution-source online monitoring data, characterized in that the method comprises: 1) data preprocessing: preprocessing the online monitoring data, selecting hourly-value data as the basic data for fixed-rule screening, and handling invalid data, the handling rule being that if the flow meter reads zero for a certain period, the monitoring index data of all monitoring instruments for that period are rejected; determining the percentage of zero values in each monitoring instrument's index data, and if it is below a threshold, filling the data, otherwise rejecting that monitoring index's data; 2) fixed-rule screening: comprising an enterprise fraud rule base, an enterprise instrument fault rule base and an O&M unit anomaly rule base; the enterprise fraud rule base is used to judge whether an enterprise is committing fraud and to mark suspected fraudulent enterprises, and includes dilution, same-industry discharge analysis, joint analysis of monitoring indices, exceedance plunge, intermittent exceedance plunge, combined analysis and outlier analysis; the enterprise instrument fault rule base is used to judge whether an enterprise's instruments have failed and to mark faulty instruments and the enterprises owning them, and includes zero value and constant value; the O&M unit anomaly rule base is used to judge whether an O&M unit's QC records have been falsified and to mark enterprises with O&M unit anomalies; the results of fixed-rule screening are presented visually; 3) video and access control: comprising video monitoring and access-control records, the video monitoring comprising video monitoring of the enterprise's pollutant discharge outlet and of the station room and being used to detect misconduct by enterprise personnel, and the access-control records being records of personnel entering and leaving the station; video and access control is applied in two ways: first, combining video and access control with the fixed-rule screening results to further confirm whether the enterprise has fraud, an instrument fault or an O&M anomaly; second, video and access-control early warning, i.e. checking the video monitoring of the discharge outlet and the station and outputting warning information if turbid water at the outlet, a person approaching the outlet, or an unauthorized person entering the station is detected; 4) on-site inspection: on-site inspectors carry out inspections according to the fixed-rule screening results and the video and access-control information, combined with enterprise information including the enterprise's online monitoring data and O&M records; on-site inspection produces three kinds of result data, namely whether the enterprise's online monitoring data have been falsified, whether an instrument has failed, and whether the O&M unit's QC records have been falsified, and the output results also serve as label information for correcting machine-learning parameters and optimizing the fixed-rule screening method, so as to achieve higher accuracy; 5) machine-learning-based rule optimization: according to the feedback from video and access control and from on-site inspection, continuously optimizing the fixed-rule screening by machine learning to form screening rules with higher confidence, the machine-learning approach being a combination of unsupervised and semi-supervised learning, or time series analysis (TSA), with thresholds suited to specific requirements set according to actual conditions during machine learning.
- 2. The method for identifying fraud in pollution-source online monitoring data according to claim 1, characterized in that the invalid-data handling rule is specifically: if less than 10% of the monitoring index data are zero, the data are filled by interpolation; if more than 10% of the monitoring index data are zero, that item of data is rejected.
- 3. The method for identifying fraud in pollution-source online monitoring data according to claim 2, characterized in that, in the enterprise fraud rule base, the three rules exceedance plunge, intermittent exceedance plunge and combined analysis do not check whether the monitoring index data are zero, but check whether the flow-meter data are zero.
- 4. The method for identifying fraud in pollution-source online monitoring data according to claim 1, characterized in that the dilution rule and the other rules in the enterprise fraud rule base are specifically: 1) dilution fraud methods include installing an additional discharge pipe, diluted discharge and diluting samples; to counter these fraud methods, a synchronous analysis of online monitoring data is performed for enterprises having two or more monitoring indices, and if two or more monitoring indices increase or decrease in proportion, the enterprise is marked as a suspected fraudulent enterprise; specifically comprising the following steps: (1) applying the dilution rule screening to the online monitoring data of enterprises that have two or more monitoring indices in addition to pH and flow; (2) rule description: assuming an enterprise monitors three factors a, b and c, whose monitoring index values are A_n, B_n and C_n at hour N and A_{n+1}, B_{n+1} and C_{n+1} at hour N+1; if the proportional-change relationship (formula not reproduced) holds and at the same time at least one of A_n, B_n and C_n exceeds 70% of the discharge standard, (3) marking the result as a suspected fraudulent enterprise; 2) the same-industry discharge analysis is specifically: the industry of the discharging enterprise, its discharge standard limits, its production output and its wastewater treatment process are used as screening conditions; enterprises in the same industry, under the same discharge standard, with the same output and similar wastewater treatment processes are grouped together and regarded as similar enterprises; by comparing the wastewater discharge volumes and monitored concentrations of similar enterprises, enterprises deviating abnormally from the average level are found and marked as suspected fraudulent enterprises; 3) the joint analysis of monitoring indices comprises joint analysis of total nitrogen and ammonia nitrogen in the online monitoring data and joint analysis of COD and TOC: (a) joint analysis of total nitrogen and ammonia nitrogen: where the online monitoring data contain both total nitrogen and ammonia nitrogen concentrations, cases in which the ammonia nitrogen concentration exceeds the total nitrogen concentration are analysed further: (1) data rejection: if either the ammonia nitrogen or the total nitrogen value is zero, the record is rejected; (2) rule description: assuming an enterprise monitors a (ammonia nitrogen) and b (total nitrogen), whose concentrations at hour N are A1 and B1; if B1/A1 < 70%, (3) marking the result as a suspected fraudulent enterprise; (b) joint analysis of COD and TOC: COD and TOC are strongly correlated; rule: COD and TOC in the enterprise's wastewater follow a linear regression relationship y = Px + Q, where P and Q can be obtained from the analysis instrument corresponding to the online monitoring factor, x denotes TOC and y denotes COD, so either value can be derived from the other through this relationship; assuming that at some time the instruments report TOC x1 and COD y1, the predicted COD y2 is computed from x1; according to the standard, a deviation of up to 10% between the two values in the enterprise's discharge is acceptable, i.e. |y1 − y2|/y2 ≤ 10%, and a deviation beyond 10% is considered abnormal and the enterprise is marked as a suspected fraudulent enterprise; 4) exceedance plunge and intermittent exceedance plunge: the exceedance plunge comprises exceedance analysis and near-exceedance analysis, which refer to cases where a monitoring index of the enterprise's online monitoring exceeds, or is about to exceed, its limit; if the index value then drops suddenly, the index is analysed further, specifically comprising the following steps: rule description: assuming an enterprise monitors a factor a whose monitoring index values at hours N, N+1, N+2 and N+3 are A_n, A_{n+1}, A_{n+2} and A_{n+3}; the first case is near-exceedance analysis, where A_n lies between 80% and 100% of the limit, and if either sudden-drop condition (formulas not reproduced) holds, A_n, A_{n+1} and A_{n+2} are considered abnormal data; the second case is exceedance analysis, where A_n exceeds the limit, and if either sudden-drop condition (formulas not reproduced) holds, the exceedance plunge rule is met and A_n, A_{n+1}, A_{n+2} and A_{n+3} are likewise abnormal data; if the first or second case is met, the data are marked as an exceedance plunge; the intermittent exceedance plunge further analyses the second case of the exceedance plunge: the data from hour N+2 to hour N+7 of the exceedance analysis are screened for exceedances, and if any exceedance exists the data are marked as an intermittent exceedance plunge; the above results are marked as suspected fraudulent enterprises; 5) the combined analysis combines several analysis rules, comprising: A) exceedance plunge + joint analysis of monitoring indices: for exceedance plunge data in which ammonia nitrogen exceeds total nitrogen or COD and TOC do not match their correlation, the data meet two fraud characteristics at once, the suspicion of fraud is higher, and the enterprise is marked as a suspected fraudulent enterprise; B) exceedance plunge + constant value: if data found as an exceedance plunge by fixed-rule screening also show the constant-value feature of the enterprise instrument fault rule base, they meet two fraud characteristics at once, the suspicion of fraud is higher, and the enterprise is marked as a suspected fraudulent enterprise; 6) outlier analysis is a method of analysing abnormal values: a value in a group of data that deviates far from the mean is judged suspicious, and outliers in monitoring indices can be identified with the Grubbs test; rule: when the data are sorted in descending order, outliers usually appear among the maximum or minimum values; (i) for a data set of n hourly values (one data set per monitoring factor), computing the statistic G for each hourly value, the statistic of the i-th hourly value being $G_i = |x_i - \bar{x}|/s$, where i ∈ {1, 2, 3, …, n}, $\bar{x}$ is the mean of the n hourly values, s is the standard deviation and x_i is the hourly value; (ii) looking up the Grubbs coefficient: finding the corresponding critical value in the Grubbs coefficient table; (iii) identifying outliers: when the statistic G of the maximum or minimum of x_i exceeds the critical value, that maximum or minimum is regarded as a suspected outlier; (iv) marking the result as a suspected fraudulent enterprise.
- 5. The method for identifying fraud in pollution-source online monitoring data according to claim 1, characterized in that the zero-value and constant-value rules of the enterprise instrument fault rule base are specifically: A) zero value: if a monitoring index of the online monitoring is zero throughout 24 consecutive hours during which the flow meter reports data, it is marked as a suspected instrument fault; B) constant value: if a monitoring index of the online monitoring remains unchanged throughout 24 consecutive hours during which the flow meter reports data, it is marked as a suspected instrument fault.
- 6. The method for identifying cheating in pollution-source online monitoring data according to claim 1, characterized in that the operation-and-maintenance (O&M) unit exception rule refers to the case in which the quality-control (QC) records kept by the O&M unit are inconsistent with the data uploaded by the data acquisition instrument; such cases are further analyzed by the following rule:
Rule: denote the QC sample value in the O&M record by M and the historical data acquired by the data acquisition instrument by N; if $\lvert M - N \rvert / N \geq 30\%$, the case is labeled as falsified QC records by the O&M unit.
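The O&M rule is a relative-deviation test. A minimal Python sketch, assuming M and N are values for the same sample and time and that N = 0 should be treated as an error rather than silently flagged; the function name is hypothetical.

```python
def qc_record_suspect(m: float, n: float, tolerance: float = 0.30) -> bool:
    """True when |M - N| / N >= 30%: the O&M QC value M and the value N acquired
    by the data acquisition instrument disagree by at least the tolerance."""
    if n == 0:
        raise ValueError("relative deviation is undefined when N is 0")
    return abs(m - n) / abs(n) >= tolerance  # abs(n) only guards against negative N
```

For instance, a QC value of 14.0 against an acquired value of 10.0 deviates by 40% and is flagged, while 11.0 against 10.0 (10%) is not.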
- 7. The method for identifying cheating in pollution-source online monitoring data according to claim 1, characterized in that a video and access-control system is connected and analyzed jointly with the online monitoring data of enterprises labeled as suspected of cheating: based on the time point of the abnormal data, the video and access-control system retrieves the video footage for the corresponding time range, which is then analyzed for cheating.
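Retrieving footage "in the corresponding time range" is an interval-overlap query. The claim does not say how wide the range is, so the Python sketch below assumes a symmetric window of one hour on each side of the abnormal data point and a hypothetical `(clip_id, start, end)` record format; neither comes from the source.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Clip = Tuple[str, datetime, datetime]  # hypothetical (clip_id, start, end) record


def clips_for_anomaly(anomaly_time: datetime, clips: List[Clip],
                      half_window: timedelta = timedelta(hours=1)) -> List[str]:
    """Return ids of clips overlapping [anomaly_time - half_window, anomaly_time + half_window]."""
    lo, hi = anomaly_time - half_window, anomaly_time + half_window
    # two intervals overlap when each one starts before the other ends
    return [cid for cid, start, end in clips if start <= hi and end >= lo]
```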
- 8. The method for identifying cheating in pollution-source online monitoring data according to claim 1, characterized in that the on-site inspection means that, for enterprises suspected of cheating that were labeled through fixed-rule screening and joint video/access-control analysis, the relevant personnel visit the enterprise to verify the situation on site and obtain evidence; based on the abnormal data found by fixed-rule screening, whether the enterprise is cheating is determined by verifying on site whether the instruments operate normally, the historical data stored by the instruments, and the wastewater quality of the discharging unit.
- 9. The method for identifying cheating in pollution-source online monitoring data according to claim 1, characterized in that the machine-learning-based data-cheating identification method combining unsupervised and semi-supervised learning means: in the initial state, with no label information, only an unsupervised clustering method can be used to distinguish points far from the population; once a small number of reliable manual inspection results have been obtained, these reliable results can be fully exploited to obtain better results, using a combination of clustering and ADOA (Anomaly Detection with Partially Observed Anomalies). The specific steps include:
(1) In the unlabeled initial stage, an unsupervised clustering method is used; it may be the distance-based k-means algorithm or the density-based DBSCAN.
(2) After some label information has been obtained, the ADOA algorithm is used. ADOA applies when there are many unlabeled samples and only a few labeled anomalous samples, and the anomalies are assumed to be of many types rather than a single type. ADOA has two stages:
Stage one: the observed anomalous samples are first grouped into k clusters; then, based on an isolation score and a similarity score, the unlabeled samples are divided into potential anomalies and reliable normal samples, where:
(a) Isolation score: an isolation forest is first built over the samples; in an isolation forest, samples closer to the root node are more likely to be anomalies. IS(x) (the isolation score) describes how likely sample x is to be an anomaly. Let h(x) denote the path length of sample x in an isolation tree and E(h(x)) the mean of h(x) over all trees in the forest. For n samples, the average path length of an unsuccessful search in a binary search tree is
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where $H(i) \approx \ln(i) + 0.5772156649$ (Euler's constant) is the harmonic number. The isolation score can then be expressed as
$$IS(x) = 2^{-\frac{E(h(x))}{c(n)}}$$
The closer IS(x) is to 1, the more likely the sample is anomalous;
(b) Similarity score: samples closer to the anomaly concept centers (the k anomaly centers obtained by clustering the anomalous samples) are more likely to be potential anomalies, so the similarity score SS(x) can be expressed as
$$SS(x) = \max_{i=1,\dots,k} \exp\left(-\lVert x - \mu_i \rVert^2\right)$$
where $\mu_i$ is the i-th anomaly concept center and k is the number of anomaly concept centers;
(c) Total score: to screen out potential anomalies and reliable normal points, the isolation score and the similarity score are considered together, and the total score TS(x) can be expressed as
$$TS(x) = \theta \, IS(x) + (1-\theta) \, SS(x), \quad \theta \in [0, 1]$$
When TS(x) ≥ α, the sample is judged to be a potential anomaly; when TS(x) ≤ β, it is judged to be a reliable normal point. The thresholds α and β can be set according to the required sensitivity.
Stage two: each sample is first assigned a weight; in particular, the manually obtained labeled anomalous samples have weight 1. The unlabeled samples fall into two classes: for potential anomalies, the higher TS(x), the larger the weight ω(x) [formula not reproduced in the source]; for reliable normal points, the lower TS(x), the larger the weight [formula not reproduced in the source]. The problem then becomes a (k+1)-class classification problem whose optimization objective to minimize is
$$\min_{w} \sum_{i} \omega_i \, l\big(y_i, f(x_i)\big) + \lambda R(w)$$
where $\omega_i$ is the weight of sample $x_i$, $l(y_i, f(x_i))$ is the loss function of sample $x_i$, R(w) is the regularization term, and λ is the regularization coefficient. This problem can be solved with a multi-class SVM.
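The stage-one scoring formulas of ADOA translate directly into code. The Python sketch below implements only c(n), IS(x), SS(x), TS(x) and the α/β split; it assumes the mean path length E(h(x)) comes from an already built isolation forest and the centers μ_i from an already run k-means over the labeled anomalies, neither of which is built here. The stage-two weights and the multi-class SVM are not sketched because their weight formulas are not reproduced in the source. Function names and the example thresholds are illustrative assumptions.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n) = 2H(n-1) - 2(n-1)/n, with the n <= 2 special cases of the isolation-forest paper."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def isolation_score(mean_path_length: float, n: int) -> float:
    """IS(x) = 2 ** (-E(h(x)) / c(n)); values near 1 suggest an anomaly."""
    if n < 2:
        raise ValueError("need at least two samples to normalize path lengths")
    return 2.0 ** (-mean_path_length / average_path_length(n))


def similarity_score(x: Sequence[float], centers: Sequence[Sequence[float]]) -> float:
    """SS(x) = max_i exp(-||x - mu_i||^2) over the k anomaly concept centers."""
    return max(math.exp(-sum((a - b) ** 2 for a, b in zip(x, mu))) for mu in centers)


def total_score(is_x: float, ss_x: float, theta: float) -> float:
    """TS(x) = theta * IS(x) + (1 - theta) * SS(x), with theta in [0, 1]."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * is_x + (1.0 - theta) * ss_x


def partition(ts: float, alpha: float, beta: float) -> str:
    """Stage-one split of an unlabeled sample by its total score."""
    if ts >= alpha:
        return "potential_anomaly"
    if ts <= beta:
        return "reliable_normal"
    return "undetermined"
```

A sample whose average path length equals c(n) gets IS(x) = 0.5, the neutral point of the score; samples falling between β and α stay unassigned in stage one.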
- 10. The method for identifying cheating in pollution-source online monitoring data according to claim 1, characterized in that the time series analysis (TSA) method means: each kind of monitoring data shows a certain periodicity over time, so the development pattern of earlier data can be used to predict each monitoring value at subsequent time points and thereby judge whether a value is abnormal. The specific steps include:
An autoregressive integrated moving average (ARIMA) model is used, with the AIC (Akaike Information Criterion) as the evaluation criterion, to find the model that best explains the data with the fewest free parameters (determined by the parameters p, d, q). The model is:
$$A_t = \phi_1 A_{t-1} + \phi_2 A_{t-2} + \dots + \phi_p A_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$$
where $A_t$ is the value of the (d-times differenced) series at time t, $\phi_i$ are the autoregressive coefficients, δ is the constant offset term, $u_t$ is the error, $\theta_i$ are the error (moving-average) coefficients, t denotes the time, p is the number of lags of the series itself used in the prediction model, and q is the number of lags of the prediction error;
Stage one: compute the ACF (autocorrelation function) and PACF (partial autocorrelation function) from the data and plot them; use the ACF and PACF plots to check whether the series needs differencing and whether it is periodic. If the series is non-stationary, difference it as needed to obtain a stationary series.
Stage two: with AIC as the evaluation criterion, use a grid search for the optimal model parameters p, d, q, where:
p: the number of lags of the series itself used in the prediction model, also called the autoregressive terms;
d: the number of differences required, also called the differencing order;
q: the number of lags of the prediction error, also called the moving-average terms;
The model is then trained on the data to obtain its parameters, namely $\phi_i$, $\theta_i$ and δ.
Stage three: use the trained ARIMA model to predict the index values at subsequent time points and compare them with the monitored values; whether a point is abnormal is determined by computing the Euclidean distance from the monitored value to the predicted value and comparing it with a manually set threshold.
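Stage three can be sketched directly from the model equation. The Python sketch below assumes φ, θ and δ have already been fitted (for example with a statistics library such as statsmodels, not shown) and that residuals before the first predicted point are zero; the function names and the threshold are hypothetical. Because differencing is linear with coefficient 1 on the current value, the one-step prediction error on the d-times differenced series equals the error on the original scale, so the threshold can be applied on either scale.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int) -> List[float]:
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def flag_anomalies(a: Sequence[float], phi: Sequence[float], theta: Sequence[float],
                   delta: float, threshold: float) -> List[int]:
    """Return time indices t where |A_t - prediction| exceeds the threshold.

    The prediction follows the model equation with the unknown current error
    u_t set to 0: A_hat_t = sum(phi_i * A_{t-i}) + delta + sum(theta_j * u_{t-j}).
    """
    p, q = len(phi), len(theta)
    u = [0.0] * len(a)  # residuals, taken as 0 before the first predicted point
    flagged = []
    for t in range(max(p, q, 1), len(a)):
        pred = delta
        pred += sum(phi[i] * a[t - 1 - i] for i in range(p))
        pred += sum(theta[j] * u[t - 1 - j] for j in range(q))
        u[t] = a[t] - pred  # one-step error feeds the moving-average terms
        if abs(a[t] - pred) > threshold:  # Euclidean distance between two scalars
            flagged.append(t)
    return flagged
```

Note that a flagged value also enters later predictions through the AR and MA terms, so the points right after a spike can show inflated errors; the claim does not say whether such points should be excluded.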
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910591968.9A CN110245880A (en) | 2019-07-02 | 2019-07-02 | A kind of pollution sources on-line monitoring data cheating recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110245880A | 2019-09-17 |
Family ID: 67890724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910591968.9A Pending CN110245880A (en) | 2019-07-02 | 2019-07-02 | A kind of pollution sources on-line monitoring data cheating recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245880A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102355381A (en) * | 2011-08-18 | 2012-02-15 | 网宿科技股份有限公司 | Method and system for predicting flow of self-adaptive differential auto-regression moving average model |
CN104808622A (en) * | 2015-03-18 | 2015-07-29 | 武汉巨正环保科技有限公司 | Intelligent type one-stop pollution source online monitoring system |
CN106156269A (en) * | 2016-06-01 | 2016-11-23 | 国网河北省电力公司电力科学研究院 | One is opposed electricity-stealing precise positioning on-line monitoring method |
CN106709242A (en) * | 2016-12-07 | 2017-05-24 | 常州大学 | Method for identifying authenticity of sewage monitoring data |
CN107758885A (en) * | 2017-11-01 | 2018-03-06 | 浙江成功软件开发有限公司 | A kind of real-time sewage is aerated condition monitoring method |
CN108763966A (en) * | 2018-06-04 | 2018-11-06 | 武汉邦拓信息科技有限公司 | A kind of Tail gas measuring cheating supervisory systems and method |
CN109614526A (en) * | 2018-11-09 | 2019-04-12 | 环境保护部环境工程评估中心 | Environmental monitoring data fraud means recognition methods based on higher-dimension abnormality detection model |
- 2019-07-02: application CN201910591968.9A filed in CN (CN110245880A, active, pending)
Non-Patent Citations (1)
Title |
---|
YA-LIN ZHANG et al.: "Anomaly Detection with Partially Observed Anomalies", Companion of The Web Conference 2018, International World Wide Web Conferences Steering Committee * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889088A (en) * | 2019-11-04 | 2020-03-17 | 国网浙江省电力有限公司信息通信分公司 | Enterprise pollution discharge supervision method assisted by electric model |
CN110889088B (en) * | 2019-11-04 | 2023-10-20 | 国网浙江省电力有限公司信息通信分公司 | Enterprise pollution discharge supervision method assisted by electric power model |
CN110990393A (en) * | 2019-12-17 | 2020-04-10 | 清华苏州环境创新研究院 | Big data identification method for abnormal data behaviors of industry enterprises |
CN110990393B (en) * | 2019-12-17 | 2023-09-08 | 清华苏州环境创新研究院 | Big data identification method for abnormal behaviors of industry enterprise data |
CN111680856A (en) * | 2020-01-14 | 2020-09-18 | 国家电网有限公司 | User behavior safety early warning method and system for power monitoring system |
CN112258689A (en) * | 2020-10-26 | 2021-01-22 | 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) | Ship data processing method and device and ship data quality management platform |
CN112258689B (en) * | 2020-10-26 | 2022-12-13 | 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) | Ship data processing method and device and ship data quality management platform |
CN112381697A (en) * | 2020-11-20 | 2021-02-19 | 深圳衡伟环境技术有限公司 | Method for automatically identifying false behavior of water pollution source on-line monitoring data |
CN112381697B (en) * | 2020-11-20 | 2024-02-02 | 深圳衡伟环境技术有限公司 | Automatic recognition method for false behavior of water pollution source on-line monitoring data falsification |
CN114580804A (en) * | 2020-12-01 | 2022-06-03 | 武汉斗鱼网络科技有限公司 | Method for determining suspected risk user and related equipment |
CN112699113A (en) * | 2021-01-12 | 2021-04-23 | 上海交通大学 | Industrial manufacturing process operation monitoring system driven by time sequence data stream |
CN113012388B (en) * | 2021-02-19 | 2023-02-24 | 浙江清之元信息科技有限公司 | Pollution source online monitoring system and online monitoring data false identification analysis method |
CN113012388A (en) * | 2021-02-19 | 2021-06-22 | 浙江清之元信息科技有限公司 | Pollution source online monitoring system and online monitoring data false identification analysis method |
CN113655189A (en) * | 2021-03-31 | 2021-11-16 | 吴超烽 | Automatic monitoring data analysis and judgment system for pollution source |
CN113705547B (en) * | 2021-10-28 | 2022-03-25 | 北京万维盈创科技发展有限公司 | Dynamic management and control method and device for recognizing false behavior of environment blurring |
CN113705547A (en) * | 2021-10-28 | 2021-11-26 | 北京万维盈创科技发展有限公司 | Dynamic management and control method and device for recognizing false behavior of environment blurring |
CN117407661A (en) * | 2023-12-14 | 2024-01-16 | 深圳前海慧联科技发展有限公司 | Data enhancement method for equipment state detection |
CN117407661B (en) * | 2023-12-14 | 2024-02-27 | 深圳前海慧联科技发展有限公司 | Data enhancement method for equipment state detection |
CN118313564A (en) * | 2024-06-05 | 2024-07-09 | 生态环境部环境工程评估中心 | Abnormality identification method, device, equipment and medium for enterprise emission monitoring data |
CN118313564B (en) * | 2024-06-05 | 2024-08-23 | 生态环境部环境工程评估中心 | Abnormality identification method, device, equipment and medium for enterprise emission monitoring data |
Similar Documents
Publication | Title |
---|---|
CN110245880A (en) | A kind of pollution sources on-line monitoring data cheating recognition methods |
CN111475804B (en) | Alarm prediction method and system | |
WO2019019709A1 (en) | Method for detecting water leakage of tap water pipe | |
CN110636066B (en) | Network security threat situation assessment method based on unsupervised generative reasoning | |
CN106330949B (en) | One kind being based on markovian intrusion detection method | |
CN112288021A (en) | Medical wastewater monitoring data quality control method, device and system | |
CN109034140A (en) | Industrial control network abnormal signal detection method based on deep learning structure | |
CN115277180B (en) | Block chain log anomaly detection and tracing system | |
CN112633779B (en) | Method for evaluating reliability of environmental monitoring data | |
CN114422184A (en) | Network security attack type and threat level prediction method based on machine learning | |
CN114049134A (en) | Pollution source online monitoring data counterfeiting identification method | |
Qian et al. | Deep learning based anomaly detection in water distribution systems | |
CN115062851B (en) | Pollution discharge abnormality monitoring method and system based on multi-algorithm fusion | |
CN117935519B (en) | Gas detection alarm system | |
CN110011990A (en) | Intranet security threatens intelligent analysis method | |
CN114997313A (en) | Anomaly detection method for ocean online monitoring data | |
CN115883163A (en) | Network safety alarm monitoring method | |
CN115277159A (en) | Industrial Internet security situation assessment method based on improved random forest | |
CN117892094A (en) | Sewage operation and maintenance platform big data analysis system | |
Yang et al. | Teacher–Student Uncertainty Autoencoder for the Process-Relevant and Quality-Relevant Fault Detection in the Industrial Process | |
CN111191855A (en) | Water quality abnormal event identification and early warning method based on pipe network multi-element water quality time sequence data | |
CN117094563B (en) | Intelligent liquid waste leakage monitoring system and method based on big data | |
CN114295162A (en) | Environmental monitoring system based on data acquisition | |
CN116910662A (en) | Passenger anomaly identification method and device based on random forest algorithm | |
Salazar et al. | Monitoring approaches for security and safety analysis: application to a load position system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190917 |