CN110245880A - A method for identifying fraud in online pollution-source monitoring data - Google Patents

A method for identifying fraud in online pollution-source monitoring data

Info

Publication number
CN110245880A
Authority
CN
China
Prior art keywords
data
enterprise
exceeded
monitoring
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910591968.9A
Other languages
Chinese (zh)
Inventor
张子健
江洁羽
李文
李科峰
梁思源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHEJIANG SUCCESS SOFTWARE DEVELOPMENT Co Ltd
Original Assignee
ZHEJIANG SUCCESS SOFTWARE DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHEJIANG SUCCESS SOFTWARE DEVELOPMENT Co Ltd
Priority to CN201910591968.9A
Publication of CN110245880A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 Individual registration on entry or exit
    • G07C9/00174 Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys
    • G07C9/00563 Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys using personal physical data of the operator, e.g. finger prints, retinal images, voicepatterns

Abstract

The invention discloses a method for identifying fraud in online pollution-source monitoring data. The method comprises data preprocessing, fixed-rule screening, video and access control, on-site inspection, and machine-learning-based rule optimization. Fixed-rule screening covers enterprise fraud rules, enterprise instrument fault screening, and screening for abnormal operation and maintenance (O&M) units. Video and access control is a tool for checking whether an enterprise is cheating; video footage and access-control alarms can be displayed in the system. On-site inspection verifies the fixed-rule screening results and the video and access-control findings in the field and, using O&M records, concludes whether the enterprise cheated, whether an instrument failed, and whether the O&M unit falsified its records. Machine learning uses the on-site inspection feedback to optimize the rules, so that the fixed screening results become more credible. The proposed method addresses problems such as enterprises covertly discharging wastewater and waste gas and non-standard O&M of online monitoring, and can also support users' decision analysis.

Description

A method for identifying fraud in online pollution-source monitoring data
Technical field
The present invention relates to the field of anti-fraud for online monitoring, and in particular to a method for identifying fraud in online pollution-source monitoring data.
Background art
Environmental quality is a focus of public attention, and how to make better use of existing data to supervise polluting enterprises has become a problem for the relevant authorities. Current anti-fraud practice for pollution sources mainly comes down to video monitoring of the detection process and staff judging from the data they observe, for example whether a detected value is too large or too small. At present, fraudulent data can only be identified by manual, experience-based review. More often still, government departments act only after receiving public complaints and then supervise according to procedure, with little effect. For massive data the labor cost is very high: each polluting enterprise generates hundreds of monitoring records per day, so manual review is inefficient. Remote real-time monitoring by machine is used, but the reliability of video monitoring cannot be guaranteed. However, such manually manipulated "abnormal" data often show a certain regularity. Finding the patterns among the "abnormal" data and providing inspection authorities with corresponding decision references can greatly improve supervision of non-compliant enterprises and strengthen environmental protection.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to propose a method for identifying fraud in online pollution-source monitoring data that addresses problems such as enterprises covertly discharging wastewater and waste gas and non-standard O&M of online monitoring, and that can also support users' decision analysis.
The object of the present invention is achieved through the following technical solution: a method for identifying fraud in online pollution-source monitoring data, comprising:
1) Data preprocessing: preprocess the online monitoring data, select hourly average data as the base data for fixed-rule screening, and handle invalid data. The handling rule is: if the flow meter reads zero during some period, discard the monitoring index data of all monitoring instruments for that period; then determine the percentage of each monitoring instrument's index data that is zero, and if it is below a threshold, fill the data by interpolation, otherwise discard that monitoring index data;
2) Fixed-rule screening: comprises an enterprise fraud rule base, an enterprise instrument fault rule base, and an O&M unit anomaly rule base;
The enterprise fraud rule base is used to judge whether an enterprise is cheating and to mark fraud-suspect enterprises; it includes dilution, same-industry discharge analysis, cooperative analysis of monitoring indices, over-limit sudden drop, intermittent over-limit sudden drop, combined analysis, and outlier analysis;
The enterprise instrument fault rule base is used to judge whether an enterprise's instruments have failed and to mark the faulty instruments and the enterprises that have them; it includes the zero-value and constant-value rules;
The O&M unit anomaly rule base is used to judge whether the O&M unit's quality-control (QC) records are falsified and to mark enterprises whose O&M unit is abnormal;
The results of fixed-rule screening are displayed visually;
3) Video and access control: comprises video monitoring and access-control records. Video monitoring covers the enterprise's discharge outlet and the monitoring station house and is used to observe violations by enterprise personnel; access-control records are records of personnel entering and leaving the station house. Video and access control is applied in two ways. The first combines it with the fixed-rule screening results to further confirm whether the enterprise shows fraud, instrument failure, or O&M anomalies. The second is video and access-control early warning: the video feeds of the discharge outlet and station house are checked, and warning information is output if the outlet water is turbid, someone approaches the outlet, or an unauthorized person breaks into the station house;
4) On-site inspection: on-site inspectors carry out an inspection based on the fixed-rule screening results and the video and access-control information, together with enterprise information, which includes the enterprise's online monitoring data, O&M records, and so on. The inspection yields three kinds of result data: whether the online monitoring data were falsified, whether an instrument failed, and whether the O&M unit's QC records were falsified. The results are output and used as label information to correct the machine-learning parameters and optimize the fixed-rule screening method, so as to achieve higher accuracy;
5) Machine-learning-based rule optimization: using the feedback from video and access control and from on-site inspection, the fixed-rule screening is continuously optimized by machine learning to form screening rules with higher confidence. The machine learning takes the form of either a combination of unsupervised and semi-supervised learning, or time series analysis (TSA); thresholds suited to the specific requirements are set during machine learning according to the actual situation.
Further, the handling rule for invalid data is specifically: if fewer than 10% of the monitoring index data are zero, fill the data by interpolation; if more than 10% of the monitoring index data are zero, discard that item of data.
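The preprocessing rule can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, linear interpolation is an assumption (the text only says "interpolation"), and a zero share of exactly 10% is treated as a rejection because the general rule fills only when the share is below the threshold.

```python
from typing import Dict, List, Optional

ZERO_SHARE_THRESHOLD = 0.10  # the 10% threshold stated in the text


def drop_zero_flow_hours(flow: List[float],
                         indices: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Remove every hour whose flow reading is zero from all index series."""
    keep = [i for i, f in enumerate(flow) if f != 0]
    return {name: [series[i] for i in keep] for name, series in indices.items()}


def fill_or_reject(series: List[float]) -> Optional[List[float]]:
    """Interpolate zeros if their share is below the threshold, else return None."""
    if not series:
        return None
    zero_share = sum(1 for v in series if v == 0) / len(series)
    if zero_share >= ZERO_SHARE_THRESHOLD:
        return None  # too many zeros: the index series is rejected
    known = [i for i, v in enumerate(series) if v != 0]
    filled = list(series)
    for i, v in enumerate(series):
        if v != 0:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # zero at the start: copy the next valid value
            filled[i] = series[right]
        elif right is None:         # zero at the end: copy the previous valid value
            filled[i] = series[left]
        else:                       # linear interpolation (an assumption)
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled
```

Calling `drop_zero_flow_hours` first and `fill_or_reject` on each remaining index series mirrors the order in which the text applies the two checks.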
Further, in the enterprise fraud rule base, the three rules over-limit sudden drop, intermittent over-limit sudden drop, and combined analysis do not check whether the monitoring index data are zero; they check whether the flow meter data are zero.
Further, the dilution rule in the enterprise fraud rule base is specifically:
1) Dilution fraud includes installing a separate discharge pipe, diluting the discharge, and diluting the sample. To counter these, the online monitoring data of enterprises with two or more monitoring indices are analyzed synchronously; if two or more monitoring indices rise or fall in proportion, the enterprise is marked as a fraud suspect. The specific steps are:
(1) Apply the dilution rule to the online monitoring data of enterprises that have two or more monitoring indices besides pH and flow.
(2) Rule description: suppose an enterprise monitors three factors a, b, and c, whose monitoring index values at hour N are $A_n$, $B_n$, $C_n$:

Time        a            b            c
Hour N      $A_n$        $B_n$        $C_n$
Hour N+1    $A_{n+1}$    $B_{n+1}$    $C_{n+1}$

The rule is met if the following hold simultaneously: [relation not reproduced in the source text], and one of $A_n$, $B_n$, $C_n$ exceeds 70% of the discharge standard.
(3) Mark the result as a fraud-suspect enterprise.
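The exact relation for the dilution rule is not reproduced in the text, so the following Python sketch rests on an assumption: two indices are treated as changing "in proportion" when their hour-to-hour ratios differ by no more than a hypothetical relative `tolerance`. The function name, the dictionary layout, and the tolerance value are illustrative only; the 70% condition is the one stated in the rule.

```python
from itertools import combinations
from typing import Dict


def dilution_suspected(prev: Dict[str, float], curr: Dict[str, float],
                       limits: Dict[str, float],
                       tolerance: float = 0.05,   # hypothetical, not from the source
                       near_limit: float = 0.70) -> bool:
    """prev/curr map index name -> concentration at hour N and hour N+1."""
    # Change ratio of each index between hour N and hour N+1.
    ratios = {k: curr[k] / prev[k] for k in prev if k in curr and prev[k] > 0}
    # Condition stated in the text: some index at hour N exceeds 70% of its limit.
    if not any(prev[k] > near_limit * limits[k] for k in ratios):
        return False
    # Assumed reading of "in proportion": two indices share (almost) the same ratio.
    for a, b in combinations(sorted(ratios), 2):
        ra, rb = ratios[a], ratios[b]
        changed = ra != 1.0 and rb != 1.0
        if changed and abs(ra - rb) <= tolerance * max(ra, rb):
            return True
    return False
```

pH and flow are assumed to have been excluded from `prev` and `curr` beforehand, as step (1) requires.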
2) The same-industry discharge analysis is specifically:
The industry of the discharging enterprise, the discharge standard limits, the enterprise's output, and its wastewater treatment process are used as screening conditions. Enterprises in the same industry, under the same discharge standard, with the same output and a similar wastewater treatment process are grouped together and regarded as peer enterprises. The wastewater discharge volumes and monitored concentrations of peer enterprises are compared to find enterprises that deviate abnormally from the average level, and those enterprises are marked as fraud suspects.
3) The cooperative analysis of monitoring indices comprises: cooperative analysis of total nitrogen and ammonia nitrogen in the online monitoring data, and cooperative analysis of chemical oxygen demand and total organic carbon.
(a) Cooperative analysis of total nitrogen and ammonia nitrogen: where the online monitoring data contain both the total nitrogen and the ammonia nitrogen concentration, cases in which the ammonia nitrogen concentration exceeds the total nitrogen concentration are analyzed further.
(1) Data rejection: if either the ammonia nitrogen or the total nitrogen value is zero, discard that item of data.
(2) Rule description: suppose an enterprise monitors a (ammonia nitrogen) and b (total nitrogen):

Time        a        b
Hour N      A1       B1

where a is ammonia nitrogen, b is total nitrogen, A1 and B1 are the ammonia nitrogen and total nitrogen concentrations at hour N, and B1/A1 < 70%.
(3) Mark the result as a fraud-suspect enterprise.
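As a small illustration of this rule, the sketch below drops records where either value is zero and flags hours where total nitrogen is less than 70% of ammonia nitrogen (B1/A1 < 70%). The function name and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Tuple


def flag_nitrogen_records(records: Iterable[Tuple[str, float, float]],
                          ratio_limit: float = 0.70) -> List[str]:
    """records: (hour, ammonia_nitrogen, total_nitrogen); returns flagged hours."""
    flagged = []
    for hour, ammonia, total in records:
        if ammonia == 0 or total == 0:
            continue                      # step (1): reject records with a zero value
        if total / ammonia < ratio_limit:
            flagged.append(hour)          # step (2): B1/A1 < 70%
    return flagged
```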
(b) Cooperative analysis of chemical oxygen demand (COD) and total organic carbon (TOC): COD and TOC are strongly correlated.
Rule: the COD and TOC of an enterprise's wastewater follow the linear regression relation y = Px + Q, where the values of P and Q can be obtained from the analytical instrument for the corresponding online monitoring factor, x denotes TOC, and y denotes COD.
Given either the COD or the TOC value, the other can therefore be derived from this relation. Suppose that at some time the instrument measures TOC x1 and COD y1. The calculated COD value y2 is obtained from x1. According to the standard, an error of up to 10% between the two values is allowed for enterprise wastewater, that is, |y1 - y2| / y2 ≤ 10%. An error beyond 10% is considered abnormal, and the enterprise is marked as a fraud suspect.
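The COD/TOC check reduces to a single comparison. A minimal sketch, assuming P and Q have already been read from the analyzer; the function name and the example coefficients are hypothetical:

```python
def cod_toc_is_abnormal(toc: float, cod_measured: float,
                        p: float, q: float, tolerance: float = 0.10) -> bool:
    """Return True when |y1 - y2| / y2 exceeds the 10% allowance."""
    cod_expected = p * toc + q            # y2 = P * x1 + Q
    if cod_expected == 0:
        raise ValueError("expected COD is zero; relative error is undefined")
    return abs(cod_measured - cod_expected) / abs(cod_expected) > tolerance
```

With illustrative coefficients P = 3 and Q = 5, a TOC of 10 gives an expected COD of 35, so a measured COD of 36 passes and a measured COD of 50 is flagged.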
4) Over-limit sudden drop and intermittent over-limit sudden drop. The over-limit sudden drop rule comprises over-limit analysis and near-limit analysis, which cover the case where a monitoring index of the enterprise exceeds the limit or is about to exceed it; if the index value then drops suddenly, the index is analyzed further. The specific steps are:
Rule description: suppose an enterprise monitors a factor a whose monitoring index value at hour N is $A_n$.
The first case is near-limit analysis: $A_n$ lies between 80% and 100% of the limit line, and if [condition not reproduced in the source text] or [condition not reproduced in the source text] holds, $A_n$, $A_{n+1}$, $A_{n+2}$ are considered abnormal data. The second case is over-limit analysis: $A_n$ exceeds the limit line, and if [condition not reproduced in the source text] or [condition not reproduced in the source text] holds, the over-limit sudden drop rule is met and $A_n$, $A_{n+1}$, $A_{n+2}$, $A_{n+3}$ likewise are abnormal data. Records that meet the first or the second case are marked as over-limit sudden drops.
The intermittent over-limit sudden drop rule further analyzes the second case of the over-limit sudden drop: the data from hour N+2 to hour N+7 of the over-limit analysis are screened for exceedance, and if any exceedance exists, the record is marked as an intermittent over-limit sudden drop.
The results above are marked as fraud-suspect enterprises.
5) Combined analysis means analyzing with a combination of several rules, comprising:
A) Over-limit sudden drop + cooperative analysis of monitoring indices
For over-limit sudden drop data that also show ammonia nitrogen greater than total nitrogen, or a COD and TOC that do not satisfy their correlation, the data exhibit two fraud characteristics at once; the suspicion of fraud increases, and the enterprise is marked as a fraud suspect.
B) Over-limit sudden drop + constant value
For over-limit sudden drop data found by fixed-rule screening that also show the constant-value feature from the enterprise instrument fault rule base, the data are considered to exhibit two fraud characteristics at once; the suspicion of fraud increases, and the enterprise is marked as a fraud suspect.
6) Outlier analysis is a method for analyzing abnormal values: a value that lies far from the mean of its group of data is judged suspicious, and outliers in a monitoring index can be identified with the Grubbs test.
Rule: sort the data from largest to smallest; the values most likely to be outliers are usually the maximum or the minimum.
(i) For a data set containing n hourly values (each monitoring factor is one data set), compute the statistic G for every hourly value. The statistic $G_i$ of the i-th hourly value can be expressed as:
$$G_i = \frac{|x_i - \bar{x}|}{s}$$
where $i \in \{1, 2, 3, \dots, n\}$, $\bar{x}$ is the mean of the n hourly values, $s$ is the standard deviation, and $x_i$ is the hourly value;
(ii) Look up the Grubbs coefficient
Look up the critical value corresponding to the statistic G in the Grubbs coefficient table;
(iii) Find the outliers
When the statistic G of the maximum or minimum $x_i$ is greater than the critical value, the corresponding maximum or minimum is considered a suspected outlier;
(iv) Mark the result as a fraud-suspect enterprise.
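Steps (i) to (iii) can be sketched as follows, assuming the usual form of the Grubbs statistic, G_i = |x_i - mean| / s, with s the sample standard deviation. The critical value is passed in by the caller, as read from a Grubbs table; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def grubbs_statistics(values: List[float]) -> List[float]:
    """G_i = |x_i - mean| / s for every hourly value."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0] * len(values)        # constant series: no value stands out
    return [abs(v - m) / s for v in values]


def suspected_outliers(values: List[float], critical_value: float) -> List[float]:
    """Return the minimum and/or maximum if their G exceeds the critical value."""
    g = grubbs_statistics(values)
    lo = min(range(len(values)), key=values.__getitem__)
    hi = max(range(len(values)), key=values.__getitem__)
    return [values[i] for i in sorted({lo, hi}) if g[i] > critical_value]
```

Only the extremes are tested, following the rule that outliers usually sit at the maximum or minimum. The critical value depends on the sample size and the chosen significance level, so it has to be looked up for each data set.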
Further, the zero-value and constant-value rules of the enterprise instrument fault rule base are specifically:
A) Zero value
If the online monitoring index is zero throughout 24 consecutive hours in which the flow has a reading, mark it as a suspected instrument fault.
B) Constant value
If the online monitoring index stays unchanged throughout 24 consecutive hours in which the flow has a reading, mark it as a suspected instrument fault.
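A sliding-window sketch of the two fault rules follows. It assumes hourly series of equal length and reads "the flow has a reading" as a non-zero flow value in every hour of the window; both the function name and that reading are assumptions.

```python
from typing import List, Set


def instrument_fault_flags(flow: List[float], index: List[float],
                           window: int = 24) -> Set[str]:
    """Return {"zero"}, {"constant"}, both, or an empty set for one index series."""
    flags: Set[str] = set()
    for start in range(len(index) - window + 1):
        f = flow[start:start + window]
        v = index[start:start + window]
        if any(x == 0 for x in f):
            continue                      # flow missing in this window: rule not applied
        if all(x == 0 for x in v):
            flags.add("zero")             # rule A: all zero for 24 consecutive hours
        elif len(set(v)) == 1:
            flags.add("constant")         # rule B: unchanged for 24 consecutive hours
    return flags
```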
Further, the O&M unit anomaly rule addresses the case where the QC records kept by the O&M unit are inconsistent with the data uploaded by the data acquisition instrument; this case is analyzed further according to the following rule:
Rule: let M denote the QC sample value in the O&M record and N the historical value captured by the data acquisition instrument; if |M - N| / N ≥ 30%, mark the O&M unit's QC record as falsified.
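The QC consistency rule is a relative-deviation check. A minimal sketch, with hypothetical names for the function and its arguments:

```python
def qc_record_is_suspect(qc_value: float, logged_value: float,
                         limit: float = 0.30) -> bool:
    """True when |M - N| / N >= 30% (M: QC record value, N: logged value)."""
    if logged_value == 0:
        raise ValueError("logged value is zero; relative deviation is undefined")
    return abs(qc_value - logged_value) / abs(logged_value) >= limit
```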
Further, by connecting video and access control and analyzing it together with the online monitoring data of enterprises marked as fraud suspects, the video data within the corresponding time range can be retrieved according to the time of the abnormal data and the fraud analyzed.
Further, on-site inspection is for enterprises marked as fraud suspects by fixed-rule screening combined with video and access-control analysis: the responsible personnel visit the enterprise to verify the situation and collect evidence. Starting from the abnormal data found by fixed-rule screening, they verify on site whether the instruments run normally, the historical data stored in the instruments, and the wastewater quality of the discharging unit, to determine whether the enterprise cheated.
Further, the machine-learning-based fraud identification that combines unsupervised and semi-supervised learning works as follows: in the initial state, with no label information, only unsupervised clustering can be used to pick out points far from the bulk of the data; after a small number of reliable manual inspection results have been obtained, these reliable results can be fully exploited to obtain better results, using a combination of clustering and ADOA (Anomaly Detection with Partially Observed Anomalies). The specific steps are:
(1) In the initial unlabeled stage, use unsupervised clustering; the method may be the distance-based k-means algorithm or the density-based DBSCAN.
(2) After some label information has been obtained, use the ADOA algorithm. ADOA applies when there is a large number of unlabeled samples and only a small number of samples labeled as anomalous, and it assumes by default that the anomalous samples are not of a single type but of several types. ADOA has two stages:
Stage one: first cluster the observed anomalous samples into K clusters, then divide the unlabeled samples into potential anomalous samples and reliable normal samples based on the isolation score and the similarity score. Where:
(a) Isolation score: based on the isolation forest. First build an isolation forest over the samples; in the forest, samples closer to the root node are more likely to be anomalies. IS(x) describes how likely sample x is to be an anomaly (the isolation score). Let h(x) denote the path length of sample x in the isolation forest and E(h(x)) the mean path length. Suppose there are n samples; the average path length c(n) of an unsuccessful search in a binary search tree can then be written c(n) = 2H(n) - (2(n-1)/n), where H(n) = ln(n) + 0.5772156649 (Euler's constant) is the harmonic number. The isolation score IS(x) can thus be expressed as:
$$IS(x) = 2^{-\frac{E(h(x))}{c(n)}}$$
The closer IS(x) is to 1, the more likely the sample is anomalous;
(b) Similarity score:
A sample closer to an anomaly concept center (one of the k anomaly centers obtained by clustering the anomalous samples) is clearly more likely to be a potential anomaly, so the similarity score SS(x) can be expressed as:
$$SS(x) = \max_{i=1,\dots,k} e^{-\lVert x - \mu_i \rVert^2}$$
where $\mu_i$ is the i-th anomaly concept center and k is the number of anomaly concept centers;
(c) Total score: to pick out the potential anomalies and the reliable normal points, the isolation score and the similarity score must be considered together. The combined total score TS(x) can be expressed as:
$$TS(x) = \theta \, IS(x) + (1 - \theta) \, SS(x), \quad \theta \in [0, 1]$$
When TS(x) ≥ α, the sample is judged a potential anomaly;
When TS(x) ≤ β, the sample is judged a reliable normal point.
The thresholds α and β can be set according to the sensitivity actually required.
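The scoring in stage one can be sketched without any library. The sketch assumes the standard isolation-forest form IS(x) = 2^(-E(h(x)) / c(n)) and takes the mean path length and the similarity score as inputs rather than computing them. The function names and the "undecided" label for samples between β and α are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search, as written in the text."""
    h_n = math.log(n) + EULER_GAMMA       # H(n) approximated by ln(n) + Euler's constant
    return 2.0 * h_n - 2.0 * (n - 1) / n


def isolation_score(mean_path_length: float, n: int) -> float:
    """IS(x): approaches 1 for short paths, i.e. likely anomalies."""
    return 2.0 ** (-mean_path_length / c(n))


def total_score(is_x: float, ss_x: float, theta: float) -> float:
    """TS(x) = theta * IS(x) + (1 - theta) * SS(x), with theta in [0, 1]."""
    return theta * is_x + (1.0 - theta) * ss_x


def partition(ts: float, alpha: float, beta: float) -> str:
    """Split unlabeled samples by the thresholds alpha and beta (alpha > beta)."""
    if ts >= alpha:
        return "potential_anomaly"
    if ts <= beta:
        return "reliable_normal"
    return "undecided"                    # hypothetical label; the text leaves this open
```

A sample whose mean path length equals c(n) gets an isolation score of exactly 0.5, which is a convenient sanity check on the normalization.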
Stage two:
First assign a weight to each sample. In particular, the weight of each manually obtained labeled anomalous sample is set to 1. The unlabeled samples fall into two classes. For potential anomalies, the higher TS(x) is, the larger the weight ω(x) should be: [formula not reproduced in the source text]
For reliable normal points, the lower TS(x) is, the larger the weight should be: [formula not reproduced in the source text]
The problem then becomes a (k+1)-class classification problem, whose objective to be minimized is:
$$\min \sum_i w_i \, l(y_i, f(x_i)) + \lambda R(w)$$
where $w_i$ is the weight of sample $x_i$, $l(y_i, f(x_i))$ is the loss of sample $x_i$, $R(w)$ is the regularization term, and $\lambda$ is the regularization coefficient. This problem can be solved with a multi-class SVM.
Further, the time series analysis (TSA) method relies on the fact that each monitoring series shows a certain periodicity over time: the development pattern of earlier data can be used to predict each monitoring value at subsequent time points and thereby determine whether a value is abnormal. The specific steps are:
Use the autoregressive integrated moving average model (ARIMA), with AIC (Akaike Information Criterion) as the evaluation criterion, to find the model that best explains the data while containing the fewest free parameters (determined by the parameters p, d, q).
The model is:
$$A_t = \phi_1 A_{t-1} + \phi_2 A_{t-2} + \dots + \phi_p A_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$$
where $A_t$ is the observed value at time t, $\phi_i$ are the autocorrelation coefficients, $\delta$ is a constant offset term, $u_t$ is the error, $\theta_i$ are the error coefficients, t denotes the time, p is the number of lags of the time series itself used in the prediction model, and q is the number of lags of the prediction error;
Stage one:
From the data, compute and plot the ACF (autocorrelation function) and PACF (partial autocorrelation function). Use the ACF and PACF plots to check whether the series needs differencing and whether the data are periodic. If the series is non-stationary, difference it as needed to obtain a stationary series.
Stage two:
With AIC as the evaluation criterion, use grid search to find the optimal model parameters p, d, q, where:
p: the number of lags of the time series itself used in the prediction model, also called the autoregressive term;
d: the number of differencing passes required, also called the differencing term;
q: the number of lags of the prediction error, also called the moving-average term;
Then train on the data to obtain the model parameters, that is, $\phi_i$, $\theta_i$, and $\delta$ in the model.
Stage three:
Use the trained ARIMA model to predict the index values at subsequent time points and compare them with the monitored values: compute the Euclidean distance between the monitored and predicted values and compare it with a manually set threshold to decide whether the point is abnormal.
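Stage three is independent of how the forecast was produced, so it can be sketched on its own. The snippet below assumes the predictions already come from a fitted ARIMA model (fitting is out of scope here) and that observed and predicted values are given as equal-length sequences; the function name and the threshold values are hypothetical.

```python
import math
from typing import Sequence


def is_anomalous(observed: Sequence[float], predicted: Sequence[float],
                 threshold: float) -> bool:
    """Compare the Euclidean distance between observation and forecast to a threshold."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    distance = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)))
    return distance > threshold
```

For a single index at a single time point the distance reduces to the absolute error |observed - predicted|.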
Beneficial effects of the present invention: the online monitoring service is further improved and optimized, and the discharge of enterprise wastewater and waste gas is supervised effectively. Building on big-data analysis of existing online monitoring, the data from online monitoring lines and detection equipment are monitored, analyzed, and processed, enabling applications such as anti-fraud early warning and decision-support analysis on the online monitoring data of environmental protection information. This greatly strengthens the environmental protection authorities' supervision of online monitoring and thereby realizes smart environmental protection. In line with actual environmental monitoring conditions, an unsupervised clustering algorithm is used first as a cold start, and once some anomaly labels are available, semi-supervised learning is used to tune model accuracy so that abnormal data in the monitoring data are found more accurately. The invention also uses the ARIMA algorithm to mine the periodicity of the data, so that the system can to some extent detect artificially fabricated data.
Brief description of the drawings
Fig. 1 is a flow chart of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawing.
The recognition methods as shown in Figure 1, a kind of pollution sources on-line monitoring data provided by the invention are practised fraud, this method comprises:
1) data prediction: on-line monitoring data are pre-processed, select hourly value data as unalterable rules screening Basic data, and invalid data is handled, if it is zero that processing rule, which is the data that certain time period flow instrument detects, rejecting should Period all monitoring index data;Judge the percentage that monitoring index data are zero, if being less than threshold value, utilizes interpolation method (such as Newton interpolating method) carries out the filling of data, otherwise rejects this monitoring index data;
2) unalterable rules screening: abnormal including enterprise's cheating rule base, enterprise's instrument failure rule base and O&M unit Rule base;
Enterprise's cheating rule base is used to judge whether enterprise practises fraud, and marks out cheating suspicion enterprise, including dilution, Discharge amount analysis of the same trade, monitoring index Cooperative Analysis, exceeded bust, the exceeded bust of interruption, combinatory analysis, spatial pattern and process;
Enterprise's instrument failure rule base for judge enterprise's instrument whether failure, and mark to be out of order and instrument and have The enterprise of failure instrument, including zero and constant;
The O&M unit exception rules library marks out O&M list for judging whether O&M unit Quality Control record plays tricks The abnormal enterprise in position;
The result of unalterable rules screening is shown with visual means;
3) it video gate inhibition: is recorded including video monitoring and gate inhibition, video monitoring includes pollutant discharge of enterprise mouth video monitoring and station Room video monitoring, for monitoring the unlawful practice of enterprise personnel;Gate inhibition's record refers to that personnel enter and leave the record of station;Video gate inhibition There are two types of application forms, the first is that video gate inhibition combines unalterable rules screening results, further confirms that enterprise with the presence or absence of work Disadvantage, instrument failure or O&M abnormal conditions;Second is video gate inhibition's early warning, that is, checks the video monitoring of sewage draining exit, station, if It monitors that sewage draining exit water turbidity, sewage draining exit someone are close or station has the case where unauthorized person is swarmed into, exports warning information;
4) on-site examination: on-site examination personnel believe according to unalterable rules screening results and video access information in conjunction with enterprise Breath carries out on-site examination, and company information includes enterprise's on-line monitoring data, O&M record etc., and it is online that on-site examination can generate enterprise Monitor whether that there are data to practise fraud, whether instrument whether fake three kinds of result datas by failure, O&M unit Quality Control record, output knot Fruit simultaneously is used to correct machine learning relevant parameter as mark information, optimizes unalterable rules screening method, to obtain more High accuracy;
5) based on the rule optimization of machine learning: according to the feedback information of video gate inhibition and on-site examination, with machine learning Mode unalterable rules screening is continued to optimize, form confidence level higher screening rule, the mode of the machine learning is non- Supervised learning and semi-supervised learning combine or time series analysis TSA, sets according to the actual situation during machine learning Surely it is suitble to the threshold value of specific requirements.
Further, the processing rule of the invalid data is specifically, if the monitoring index data only lower than 10% are Zero, then the filling of data is carried out using interpolation method;The item data is rejected if there are the monitoring index data higher than 10% to be zero.
Further, in the enterprise cheating rule base, the three rules exceeded bust, interrupted exceeded bust and combinatory analysis do not check whether the monitoring index data are zero; they only check whether the data detected by the flow instrument are zero.
Further, the dilution in the enterprise cheating rule base is specifically:
1) The cheating means related to dilution include installing an additional discharge pipe, diluting the discharge, and diluting the sample. To address these means, synchronization analysis of the on-line monitoring data is carried out for enterprises having two or more monitoring indexes; if two or more monitoring indexes rise or fall in proportion, the enterprise is labeled as a cheating suspicion enterprise. Specifically, this includes the following steps:
(1) Dilution rule screening is applied to the on-line monitoring data of enterprises that have two or more monitoring indexes in addition to the two indexes pH and flow.
(2) Rule declaration: assume that an enterprise has three factors a, b and c, and that the data of the three monitoring indexes at hour N are $A_n$, $B_n$ and $C_n$ respectively.
Time        a          b          c
Hour N      A_n        B_n        C_n
Hour N+1    A_{n+1}    B_{n+1}    C_{n+1}
If the following relationships are met simultaneously: the proportional-change relationship between hour N and hour N+1 (its formula is not reproduced in this text), and one of $A_n$, $B_n$, $C_n$ exceeds 70% of the discharge standard.
(3) The result is labeled as a cheating suspicion enterprise.
2) the same-trade discharge amount analysis is specifically:
The industry to which the discharging enterprise belongs, the discharge standard limits, the enterprise's output and its waste water treatment process are used as screening conditions. Enterprises in the same industry, under the same discharge standard, with the same output and a similar waste water treatment process are grouped together and regarded as similar enterprises. The waste water discharge and the concentrations of the monitored items of similar enterprises are compared by analogy to find enterprises that deviate abnormally from the average level, and such enterprises are labeled as cheating suspicion enterprises.
3) the monitoring index Cooperative Analysis includes: Cooperative Analysis of total nitrogen and ammonia nitrogen in the on-line monitoring data, and Cooperative Analysis of chemical oxygen demand (COD) and total organic carbon (TOC).
(a) total nitrogen and ammonia nitrogen Cooperative Analysis: where the on-line monitoring data contain both the total nitrogen and the ammonia nitrogen concentration indexes, cases in which the ammonia nitrogen concentration is greater than the total nitrogen concentration are analyzed further.
(1) Data rejection: if either the ammonia nitrogen value or the total nitrogen value is zero, the item of data is rejected.
(2) Rule declaration: assume that an enterprise has the factors a (ammonia nitrogen) and b (total nitrogen).
Time        a      b
Hour N      A1     B1
where a is ammonia nitrogen, b is total nitrogen, A1 and B1 are the ammonia nitrogen and total nitrogen concentrations at hour N, and B1/A1 < 70%.
(3) The result is labeled as a cheating suspicion enterprise.
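The total nitrogen and ammonia nitrogen rule above reduces to one ratio test per hourly record. The sketch below is illustrative only; the function name `nitrogen_ratio_suspect` and the convention of returning `None` for rejected records are assumptions.

```python
from typing import Optional


def nitrogen_ratio_suspect(ammonia: float, total_nitrogen: float,
                           ratio_limit: float = 0.70) -> Optional[bool]:
    """Apply the B1/A1 < 70% rule to one hourly record.

    Returns None when the record is rejected (either value is zero),
    True when the record is suspicious, and False otherwise.
    """
    if ammonia == 0 or total_nitrogen == 0:
        return None  # step (1): reject records containing a zero
    return total_nitrogen / ammonia < ratio_limit  # step (2): B1/A1 < 70%
```

With an ammonia nitrogen reading of 10 and a total nitrogen reading of 6, the ratio is 0.6 and the record is flagged; with a total nitrogen reading of 8 it is not.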
(b) COD and total organic carbon Cooperative Analysis: COD and total organic carbon are strongly correlated.
Rule: the COD and total organic carbon of enterprise waste water follow the linear regression relation y = Px + Q, where the values of P and Q can be obtained from the analysis instrument for the corresponding on-line monitoring factor, x denotes total organic carbon and y denotes COD.
Thus, given either the COD value or the total organic carbon value, the other can be derived from the relation. Suppose that at some time point the instrument measures total organic carbon x1 and COD y1. The calculated COD value y2 is obtained from x1. According to the standard, an error of up to 10% between the two values for enterprise waste discharge is allowed, i.e. |y1-y2|/y2 ≤ 10%; an error beyond 10% is considered abnormal, and the enterprise is labeled as a cheating suspicion enterprise.
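The COD and total organic carbon check above can be sketched as follows. The function name `cod_toc_suspect` and the sample coefficients in the usage note are hypothetical; in practice P and Q would come from the analysis instrument, as the text states.

```python
def cod_toc_suspect(toc: float, cod_measured: float, p: float, q: float,
                    tolerance: float = 0.10) -> bool:
    """Flag a record when measured COD deviates from the regression value by more than 10%.

    y2 = P * x1 + Q is the COD value calculated from the measured TOC x1;
    the record is abnormal when |y1 - y2| / y2 exceeds the tolerance.
    """
    cod_calculated = p * toc + q
    if cod_calculated == 0:
        raise ValueError("calculated COD is zero; relative error is undefined")
    return abs(cod_measured - cod_calculated) / abs(cod_calculated) > tolerance
```

With hypothetical coefficients P = 3 and Q = 10 and a measured TOC of 30, the calculated COD is 100, so a measured COD of 105 passes while a measured COD of 120 is flagged.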
4) exceeded bust and interrupted exceeded bust: the exceeded bust includes exceeded analysis and neighbouring exceeded analysis, which cover the situations in which a monitoring index of the enterprise's on-line monitoring has exceeded the standard or is about to exceed it; if the value of the monitoring index then drops suddenly, the index is analyzed further. Specifically:
Rule declaration: assume that an enterprise has a factor a, and that the monitoring index value at hour N is $A_n$.
Time        a
Hour N      A_n
Hour N+1    A_{n+1}
Hour N+2    A_{n+2}
Hour N+3    A_{n+3}
The first case is neighbouring exceeded analysis: the value of $A_n$ lies between 80% and 100% of the exceeded line, and if either of two drop conditions holds (their formulas are not reproduced in this text), $A_n$, $A_{n+1}$ and $A_{n+2}$ are considered abnormal data. The second case is exceeded analysis: the value of $A_n$ is above the exceeded line, and if either of two drop conditions holds (formulas likewise not reproduced), the exceeded bust rule is met and $A_n$, $A_{n+1}$, $A_{n+2}$ and $A_{n+3}$ likewise belong to abnormal data. If the first or the second case is met, the data are labeled as exceeded bust.
Interrupted exceeded bust is a further analysis of the second case of exceeded bust: the data recorded from hour N+2 to hour N+7 of the exceeded analysis are screened for exceedance, and if any exceedance is found, the data are labeled as interrupted exceeded bust.
The results above are labeled as cheating suspicion enterprises.
5) combinatory analysis means analysis that combines several analysis rules, including:
A) exceeded bust + monitoring index Cooperative Analysis
For the exceeded bust data found, if ammonia nitrogen is also greater than total nitrogen, or COD and total organic carbon (TOC) do not meet the correlation relation, the data exhibit two kinds of cheating features at once; the cheating suspicion increases and the enterprise is labeled as a cheating suspicion enterprise.
B) exceeded bust + constant
For data identified as exceeded bust in the unalterable rules screening, if the same data also show the constant feature from the enterprise instrument failure rule base, they are considered to exhibit two kinds of cheating features at once; the cheating suspicion increases and the enterprise is labeled as a cheating suspicion enterprise.
6) spatial pattern and process is a method for analyzing exceptional values: a value in a group of data that deviates far from the average is judged to be a dubious value, and exceptional values in a monitoring index can be identified with the Grubbs test.
Rule: when the data are arranged from largest to smallest, exceptional values are most likely to appear at the maximum or the minimum.
(i) For a data set containing n hourly values (each monitoring factor forms one data set), the statistic G of each hourly value is calculated; the statistic $G_i$ of the i-th hourly value can be expressed as:
$$G_i = \frac{|x_i - \bar{x}|}{s}$$
where $i \in \{1, 2, 3, \ldots, n\}$, $\bar{x}$ denotes the mean of the n hourly values, $s$ denotes the standard deviation and $x_i$ denotes an hourly value;
(ii) Grubbs coefficient lookup
The critical value corresponding to the statistic G is looked up in the Grubbs coefficient table;
(iii) Exceptional value identification
When the statistic G corresponding to the maximum or the minimum of $x_i$ is greater than the critical value, the corresponding maximum or minimum is considered a suspected exceptional value;
(iv) The result is labeled as a cheating suspicion enterprise.
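Steps (i) to (iii) above can be sketched with the Python standard library. This is an illustration under stated assumptions: the function names are hypothetical, the sample standard deviation is used because the text does not say which estimator is meant, and the critical value is passed in by the caller instead of being looked up, so no Grubbs table is embedded here.

```python
import statistics
from typing import List


def grubbs_statistics(values: List[float]) -> List[float]:
    """Step (i): G_i = |x_i - mean| / s for every hourly value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: sample standard deviation
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    return [abs(x - mean) / s for x in values]


def grubbs_suspects(values: List[float], critical: float) -> List[int]:
    """Step (iii): indexes of the maximum and/or minimum whose G exceeds the critical value.

    `critical` is the value the caller obtained from a Grubbs coefficient table
    in step (ii).
    """
    g = grubbs_statistics(values)
    suspects: List[int] = []
    for idx in (values.index(max(values)), values.index(min(values))):
        if g[idx] > critical and idx not in suspects:
            suspects.append(idx)
    return suspects
```

Only the extreme values are tested, which matches the rule that exceptional values tend to appear at the maximum or the minimum of the sorted data.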
Further, the zero and constant rules of the enterprise instrument failure rule base are specifically:
A) zero
If an on-line monitoring index is zero throughout 24 consecutive hours in which flow values are present, it is labeled as instrument failure suspicion.
B) constant
If an on-line monitoring index remains unchanged throughout 24 consecutive hours in which flow values are present, it is labeled as instrument failure suspicion.
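The zero and constant rules above can be sketched as a single pass over paired hourly series. The function name `instrument_fault_flags` is hypothetical, and the reading that an hour with zero flow breaks a run is an assumption based on the wording of the rules.

```python
from typing import List, Tuple


def instrument_fault_flags(index_values: List[float], flow_values: List[float],
                           window: int = 24) -> Tuple[bool, bool]:
    """Return (zero_fault, constant_fault) for one monitoring index.

    A run only counts hours in which the flow reading is non-zero;
    an hour with zero flow resets both runs (an assumption).
    """
    zero_run = 0
    const_run = 0
    previous = None
    zero_fault = False
    constant_fault = False
    for value, flow in zip(index_values, flow_values):
        if flow == 0:
            zero_run = 0
            const_run = 0
            previous = None
            continue
        zero_run = zero_run + 1 if value == 0 else 0
        const_run = const_run + 1 if (previous is not None and value == previous) else 1
        previous = value
        zero_fault = zero_fault or zero_run >= window
        constant_fault = constant_fault or const_run >= window
    return zero_fault, constant_fault
```

Note that a series that is zero for 24 hours also satisfies the constant rule, so both flags are raised in that case.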
Further, the O&M unit exception rule addresses the case in which the Quality Control records kept by the O&M unit are inconsistent with the data uploaded by the data acquisition instrument; in this case, further analysis is carried out according to the following rule:
Rule: the Quality Control sample value in the O&M record is denoted M and the historical data collected by the data acquisition instrument is denoted N; if |M-N|/N ≥ 30%, the record is labeled as a faked O&M unit Quality Control record.
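The Quality Control comparison above is a single relative-difference test. The sketch below uses a hypothetical function name and raises an error when N is zero, a case the text does not address.

```python
def qc_record_suspect(qc_value: float, acquired_value: float, limit: float = 0.30) -> bool:
    """Flag a Quality Control record when |M - N| / N >= 30%.

    qc_value is M (from the O&M record); acquired_value is N (from the
    data acquisition instrument).
    """
    if acquired_value == 0:
        raise ValueError("N is zero; relative difference is undefined")
    return abs(qc_value - acquired_value) / abs(acquired_value) >= limit
```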
Further, the video gate inhibition is accessed and analyzed together with the on-line monitoring data of enterprises labeled as cheating suspicion enterprises; based on the time point of the abnormal data, the video data in the corresponding time range can be retrieved and the cheating behaviour analyzed.
Further, the on-site examination targets the cheating suspicion enterprises marked by the unalterable rules screening combined with video gate inhibition analysis: the relevant personnel visit the enterprise to verify the situation on site and obtain evidence. Based on the abnormal data found by the unalterable rules screening, whether the enterprise is cheating is determined by verifying on site whether the instrument operates normally, the historical data stored in the instrument, and the waste water quality of the discharging unit.
Further, the machine-learning-based data cheating identification mode that combines unsupervised learning and semi-supervised learning works as follows: in the initial state, when no mark information is available, only an unsupervised clustering method can be used to pick out points far from the population; once a small number of reliable manual inspection results have been obtained, these reliable results can be fully exploited to obtain better results. A method combining clustering with ADOA (Anomaly Detection with Partially Observed Anomalies) is used, and the specific steps include:
(1) In the initial unlabeled stage, an unsupervised clustering method is used, which may be the distance-based k-means algorithm or the density-based DBSCAN.
(2) After some mark information has been obtained, the ADOA algorithm is used. ADOA is intended for the scenario in which there is a large number of unlabeled samples and only a small number of samples labeled as abnormal, and it assumes by default that abnormal samples are not of a single kind but of many types. The ADOA algorithm is divided into two stages:
Stage one: the observed abnormal samples are first clustered into k clusters; the unlabeled samples are then divided into potential abnormal samples and reliable normal samples based on the isolation score and the similarity score. Specifically:
(a) Isolation score: this is based on the isolation forest. An isolation forest is first built on the samples; in an isolation forest, a sample closer to the root node is more likely to be an abnormal point. IS(x) describes how likely sample x is to be an abnormal point (the isolation score). Let h(x) denote the path length of sample x in the isolation forest and E(h(x)) denote the mean path length. Assuming there are n samples, the average path length c(n) of an unsuccessful search in a binary search tree can be expressed as c(n) = 2H(n) - (2(n-1)/n), where H(n) is the harmonic number, approximated by ln(n) + 0.5772156649 (Euler's constant). The isolation score IS(x) can then be expressed as:
$$IS(x) = 2^{-\frac{E(h(x))}{c(n)}}$$
The closer IS(x) is to 1, the more likely the sample is an abnormal sample;
(b) Similarity score:
Clearly, a sample closer to an abnormal concept centre (one of the k centres obtained by clustering the abnormal samples) is more likely to be a potential abnormal point, so the similarity score SS(x) can be expressed as:
$$SS(x) = \max_{i = 1, \ldots, k} e^{-\lVert x - \mu_i \rVert^2}$$
where $\mu_i$ denotes the i-th abnormal concept centre and k is the number of abnormal concept centres;
(c) Total score: to screen out potential abnormal points and reliable normal points, the isolation score and the similarity score must be considered together; the total score TS(x) can be expressed as:
$$TS(x) = \theta\, IS(x) + (1 - \theta)\, SS(x), \quad \theta \in [0, 1]$$
When $TS(x) \geq \alpha$, the sample can be judged to be a potential abnormal point;
When $TS(x) \leq \beta$, the sample can be judged to be a reliable normal point.
The thresholds $\alpha$ and $\beta$ can be set according to the sensitivity actually required.
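The scoring in stage one can be sketched in plain Python. This is an illustration under stated assumptions: the function names are hypothetical, c(n) follows the expression given in the text, the isolation score uses the standard isolation forest form 2^(-E(h(x))/c(n)), and the similarity score is passed in as a number so that no particular formula for it is assumed here.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n) = 2H(n) - 2(n-1)/n with H(n) approximated by ln(n) + Euler's constant."""
    return 2.0 * (math.log(n) + EULER_GAMMA) - 2.0 * (n - 1) / n


def isolation_score(mean_path_length: float, n: int) -> float:
    """IS(x) = 2 ** (-E(h(x)) / c(n)); values near 1 indicate likely anomalies."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


def total_score(is_x: float, ss_x: float, theta: float) -> float:
    """TS(x) = theta * IS(x) + (1 - theta) * SS(x), with theta in [0, 1]."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * is_x + (1.0 - theta) * ss_x


def partition(ts_x: float, alpha: float, beta: float) -> str:
    """Split an unlabeled sample by its total score using thresholds alpha and beta."""
    if ts_x >= alpha:
        return "potential_anomaly"
    if ts_x <= beta:
        return "reliable_normal"
    return "undetermined"  # samples between the thresholds are left unassigned here
```

A sample whose mean path length equals c(n) receives an isolation score of 0.5, and a sample isolated at the root (path length 0) receives a score of 1.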
Stage two:
A weight is first set for each sample. In particular, the weight of the manually obtained abnormal labeled samples is set to 1. The unlabeled samples are divided into two classes: for a potential abnormal point, the higher TS(x) is, the larger its weight $\omega(x)$ should be; for a reliable normal point, the lower TS(x) is, the larger its weight should be (the two weight formulas are not reproduced in this text).
The problem thus becomes a (k+1)-class classification problem, and the objective to be minimized is:
$$\min \sum_{i} \omega_i\, l(y_i, f(x_i)) + \lambda R(w)$$
where $\omega_i$ is the weight of sample $x_i$, $l(y_i, f(x_i))$ is the loss of sample $x_i$, $R(w)$ is the regularization term and $\lambda$ is the regularization coefficient. This problem can be solved with a multi-class SVM.
Further, the time series analysis (TSA) method relies on the fact that each kind of monitoring data shows a certain periodicity over time: the pattern of earlier data can be used to predict each monitoring value at subsequent time points and thereby determine whether a value is abnormal. The specific steps include:
A differenced autoregressive moving average model (ARIMA) is used, with AIC (Akaike Information Criterion) as the evaluation criterion, to find the model that best explains the data while containing the fewest free parameters (determined by the parameters p, d and q).
The model is:
$$A_t = \phi_1 A_{t-1} + \phi_2 A_{t-2} + \cdots + \phi_p A_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}$$
where $A_t$ denotes the value of the series at time t, $\phi_i$ are the autoregressive coefficients, $\delta$ is a constant offset term, $u_i$ are the errors, $\theta_i$ are the error coefficients, t denotes the time, p denotes the number of lags of the time series itself used in the prediction model, and q denotes the number of lags of the prediction error;
Stage one:
From the data, the ACF (autocorrelation function) and PACF (partial autocorrelation function) are calculated and plotted; the ACF and PACF plots are used to check whether the series needs differencing and whether the data are periodic. If the series is non-stationary, it is differenced as needed to obtain a stationary series.
Stage two:
Using AIC as the evaluation criterion, a grid search is used to find the optimal model parameters p, d and q, where:
p: the number of lags of the time series itself used in the prediction model, also called the autoregressive term;
d: the number of differencing operations required, also called the difference term;
q: the number of lags of the prediction error, also called the moving average term;
The model is then trained on the data to obtain its parameters, namely $\phi_i$, $\theta_i$ and $\delta$.
Stage three:
The trained ARIMA model is used to predict the index value at subsequent time points, which is compared with the monitored value: the Euclidean distance between the monitored value and the predicted value is calculated and compared with a manually set threshold to determine whether the point is abnormal.
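Three small pieces of the procedure above can be sketched without any external library: the differencing used in stage one, the AIC formula used to rank candidate (p, d, q) models in stage two, and the threshold comparison of stage three. The function names are hypothetical, and fitting the ARIMA model itself is deliberately left out; in practice a statistics package would supply the fitted model, its log-likelihood and its forecasts.

```python
from typing import List


def difference(series: List[float], d: int = 1) -> List[float]:
    """Apply d rounds of first-order differencing (the 'd' of ARIMA)."""
    result = list(series)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L); lower is better."""
    return 2.0 * num_params - 2.0 * log_likelihood


def is_abnormal(monitored: float, predicted: float, threshold: float) -> bool:
    """Stage three: for scalar values the Euclidean distance is the absolute difference."""
    return abs(monitored - predicted) > threshold
```

For instance, differencing the series 1, 3, 6, 10 once gives 2, 3, 4, and differencing it twice gives the constant series 1, 1, which suggests d = 2 for that toy series.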
The above embodiments are intended to illustrate the present invention rather than to limit it; any modification or change made to the present invention within its spirit and the scope of protection of the claims falls within the protection scope of the present invention.

Claims (10)

  1. A pollution source on-line monitoring data cheating recognition method, characterized in that the method comprises:
    1) data preprocessing: the on-line monitoring data are preprocessed; hourly value data are selected as the base data for the unalterable rules screening, and invalid data are handled. The handling rule is: if the data detected by the flow instrument in a given time period are zero, the monitoring index data of all monitoring instruments for that period are rejected; the percentage of a monitoring instrument's monitoring index data that are zero is then determined, and if it is below a threshold the data are filled in, otherwise the monitoring index data are rejected;
    2) unalterable rules screening: comprising an enterprise cheating rule base, an enterprise instrument failure rule base and an O&M unit exception rule base;
    the enterprise cheating rule base is used to judge whether an enterprise is cheating and to mark cheating suspicion enterprises; it includes dilution, same-trade discharge amount analysis, monitoring index Cooperative Analysis, exceeded bust, interrupted exceeded bust, combinatory analysis, and spatial pattern and process;
    the enterprise instrument failure rule base is used to judge whether an enterprise's instrument has failed and to mark the failed instrument and the enterprise having it; it includes zero and constant;
    the O&M unit exception rule base is used to judge whether the O&M unit's Quality Control records are faked and to mark enterprises with O&M unit exceptions;
    the results of the unalterable rules screening are displayed visually;
    3) video gate inhibition: comprising video monitoring and gate inhibition records. Video monitoring includes video monitoring of the enterprise's sewage outlet and of the station, and is used to monitor violations by enterprise personnel; the gate inhibition records are the records of personnel entering and leaving the station. Video gate inhibition is applied in two ways. In the first, video gate inhibition is combined with the unalterable rules screening results to further confirm whether the enterprise has cheating, instrument failure or O&M exception conditions. The second is video gate inhibition early warning: the video monitoring of the sewage outlet and the station is checked, and warning information is output if the water at the sewage outlet is found to be turbid, someone approaches the sewage outlet, or an unauthorized person breaks into the station;
    4) on-site examination: on-site examination personnel carry out an on-site examination according to the unalterable rules screening results and the video gate inhibition information, in combination with company information. Company information includes the enterprise's on-line monitoring data, O&M records and the like. The on-site examination produces three kinds of result data: whether the enterprise's on-line monitoring data involve cheating, whether an instrument has failed, and whether the O&M unit's Quality Control records are faked. The results are output and are also used as mark information to correct the machine learning parameters and to optimize the unalterable rules screening method, so as to obtain higher accuracy;
    5) rule optimization based on machine learning: according to the feedback information from video gate inhibition and on-site examination, the unalterable rules screening is continuously optimized by machine learning to form screening rules with higher confidence. The machine learning mode is either a combination of unsupervised learning and semi-supervised learning, or time series analysis (TSA); during machine learning, thresholds suited to the specific requirements are set according to the actual situation.
  2. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the processing rule for invalid data is specifically: if fewer than 10% of the monitoring index data are zero, the data are filled in by interpolation; if more than 10% of the monitoring index data are zero, the item of data is rejected.
  3. The pollution source on-line monitoring data cheating recognition method according to claim 2, characterized in that, in the enterprise cheating rule base, the three rules exceeded bust, interrupted exceeded bust and combinatory analysis do not check whether the monitoring index data are zero; they only check whether the data detected by the flow instrument are zero.
  4. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the dilution in the enterprise cheating rule base is specifically:
    1) The cheating means related to dilution include installing an additional discharge pipe, diluting the discharge, and diluting the sample. To address these means, synchronization analysis of the on-line monitoring data is carried out for enterprises having two or more monitoring indexes; if two or more monitoring indexes rise or fall in proportion, the enterprise is labeled as a cheating suspicion enterprise. Specifically, this includes the following steps:
    (1) Dilution rule screening is applied to the on-line monitoring data of enterprises that have two or more monitoring indexes in addition to the two indexes pH and flow.
    (2) Rule declaration: assume that an enterprise has three factors a, b and c, and that the data of the three monitoring indexes at hour N are $A_n$, $B_n$ and $C_n$ respectively.
    Time        a          b          c
    Hour N      A_n        B_n        C_n
    Hour N+1    A_{n+1}    B_{n+1}    C_{n+1}
    If the following relationships are met simultaneously: the proportional-change relationship between hour N and hour N+1 (its formula is not reproduced in this text), and one of $A_n$, $B_n$, $C_n$ exceeds 70% of the discharge standard.
    (3) The result is labeled as a cheating suspicion enterprise.
    2) the same-trade discharge amount analysis is specifically:
    The industry to which the discharging enterprise belongs, the discharge standard limits, the enterprise's output and its waste water treatment process are used as screening conditions. Enterprises in the same industry, under the same discharge standard, with the same output and a similar waste water treatment process are grouped together and regarded as similar enterprises. The waste water discharge and the concentrations of the monitored items of similar enterprises are compared by analogy to find enterprises that deviate abnormally from the average level, and such enterprises are labeled as cheating suspicion enterprises.
    3) the monitoring index Cooperative Analysis includes: Cooperative Analysis of total nitrogen and ammonia nitrogen in the on-line monitoring data, and Cooperative Analysis of chemical oxygen demand (COD) and total organic carbon (TOC).
    (a) total nitrogen and ammonia nitrogen Cooperative Analysis: where the on-line monitoring data contain both the total nitrogen and the ammonia nitrogen concentration indexes, cases in which the ammonia nitrogen concentration is greater than the total nitrogen concentration are analyzed further.
    (1) Data rejection: if either the ammonia nitrogen value or the total nitrogen value is zero, the item of data is rejected.
    (2) Rule declaration: assume that an enterprise has the factors a (ammonia nitrogen) and b (total nitrogen).
    Time        a      b
    Hour N      A1     B1
    where a is ammonia nitrogen, b is total nitrogen, A1 and B1 are the ammonia nitrogen and total nitrogen concentrations at hour N, and B1/A1 < 70%.
    (3) The result is labeled as a cheating suspicion enterprise.
    (b) COD and total organic carbon Cooperative Analysis: COD and total organic carbon are strongly correlated.
    Rule: the COD and total organic carbon of enterprise waste water follow the linear regression relation y = Px + Q, where the values of P and Q can be obtained from the analysis instrument for the corresponding on-line monitoring factor, x denotes total organic carbon and y denotes COD.
    Thus, given either the COD value or the total organic carbon value, the other can be derived from the relation. Suppose that at some time point the instrument measures total organic carbon x1 and COD y1. The calculated COD value y2 is obtained from x1. According to the standard, an error of up to 10% between the two values for enterprise waste discharge is allowed, i.e. |y1-y2|/y2 ≤ 10%; an error beyond 10% is considered abnormal, and the enterprise is labeled as a cheating suspicion enterprise.
    4) exceeded bust and interrupted exceeded bust: the exceeded bust includes exceeded analysis and neighbouring exceeded analysis, which cover the situations in which a monitoring index of the enterprise's on-line monitoring has exceeded the standard or is about to exceed it; if the value of the monitoring index then drops suddenly, the index is analyzed further. Specifically:
    Rule declaration: assume that an enterprise has a factor a, and that the monitoring index value at hour N is $A_n$.
    Time        a
    Hour N      A_n
    Hour N+1    A_{n+1}
    Hour N+2    A_{n+2}
    Hour N+3    A_{n+3}
    The first case is neighbouring exceeded analysis: the value of $A_n$ lies between 80% and 100% of the exceeded line, and if either of two drop conditions holds (their formulas are not reproduced in this text), $A_n$, $A_{n+1}$ and $A_{n+2}$ are considered abnormal data. The second case is exceeded analysis: the value of $A_n$ is above the exceeded line, and if either of two drop conditions holds (formulas likewise not reproduced), the exceeded bust rule is met and $A_n$, $A_{n+1}$, $A_{n+2}$ and $A_{n+3}$ likewise belong to abnormal data. If the first or the second case is met, the data are labeled as exceeded bust.
    Interrupted exceeded bust is a further analysis of the second case of exceeded bust: the data recorded from hour N+2 to hour N+7 of the exceeded analysis are screened for exceedance, and if any exceedance is found, the data are labeled as interrupted exceeded bust.
    The results above are labeled as cheating suspicion enterprises.
    5) combinatory analysis means analysis that combines several analysis rules, including:
    A) exceeded bust + monitoring index Cooperative Analysis
    For the exceeded bust data found, if ammonia nitrogen is also greater than total nitrogen, or COD and total organic carbon (TOC) do not meet the correlation relation, the data exhibit two kinds of cheating features at once; the cheating suspicion increases and the enterprise is labeled as a cheating suspicion enterprise.
    B) exceeded bust + constant
    For data identified as exceeded bust in the unalterable rules screening, if the same data also show the constant feature from the enterprise instrument failure rule base, they are considered to exhibit two kinds of cheating features at once; the cheating suspicion increases and the enterprise is labeled as a cheating suspicion enterprise.
    6) spatial pattern and process is a method for analyzing exceptional values: a value in a group of data that deviates far from the average is judged to be a dubious value, and exceptional values in a monitoring index can be identified with the Grubbs test.
    Rule: when the data are arranged from largest to smallest, exceptional values are most likely to appear at the maximum or the minimum.
    (i) For a data set containing n hourly values (each monitoring factor forms one data set), the statistic G of each hourly value is calculated; the statistic $G_i$ of the i-th hourly value can be expressed as:
    $$G_i = \frac{|x_i - \bar{x}|}{s}$$
    where $i \in \{1, 2, 3, \ldots, n\}$, $\bar{x}$ denotes the mean of the n hourly values, $s$ denotes the standard deviation and $x_i$ denotes an hourly value;
    (ii) Grubbs coefficient lookup
    The critical value corresponding to the statistic G is looked up in the Grubbs coefficient table;
    (iii) Exceptional value identification
    When the statistic G corresponding to the maximum or the minimum of $x_i$ is greater than the critical value, the corresponding maximum or minimum is considered a suspected exceptional value;
    (iv) The result is labeled as a cheating suspicion enterprise.
  5. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the zero and constant rules of the enterprise instrument failure rule base are specifically:
    A) zero
    If an on-line monitoring index is zero throughout 24 consecutive hours in which flow values are present, it is labeled as instrument failure suspicion.
    B) constant
    If an on-line monitoring index remains unchanged throughout 24 consecutive hours in which flow values are present, it is labeled as instrument failure suspicion.
  6. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the O&M unit exception rule addresses the case in which the Quality Control records kept by the O&M unit are inconsistent with the data uploaded by the data acquisition instrument; in this case, further analysis is carried out according to the following rule:
    Rule: the Quality Control sample value in the O&M record is denoted M and the historical data collected by the data acquisition instrument is denoted N; if |M-N|/N ≥ 30%, the record is labeled as a faked O&M unit Quality Control record.
  7. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the video gate inhibition is accessed and analyzed together with the on-line monitoring data of enterprises labeled as cheating suspicion enterprises; based on the time point of the abnormal data, the video data in the corresponding time range can be retrieved and the cheating behaviour analyzed.
  8. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the on-site examination targets the cheating suspicion enterprises marked by the unalterable rules screening combined with video gate inhibition analysis: the relevant personnel visit the enterprise to verify the situation on site and obtain evidence; based on the abnormal data found by the unalterable rules screening, whether the enterprise is cheating is determined by verifying on site whether the instrument operates normally, the historical data stored in the instrument, and the waste water quality of the discharging unit.
  9. The pollution source on-line monitoring data cheating recognition method according to claim 1, characterized in that the machine-learning-based data cheating identification mode that combines unsupervised learning and semi-supervised learning works as follows: in the initial state, when no mark information is available, only an unsupervised clustering method can be used to pick out points far from the population; once a small number of reliable manual inspection results have been obtained, these reliable results can be fully exploited to obtain better results; a method combining clustering with ADOA (Anomaly Detection with Partially Observed Anomalies) is used, and the specific steps include:
    (1) In the initial unlabeled stage, an unsupervised clustering method is used, which may be the distance-based k-means algorithm or the density-based DBSCAN.
    (2) After some mark information has been obtained, the ADOA algorithm is used. ADOA is intended for the scenario in which there is a large number of unlabeled samples and only a small number of samples labeled as abnormal, and it assumes by default that abnormal samples are not of a single kind but of many types. The ADOA algorithm is divided into two stages:
    Stage one: the observed abnormal samples are first clustered into k clusters; the unlabeled samples are then divided into potential abnormal samples and reliable normal samples based on the isolation score and the similarity score. Specifically:
    (a) Isolation score: this is based on the isolation forest. An isolation forest is first built on the samples; in an isolation forest, a sample closer to the root node is more likely to be an abnormal point. IS(x) describes how likely sample x is to be an abnormal point (the isolation score). Let h(x) denote the path length of sample x in the isolation forest and E(h(x)) denote the mean path length. Assuming there are n samples, the average path length c(n) of an unsuccessful search in a binary search tree can be expressed as c(n) = 2H(n) - (2(n-1)/n), where H(n) is the harmonic number, approximated by ln(n) + 0.5772156649 (Euler's constant). The isolation score IS(x) can then be expressed as:
    $$IS(x) = 2^{-\frac{E(h(x))}{c(n)}}$$
    The closer IS(x) is to 1, the more likely the sample is an abnormal sample;
    (b) Similarity score:
    Clearly, a sample closer to an abnormal concept centre (one of the k centres obtained by clustering the abnormal samples) is more likely to be a potential abnormal point, so the similarity score SS(x) can be expressed as:
    $$SS(x) = \max_{i = 1, \ldots, k} e^{-\lVert x - \mu_i \rVert^2}$$
    where $\mu_i$ denotes the i-th abnormal concept centre and k is the number of abnormal concept centres;
    (c) Total score: to screen out potential anomalies and reliable normal points, the isolation score and the similarity score must be considered together. The combined total score TS(x) can be expressed as:
    $TS(x) = \theta\, IS(x) + (1-\theta)\, SS(x), \quad \theta \in [0, 1]$
    When TS(x) ≥ α, the sample is judged to be a potential anomaly;
    When TS(x) ≤ β, the sample is judged to be a reliable normal point.
    The thresholds α and β can be set according to the sensitivity required in practice.
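The scoring in stage one can be illustrated with a minimal Python sketch. It does not build an isolation forest; the mean path length E(h(x)) is passed in as a number. The function names, the "undecided" label for samples between the two thresholds, and the exponential form of the similarity score (taken here as the maximum over centers of exp(-squared Euclidean distance)) are assumptions made for illustration, not details confirmed by the text.

```python
import math

EULER_GAMMA = 0.5772156649  # constant quoted in the text


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search, as written in the text."""
    harmonic = math.log(n) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def isolation_score(mean_path_length: float, n: int) -> float:
    """IS(x) = 2 ** (-E(h(x)) / c(n)); shorter paths give scores nearer 1."""
    return 2.0 ** (-mean_path_length / c(n))


def similarity_score(x, centers) -> float:
    """Assumed form: max over centers of exp(-squared distance to the center)."""
    return max(math.exp(-math.dist(x, mu) ** 2) for mu in centers)


def total_score(is_x: float, ss_x: float, theta: float) -> float:
    """TS(x) = theta * IS(x) + (1 - theta) * SS(x), with theta in [0, 1]."""
    return theta * is_x + (1.0 - theta) * ss_x


def screen(ts_x: float, alpha: float, beta: float) -> str:
    """Apply the two thresholds; samples in between stay unassigned (assumption)."""
    if ts_x >= alpha:
        return "potential_anomaly"
    if ts_x <= beta:
        return "reliable_normal"
    return "undecided"


# Example: a sample with a short path that lies close to one anomaly center.
centers = [(0.0, 0.0), (5.0, 5.0)]
ts = total_score(isolation_score(3.0, 256), similarity_score((0.1, 0.0), centers), 0.5)
label = screen(ts, alpha=0.8, beta=0.3)
```

A path length equal to c(n) gives a score of exactly 0.5, which is a convenient sanity check when choosing α and β.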
    Stage two:
    A weight is first assigned to each sample. Specifically, the weight of each manually labeled anomalous sample is set to 1, and the unlabeled samples are handled as two classes. For a potential anomaly, the higher TS(x) is, the larger the weight ω(x) should be:
    $\omega(x) = \dfrac{TS(x)}{\max_x TS(x)}$
    For a reliable normal point, the lower TS(x) is, the larger the weight should be:
    $\omega(x) = \dfrac{\max_x TS(x) - TS(x)}{\max_x TS(x) - \min_x TS(x)}$
    The problem thus becomes a (k+1)-class classification problem, whose objective to be minimized is:
    $\min \sum_i \omega_i\, l(y_i, f(x_i)) + \lambda R(w)$
    where ω_i is the weight of sample x_i, l(y_i, f(x_i)) is the loss on sample x_i, R(w) is the regularization term, and λ is the regularization coefficient. This problem can be solved with a multi-class SVM.
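The weighting step of stage two might look like the following sketch. The text states only that weights should grow with TS(x) for potential anomalies and shrink with TS(x) for reliable normal points; the specific normalizations used here (division by the maximum score, and min-max scaling) are assumptions, as are the function name and the choice to give unassigned samples a weight of 0. Scores are assumed to be positive.

```python
LABELED_ANOMALY_WEIGHT = 1.0  # weight for manually labeled anomalies, per the text


def unlabeled_weights(ts_scores, alpha, beta):
    """Return (role, weight) for each unlabeled sample from its total score.

    Assumed normalizations:
      potential anomaly: TS / max(TS)
      reliable normal:   (max(TS) - TS) / (max(TS) - min(TS))
    """
    ts_max, ts_min = max(ts_scores), min(ts_scores)
    span = ts_max - ts_min
    result = []
    for ts in ts_scores:
        if ts >= alpha:
            result.append(("potential_anomaly", ts / ts_max))
        elif ts <= beta:
            weight = (ts_max - ts) / span if span > 0 else 1.0
            result.append(("reliable_normal", weight))
        else:
            result.append((None, 0.0))  # left out of training (assumption)
    return result


weights = unlabeled_weights([0.9, 0.6, 0.5, 0.1], alpha=0.6, beta=0.2)
```

Weights of this kind can be handed to any classifier that accepts per-sample weights; for example, scikit-learn's `SVC.fit` takes a `sample_weight` argument. Assigning the (k+1) class labels themselves is outside this sketch.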
  10. The method for identifying cheating in pollution source online monitoring data according to claim 1, characterized in that the time series analysis (TSA) method relies on each monitored quantity exhibiting a certain periodicity over time, so that the past development of the data can be used to predict each monitored value at subsequent time points and thereby determine whether a value is anomalous; the specific steps include:
    An autoregressive integrated moving average (ARIMA) model is used, with AIC (Akaike Information Criterion) as the evaluation criterion, to find the model that best explains the data while containing the fewest free parameters (determined by the parameters p, d and q).
    The model is:
    $A_t = \varphi_1 A_{t-1} + \varphi_2 A_{t-2} + \dots + \varphi_p A_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$
    where A_t is the target value at time t, φ_i are the autoregressive coefficients, δ is the constant offset term, u_t is the error at time t, θ_i are the error coefficients, t denotes the time, p is the number of lags of the time series itself used in the prediction model, and q is the number of lags of the prediction error;
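As a worked illustration of the model equation, the sketch below computes a one-step prediction from already known coefficients. The function name and argument layout are hypothetical. The current error u_t is unknown at prediction time and is taken as 0 here, which is an assumption of this sketch.

```python
def arma_one_step(history, errors, phi, theta, delta):
    """Predict A_t from the last p values and the last q errors.

    history: past values, oldest first (A_{t-1} is history[-1])
    errors:  past prediction errors, oldest first (u_{t-1} is errors[-1])
    phi, theta: coefficient lists of length p and q
    delta: constant offset term
    """
    p, q = len(phi), len(theta)
    if len(history) < p or len(errors) < q:
        raise ValueError("need at least p past values and q past errors")
    ar_part = sum(phi[i] * history[-1 - i] for i in range(p))
    ma_part = sum(theta[j] * errors[-1 - j] for j in range(q))
    return ar_part + delta + ma_part  # u_t assumed to be 0


# 0.5*2.0 + 0.25*1.0 + 1.0 + 0.2*0.5
prediction = arma_one_step([1.0, 2.0], [0.5], phi=[0.5, 0.25], theta=[0.2], delta=1.0)
```

When d > 0, the same recursion applies to the differenced series, and the prediction has to be integrated back to the original scale.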
    Stage one:
    From the data, the ACF (autocorrelation function) and PACF (partial autocorrelation function) are computed and plotted. The ACF and PACF plots are used to check whether the series needs to be differenced and whether the data are periodic. If the series is non-stationary, it is differenced as needed to obtain a stationary series.
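The differencing and autocorrelation steps can be sketched in plain Python as follows. The function names are illustrative, the ACF uses the common biased sample estimator (an assumption, since the text does not specify one), and the PACF and plotting are omitted.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(v * v for v in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


trend = [1, 3, 6, 10]
once = difference(trend)        # first differences
twice = difference(trend, d=2)  # second differences
```

An ACF that decays very slowly with the lag is a common sign that the series should be differenced before fitting.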
    Stage two:
    With AIC as the evaluation criterion, a grid search is used to find the optimal model parameters p, d and q, where:
    p: the number of lags of the time series itself used in the prediction model, also called the autoregressive term;
    d: the number of times the series must be differenced, also called the difference term;
    q: the number of lags of the prediction error, also called the moving-average term;
    The model is then trained on the data to obtain its parameters, namely φ_i, θ_i and δ in the model.
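The grid search in stage two can be outlined as below. AIC is defined as 2k − 2 ln(L), where k is the number of free parameters and L the maximized likelihood, so lower is better. Model fitting is delegated to a caller-supplied function `fit_aic`, which is a hypothetical placeholder; the sketch shows only the search loop.

```python
import itertools
import math


def grid_search_order(series, p_values, d_values, q_values, fit_aic):
    """Return (best_order, best_aic) over all (p, d, q) combinations.

    fit_aic(series, order) is supplied by the caller: it fits a model of the
    given order and returns its AIC. Orders that raise are skipped.
    """
    best_order, best_aic = None, math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            aic = fit_aic(series, order)
        except Exception:
            continue  # this order could not be fitted
        if aic < best_aic:
            best_order, best_aic = order, aic
    return best_order, best_aic
```

With statsmodels installed, one possible `fit_aic` is `lambda s, o: ARIMA(s, order=o).fit().aic`, using `ARIMA` from `statsmodels.tsa.arima.model`; the fitted result also carries the estimated coefficients.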
    Stage three:
    The trained ARIMA model is used to predict the indicator value at subsequent time points, and the prediction is compared with the monitored value: the Euclidean distance from the monitored value to the predicted value is computed and compared with a manually set threshold to determine whether the point is anomalous.
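The decision in stage three reduces to a distance check, sketched below. The function name is illustrative, and treating a distance strictly greater than the threshold as anomalous is an assumption; the text says only that the distance is compared with a manually set threshold.

```python
import math


def is_anomalous(monitored, predicted, threshold):
    """Return (flag, distance) for one time point.

    monitored and predicted are equal-length sequences of indicator values;
    a single indicator is passed as a one-element sequence.
    """
    distance = math.dist(monitored, predicted)
    return distance > threshold, distance


flag_multi, dist_multi = is_anomalous((3.0, 4.0), (0.0, 0.0), threshold=4.0)
flag_single, dist_single = is_anomalous((10.0,), (9.5,), threshold=1.0)
```

For a single indicator the Euclidean distance is simply the absolute difference between the monitored and predicted values.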

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910591968.9A CN110245880A (en) 2019-07-02 2019-07-02 A kind of pollution sources on-line monitoring data cheating recognition methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910591968.9A CN110245880A (en) 2019-07-02 2019-07-02 A kind of pollution sources on-line monitoring data cheating recognition methods

Publications (1)

Publication Number Publication Date
CN110245880A (en) 2019-09-17

Family

ID=67890724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910591968.9A Pending CN110245880A (en) 2019-07-02 2019-07-02 A kind of pollution sources on-line monitoring data cheating recognition methods

Country Status (1)

Country Link
CN (1) CN110245880A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355381A (en) * 2011-08-18 2012-02-15 网宿科技股份有限公司 Method and system for predicting flow of self-adaptive differential auto-regression moving average model
CN104808622A (en) * 2015-03-18 2015-07-29 武汉巨正环保科技有限公司 Intelligent type one-stop pollution source online monitoring system
CN106156269A (en) * 2016-06-01 2016-11-23 国网河北省电力公司电力科学研究院 One is opposed electricity-stealing precise positioning on-line monitoring method
CN106709242A (en) * 2016-12-07 2017-05-24 常州大学 Method for identifying authenticity of sewage monitoring data
CN107758885A (en) * 2017-11-01 2018-03-06 浙江成功软件开发有限公司 A kind of real-time sewage is aerated condition monitoring method
CN108763966A (en) * 2018-06-04 2018-11-06 武汉邦拓信息科技有限公司 A kind of Tail gas measuring cheating supervisory systems and method
CN109614526A (en) * 2018-11-09 2019-04-12 环境保护部环境工程评估中心 Environmental monitoring data fraud means recognition methods based on higher-dimension abnormality detection model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YA-LIN ZHANG et al.: "Anomaly Detection with Partially Observed Anomalies", Companion of The Web Conference 2018, International World Wide Web Conferences Steering Committee *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889088A (en) * 2019-11-04 2020-03-17 国网浙江省电力有限公司信息通信分公司 Enterprise pollution discharge supervision method assisted by electric model
CN110889088B (en) * 2019-11-04 2023-10-20 国网浙江省电力有限公司信息通信分公司 Enterprise pollution discharge supervision method assisted by electric power model
CN110990393A (en) * 2019-12-17 2020-04-10 清华苏州环境创新研究院 Big data identification method for abnormal data behaviors of industry enterprises
CN110990393B (en) * 2019-12-17 2023-09-08 清华苏州环境创新研究院 Big data identification method for abnormal behaviors of industry enterprise data
CN111680856A (en) * 2020-01-14 2020-09-18 国家电网有限公司 User behavior safety early warning method and system for power monitoring system
CN112258689B (en) * 2020-10-26 2022-12-13 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) Ship data processing method and device and ship data quality management platform
CN112258689A (en) * 2020-10-26 2021-01-22 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) Ship data processing method and device and ship data quality management platform
CN112381697A (en) * 2020-11-20 2021-02-19 深圳衡伟环境技术有限公司 Method for automatically identifying false behavior of water pollution source on-line monitoring data
CN112381697B (en) * 2020-11-20 2024-02-02 深圳衡伟环境技术有限公司 Automatic recognition method for false behavior of water pollution source on-line monitoring data falsification
CN112699113A (en) * 2021-01-12 2021-04-23 上海交通大学 Industrial manufacturing process operation monitoring system driven by time sequence data stream
CN113012388B (en) * 2021-02-19 2023-02-24 浙江清之元信息科技有限公司 Pollution source online monitoring system and online monitoring data false identification analysis method
CN113012388A (en) * 2021-02-19 2021-06-22 浙江清之元信息科技有限公司 Pollution source online monitoring system and online monitoring data false identification analysis method
CN113655189A (en) * 2021-03-31 2021-11-16 吴超烽 Automatic monitoring data analysis and judgment system for pollution source
CN113705547A (en) * 2021-10-28 2021-11-26 北京万维盈创科技发展有限公司 Dynamic management and control method and device for recognizing false behavior of environment blurring
CN113705547B (en) * 2021-10-28 2022-03-25 北京万维盈创科技发展有限公司 Dynamic management and control method and device for recognizing false behavior of environment blurring
CN117407661A (en) * 2023-12-14 2024-01-16 深圳前海慧联科技发展有限公司 Data enhancement method for equipment state detection
CN117407661B (en) * 2023-12-14 2024-02-27 深圳前海慧联科技发展有限公司 Data enhancement method for equipment state detection

Similar Documents

Publication Publication Date Title
CN110245880A (en) A kind of pollution sources on-line monitoring data cheating recognition methods
CN110381079B (en) Method for detecting network log abnormity by combining GRU and SVDD
CN107949812A (en) For detecting the abnormal combined method in water distribution system
WO2019019709A1 (en) Method for detecting water leakage of tap water pipe
CN110636066B (en) Network security threat situation assessment method based on unsupervised generative reasoning
CN112288021A (en) Medical wastewater monitoring data quality control method, device and system
Liang et al. A stock time series forecasting approach incorporating candlestick patterns and sequence similarity
CN109034140A (en) Industrial control network abnormal signal detection method based on deep learning structure
CN112633779B (en) Method for evaluating reliability of environmental monitoring data
CN106330949B (en) One kind being based on markovian intrusion detection method
CN114201374A (en) Operation and maintenance time sequence data anomaly detection method and system based on hybrid machine learning
CN115794803B (en) Engineering audit problem monitoring method and system based on big data AI technology
Qian et al. Deep learning based anomaly detection in water distribution systems
CN112906738A (en) Water quality detection and treatment method
CN115883163A (en) Network safety alarm monitoring method
CN114049134A (en) Pollution source online monitoring data counterfeiting identification method
CN111191855A (en) Water quality abnormal event identification and early warning method based on pipe network multi-element water quality time sequence data
Yang et al. Teacher-student uncertainty autoencoder for the process-relevant and quality-relevant fault detection in the industrial process
CN110390478A (en) Supervisory systems and monitoring and managing method after finance based on Internet of Things is borrowed
Yang et al. Extracting useful signals from flawed sensor data: Developing hybrid data-driven approaches with physical factors
Salazar et al. Monitoring approaches for security and safety analysis: application to a load position system
CN110705597B (en) Network early event detection method and system based on event cause and effect extraction
CN115062851A (en) Pollution discharge abnormity monitoring method and system based on multi-algorithm fusion
CN114898890A (en) Pneumonia epidemic situation control method and system based on deep learning
CN110598973A (en) IAP-based risk evaluation method for authentication process of green furniture product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190917