CN109740694A - A smart grid non-technical loss detection method based on unsupervised learning - Google Patents

Info

Publication number
CN109740694A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910066167.0A
Other languages
Chinese (zh)
Inventor
曲正伟
李弘文
王云静
田亚静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a smart grid non-technical loss detection method based on unsupervised learning, and relates to the field of advanced metering infrastructure for smart grids. The method first reduces the dimensionality of the original data set using principal component analysis (PCA). The reduced data are then clustered with the k-means method, and most of the normal data are pruned. The remaining data are processed precisely with the local outlier factor (LOF) detection algorithm, so that abnormal data are accurately separated and non-technical losses are detected. Detection accuracy is evaluated with the ROC curve to verify the feasibility and accuracy of the method, and the method is analyzed by simulation using a simulation tool. Compared with the prior art, the proposed detection method is simpler, more efficient and more practical; it improves detection efficiency and saves considerable time and resources.

Description

A smart grid non-technical loss detection method based on unsupervised learning
Technical field
The present invention relates to the field of advanced metering infrastructure for smart grids, and in particular to a smart grid non-technical loss detection method based on unsupervised learning that can detect non-technical losses in a smart grid quickly and efficiently.
Background
In recent years, the development of the smart grid has brought new vitality and hope to the power industry, while also posing new challenges to the traditional grid model. Growing global resource and environmental pressure, the advance of electricity market reform, and rising user requirements for power quality and supply reliability confront the power industry with unprecedented challenges. Many countries and organizations have proposed building smart grids that are flexible, clean, safe, economical and user-friendly, and regard the smart grid as the development direction of the future power grid.
The smart grid is built on distributed data transmission, computing and control technologies, and on the effective transmission of data and control commands among multiple power supply units. On this basis, the grid needs more efficient communication and metering systems. To meet this need, the concept of the Advanced Metering Infrastructure (AMI) emerged, and AMI plays an increasingly important role in the smart grid. Owing to its notable results in system operation, asset management and especially load response, it is becoming one of the most popular research and engineering topics in the power industry.
However, the security threats faced by such a complex metering and communication system should not be underestimated. Several key features of an AMI system make it vulnerable to attack:
(1) the communication system is complex, and some communication links have limited bandwidth;
(2) a large number of devices with low computing power, low storage capacity and weak protection are connected;
(3) a large amount of sensitive user data is stored.
Criminals often exploit the weak security protection of AMI systems to attack the smart grid and engage in illegal electricity use such as electricity theft and fraud, endangering the security of the smart grid. The energy losses associated with electricity theft and other fraudulent consumption behavior on the distribution side are collectively referred to as non-technical losses (NTL). Such behavior not only causes large losses of electrical energy and disrupts the normal order of power supply and consumption, but also introduces serious hazards to the safe operation of the grid. According to incomplete statistics, revenue losses caused by non-technical losses in China amount to 0.5% to 3.5% of total revenue each year.
At present, the anti-theft measures taken by State Grid power supply companies are mostly the following: using dedicated metering boxes and metering cabinets; enclosing the conductors from the low-voltage outlet to the metering device, which is currently the most widely used anti-theft technique; installing anti-theft smart meters and enriching meter functions; and raising the utilization rate of the electricity consumption data acquisition system. However, most of these methods focus on anti-theft devices and lack adequate anti-theft algorithms for analyzing massive historical consumption data, so the consumption characteristics of users who steal electricity are difficult to discover.
In summary, AMI improves the data acquisition and processing capability of the smart grid and strengthens the link between the supply side and the demand side, but it also increases the risk of attacks on the grid. Effective measures are therefore needed to detect non-technical losses. An effective non-technical loss detection method can provide a reference for the electricity inspection work of power supply companies, raise the hit rate of on-site inspections, reduce operating costs, and save considerable manpower and material resources. It is therefore of great research significance for building a strong smart grid and improving grid security.
Summary of the invention
The object of the present invention is to provide a smart grid non-technical loss detection method based on unsupervised learning. The method performs cluster analysis on the original electricity consumption data that characterize consumption behavior to obtain abnormal data, and thereby identifies abnormal consumption behavior in order to detect non-technical losses in the smart grid. The method is simple and efficient, considers multiple factors comprehensively, and is highly practical.
To achieve the above object, the present invention adopts the following technical solution: a smart grid non-technical loss detection method based on unsupervised learning, characterized by comprising the following steps:
Step (1): since a single consumption behavior can trigger several kinds of electricity consumption data, original consumption data that characterize several aspects of consumption behavior are selected as the original index data set, and the dimensionality of this data set is reduced using principal component analysis;
Step (2): the data set obtained by principal component analysis in step (1) is clustered with the k-means clustering method, normal data are rejected, and abnormal data are obtained;
Step (3): the abnormal data from step (2) are processed precisely with the local outlier factor detection algorithm, so that abnormal data are accurately separated and smart grid non-technical loss detection is completed.
In a further technical solution, the original index data set in step (1) includes trend indices, mobility indices, fluctuation indices, the ratio of the average load of the last r months to the average load of all months, and the correlation coefficient between each user's load sequence and the median load sequence of all users.
In a further technical solution, the trend indices are calculated as follows:
1) input the monthly average load data set X of the power users;
2) calculate the n-point simple moving average sequence F of each user's load time series A;
3) compare sequence A with sequence F at each time point: if A lies below F in u segments containing $a_1, a_2, \ldots, a_u$ points respectively, and A lies above F in v segments containing $b_1, b_2, \ldots, b_v$ points respectively, the following indices are calculated:
4) calculate the upward trend index tra and the downward trend index trb.
In a further technical solution, the mobility indices are first-difference measures of the user's consumption pattern, and include:
1) the difference between the average load of the first r months and that of the last r months
where $x_{n1}$ and $x_{n2}$ are the loads of the first r months and the last r months, respectively;
2) the norm of the difference between the discrete Fourier transform coefficient sequences of the first r months and the last r months
where $y_{n1}$ and $y_{n2}$ are the discrete Fourier transform coefficient sequences of the first r months and the last r months, respectively.
In a further technical solution, the fluctuation indices are:
1) the standard deviation sd of each user's load sequence over the H months;
2) the standard deviation bsd_r of the load sequence of the first r months;
3) the standard deviation esd_r of the load sequence of the last r months.
In a further technical solution, in step (1) the dimensionality of the original index data set is reduced using principal component analysis as follows:
(1) Calculate the covariance matrix
Suppose there are n samples with p variables each, forming an n × p data matrix:
Denote the original variable indices as:
$x_1, x_2, \ldots, x_p$ (6)
Calculate the covariance matrix:
$\Sigma = (S_{ij})_{p \times p}$ (7)
where
(2) Find the eigenvalues $\lambda_i$ of $\Sigma$ and the corresponding orthonormal unit eigenvectors $a_i$
The m largest eigenvalues of $\Sigma$, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m > 0$, are the variances of the first m principal components, and the unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables. The i-th principal component $F_i$ of the original variables is then:
$F_i = a_i X$ (8)
(3) Select the principal components
The number of principal components finally selected, i.e. the value of m in $F_1, F_2, \ldots, F_m$, is determined by the cumulative variance contribution rate G(m):
When the cumulative contribution rate exceeds 85%, the selected components are considered to reflect the information of the original variables sufficiently, and the corresponding m is the number of principal components extracted;
(4) Calculate the principal component loadings
The principal component loading reflects the degree of correlation between principal component $F_i$ and original variable $X_j$. The loading $l_{ij}$ of original variable $X_j$ (j = 1, 2, …, p) on principal component $F_i$ (i = 1, 2, …, m) is:
$l(F_i, X_j) = \sqrt{\lambda_i}\, a_{ij}$ (i = 1, 2, …, m; j = 1, 2, …, p) (10)
If $F_1, F_2, \ldots, F_m$ denote the m principal components of the original variables $X_1, X_2, \ldots, X_p$, then:
In a further technical solution, the basic formulas of the k-means clustering method in step (2) are:
where $dist(x_i, x_j)$ is the Euclidean distance between data points $x_i$ and $x_j$; D is the number of attributes of a data object; $x_{i,d}$ and $x_{j,d}$ are the d-th components of data points $x_i$ and $x_j$; $C_k$ is the cluster center of the k-th cluster; $Center_k$ is the updated cluster center of the k-th cluster; J is the sum-of-squared-errors criterion function; and R is the defined cluster domain radius;
The data set obtained by principal component analysis is clustered with the k-means clustering method and normal data are rejected. The clustering procedure is as follows:
(1) input the initial data set X and set the number of clusters k;
(2) randomly select k points in data set X as the initial cluster centers;
(3) calculate the distance from each point to each cluster center using formula (12);
(4) assign each data point to the most similar cluster according to the distances;
(5) update the cluster centers using formula (13);
(6) repeat steps (3) to (5); when the criterion function (14) converges, stop clustering and output the clustering result; otherwise return to step (3) and continue.
In a further technical solution, the local outlier factor detection algorithm in step (3) is defined by the following formulas:
$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le k\_dist(p) \,\}$ (16)
$reach\_dist_k(p, q) = \max\{\, k\_dist(q),\ d(p, q) \,\}$ (17)
where $N_k(p)$ is the set of all objects whose distance to p does not exceed the k-distance of p; $d(p, q)$ is the Euclidean distance between points p and q; $k\_dist(p)$ is the k-distance of object p; $reach\_dist_k(p, q)$ is the reachability distance of object p with respect to object q; $lrd_{MinPts}(p)$ is the local reachability density of object p; and $LOF_k(p)$ is the local outlier factor of point p;
The procedure is as follows:
(1) set the number of neighbors k;
(2) set the target number of outliers m;
(3) input the data set;
(4) calculate the distance matrix between objects;
(5) calculate the k-distance $k\_dist(p)$ of each point p;
(6) calculate the k-distance neighborhood $N_k(p)$ of each point p;
(7) calculate the local reachability density of each point p;
(8) calculate the local outlier factor LOF;
(9) sort the LOF values of all points and output the top m outliers;
The k-means clustering algorithm is called here to extract the candidate set, with the following rule: if the distance between an object and the center of its cluster is greater than or equal to the cluster radius R, the object is extracted into the outlier candidate set;
In addition, to improve detection accuracy, a point is judged to be an outlier only if it satisfies two conditions:
(1) Outlier screening condition
where $p_{ij}$ is the j-th element of the i-th cluster obtained by k-means clustering of the PCA-processed data set; $n_i$ is the number of data objects in the i-th cluster; $Center_k$ is the cluster center; and R is the cluster domain radius;
(2) Outlier factor restriction condition
$LOF(p_{ij}) \in LOF(p)_{top(m)}$ (21)
where m is the preset threshold on the number of outliers to detect;
Combining the two algorithms, the procedure is as follows:
(1) input the original data set and the preset minimum number of outliers m;
(2) perform PCA dimensionality reduction;
(3) perform k-means clustering on the reduced data set;
(4) count the number of data points $n_i$ in each cluster;
(5) if a cluster contains $n_i < m$ data points, retain the cluster directly, and denote the data set formed by the retained clusters as D; if $n_i > m$, judge according to formula (20) whether the distance from each point in the cluster to the cluster center $Center_k$ is greater than the cluster radius: if it is greater, the point is merged with data set D to form the outlier candidate data set D′; otherwise the point is judged to be normal data and rejected;
(6) calculate and sort the outlier factors of all data points with the local outlier factor detection algorithm, and detect non-technical losses according to the screening result of the outlier factors.
Compared with the prior art, the proposed detection method is more economical, convenient and practical. Used alone, the k-means method has a detection accuracy that depends heavily on the choice of clustering parameters, and outliers are merely a by-product of the clustering process, so its detection accuracy is relatively low. Used alone, the outlier detection algorithm judges the degree of outlierness by comparing the LOF values of all data points, which generates a large amount of unnecessary computation, makes the time cost too high, and wastes storage space on intermediate results. Integrating the two algorithms effectively avoids these shortcomings. In addition, the present invention reduces the dimensionality of the original data set by principal component analysis, which improves the overall computing speed of the algorithm, and evaluates detection accuracy with the ROC curve, which shows the accuracy of the detection method intuitively.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the principal component analysis method in the method of the present invention;
Fig. 2 is a flow chart of the k-means detection algorithm in the method of the present invention;
Fig. 3 is a flow chart of the outlier detection method (LOF) in the method of the present invention;
Fig. 4 is the technical roadmap of the method of the present invention;
Fig. 5 is the overall flow chart of the method of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to Figs. 1-5.
As shown in Figs. 4-5, the smart grid non-technical loss detection method based on unsupervised learning according to the present invention comprises the following steps:
(1) Since a single consumption behavior can trigger several kinds of electricity consumption data, original consumption data that characterize several aspects of consumption behavior are selected as the original index data set, and the dimensionality of this data set is reduced using principal component analysis;
(1-1) Extraction of the electricity consumption feature index data set
Abnormal electricity consumption information is often not isolated: a single behavior may trigger several anomalies. Detection based on a single index is therefore prone to omissions or misjudgments. Efficient work against non-technical losses should instead extract features comprehensively from several kinds of abnormal data and be built on the quantifiable feature quantities produced by the various means of theft.
The data set to be extracted contains H months of electricity consumption data for N power users. Each user's consumption pattern is represented by its monthly average load, so the load sequence of each user can be expressed as an H-dimensional vector, and all users can be expressed as the data set $X = \{x_n,\ n = 1, 2, \ldots, N\}$.
Feature quantities of the users' consumption patterns can be further extracted from data set X.
1. Trend indices
The calculation of the trend indices is based on the moving average of a sequence. The moving average method is a common tool for analyzing time series and includes the simple moving average, the weighted moving average and the exponential moving average. The simple moving average is the arithmetic mean of the previous n values of a variable. If the time series is denoted $\{A_1, A_2, \ldots, A_n\}$, the n-point moving average at time t is $F_t = (A_{t-1} + A_{t-2} + \cdots + A_{t-n})/n$.
The trend indices are calculated as follows:
1) input the monthly average load data set X of the power users;
2) calculate the n-point simple moving average sequence F of each user's load time series A;
3) compare sequence A with sequence F at each time point: if A lies below F in u segments containing $a_1, a_2, \ldots, a_u$ points respectively, and A lies above F in v segments containing $b_1, b_2, \ldots, b_v$ points respectively, the following indices are calculated:
4) calculate the upward trend index tra and the downward trend index trb.
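Steps 2) and 3) can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the function names `moving_average`, `run_lengths` and `segments_below_above` are hypothetical, and the comparison starts at the first time point for which n previous values exist, which is an assumption. The final tra and trb values are not computed because their formulas are not reproduced in the text.

```python
import numpy as np

def moving_average(a, n):
    """n-point simple moving average F_t = (A_{t-1} + ... + A_{t-n}) / n.

    Returns F_t for t = n, ..., len(a) - 1 (0-based), i.e. only where
    n previous values are available.
    """
    a = np.asarray(a, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(a)))
    # csum[t] - csum[t - n] is the sum of the n values preceding index t
    return (csum[n:-1] - csum[:-n - 1]) / n

def run_lengths(mask):
    """Lengths of the maximal runs of True values in a boolean sequence."""
    runs, count = [], 0
    for flag in mask:
        if flag:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs

def segments_below_above(a, n):
    """Point counts of the segments where A lies below F (a_1..a_u)
    and above F (b_1..b_v)."""
    a = np.asarray(a, dtype=float)
    f = moving_average(a, n)
    tail = a[n:]                      # values aligned with F_t
    below = run_lengths(tail < f)     # a_1, ..., a_u
    above = run_lengths(tail > f)     # b_1, ..., b_v
    return below, above
```

The two returned lists correspond to the counts $a_1, \ldots, a_u$ and $b_1, \ldots, b_v$ from step 3); points where A equals F are counted in neither list, which is also an assumption.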
2. Mobility indices
Mobility indices are first-difference measures of the user's consumption pattern, and include:
1) the difference between the average load of the first r months and that of the last r months
where $x_{n1}$ and $x_{n2}$ are the loads of the first r months and the last r months, respectively;
2) the norm of the difference between the discrete Fourier transform coefficient sequences of the first r months and the last r months
where $y_{n1}$ and $y_{n2}$ are the discrete Fourier transform coefficient sequences of the first r months and the last r months, respectively.
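A minimal sketch of the two mobility indices, assuming NumPy and its FFT routine. The function name `mobility_indices` is hypothetical, and both the sign convention (first r months minus last r months) and the use of the Euclidean norm are assumptions, since the formulas themselves are not reproduced in the text.

```python
import numpy as np

def mobility_indices(x, r):
    """Mobility indices of one user's monthly load sequence x.

    Returns (mean_diff, dft_diff_norm):
      mean_diff     - average load of the first r months minus that of
                      the last r months (sign convention assumed)
      dft_diff_norm - Euclidean norm of the difference between the DFT
                      coefficient sequences of the two r-month windows
    """
    x = np.asarray(x, dtype=float)
    head, tail = x[:r], x[-r:]
    mean_diff = head.mean() - tail.mean()
    # np.fft.fft returns the complex DFT coefficients of each window
    dft_diff_norm = np.linalg.norm(np.fft.fft(head) - np.fft.fft(tail))
    return mean_diff, dft_diff_norm
```

For a user whose load steps from a constant low level to a constant high level, both indices are driven entirely by the change in mean, because only the zero-frequency DFT coefficient differs between the two windows.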
3. Fluctuation indices
1) the standard deviation sd of each user's load sequence over the H months;
2) the standard deviation bsd_r of the load sequence of the first r months;
3) the standard deviation esd_r of the load sequence of the last r months.
4. Other indices
1) the ratio of the average load of the last r months to the average load of all months;
2) the correlation coefficient between each user's load sequence and the median load sequence of all users.
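The fluctuation indices and the two other indices can be sketched as follows, assuming NumPy. The function names are hypothetical. The text does not say whether the sample or the population standard deviation is meant, so the sample form (`ddof=1`) is an assumption, as is taking the median month by month across users.

```python
import numpy as np

def fluctuation_indices(x, r):
    """Return (sd, bsd_r, esd_r) for one user's monthly load sequence x."""
    x = np.asarray(x, dtype=float)
    # ddof=1 gives the sample standard deviation (assumption)
    return x.std(ddof=1), x[:r].std(ddof=1), x[-r:].std(ddof=1)

def tail_ratio(x, r):
    """Average load of the last r months divided by the overall average."""
    x = np.asarray(x, dtype=float)
    return x[-r:].mean() / x.mean()

def median_correlation(X):
    """Correlation of each user's load sequence with the median sequence.

    X has one row per user and one column per month; the median sequence
    is the per-month median over all users. A constant sequence has zero
    variance, so its correlation is undefined (NaN).
    """
    X = np.asarray(X, dtype=float)
    median_seq = np.median(X, axis=0)
    return np.array([np.corrcoef(row, median_seq)[0, 1] for row in X])
```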
(1-2) Dimensionality reduction of the original data set based on principal component analysis (PCA)
Many features are extracted and different features may contain overlapping information. To display each user's consumption pattern intuitively in a low-dimensional plane and to mine abnormal users efficiently, the dimensionality of the data set must be reduced. Dimensionality reduction transforms the data set so that as much of the original information as possible is represented by a smaller number of new attributes. Principal component analysis (PCA) is a representative dimensionality reduction method, implemented as follows:
(1) Calculate the covariance matrix
Suppose there are n samples with p variables each, forming an n × p data matrix:
Denote the original variable indices as: $x_1, x_2, \ldots, x_p$ (6)
Calculate the covariance matrix: $\Sigma = (S_{ij})_{p \times p}$ (7)
where
(2) Find the eigenvalues $\lambda_i$ of $\Sigma$ and the corresponding orthonormal unit eigenvectors $a_i$
The m largest eigenvalues of $\Sigma$, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m > 0$, are the variances of the first m principal components, and the unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables. The i-th principal component $F_i$ of the original variables is then:
$F_i = a_i X$ (8)
(3) Select the principal components
The number of principal components finally selected, i.e. the value of m in $F_1, F_2, \ldots, F_m$, is determined by the cumulative variance contribution rate G(m):
When the cumulative contribution rate exceeds 85%, the selected components are considered to reflect the information of the original variables sufficiently, and the corresponding m is the number of principal components extracted;
(4) Calculate the principal component loadings
The principal component loading reflects the degree of correlation between principal component $F_i$ and original variable $X_j$. The loading $l_{ij}$ of original variable $X_j$ (j = 1, 2, …, p) on principal component $F_i$ (i = 1, 2, …, m) is:
$l(F_i, X_j) = \sqrt{\lambda_i}\, a_{ij}$ (i = 1, 2, …, m; j = 1, 2, …, p) (10)
If $F_1, F_2, \ldots, F_m$ denote the m principal components of the original variables $X_1, X_2, \ldots, X_p$, then:
The flow chart is shown in Fig. 1.
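Steps (1) to (3) of the PCA procedure can be sketched as follows, assuming NumPy. The names `pca_reduce` and `threshold` are hypothetical. The covariance formula is not reproduced in the text, so the sample covariance with divisor n − 1 is an assumption, and the cumulative contribution rate is taken as the running sum of eigenvalues divided by their total.

```python
import numpy as np

def pca_reduce(X, threshold=0.85):
    """Project X (n samples x p variables) onto its leading principal components.

    Keeps the smallest m whose cumulative contribution rate exceeds
    `threshold`. Returns (scores, eigvals, m), with eigvals sorted in
    descending order.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each variable
    S = np.cov(Xc, rowvar=False)              # p x p covariance (divisor n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()    # cumulative contribution rate
    # index of the first ratio strictly greater than the threshold, plus one
    m = int(np.searchsorted(ratio, threshold, side="right")) + 1
    m = min(m, len(eigvals))
    scores = Xc @ eigvecs[:, :m]              # principal component scores
    return scores, eigvals, m
```

Each column of `scores` is one principal component evaluated on the centered data, and its sample variance equals the corresponding eigenvalue.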
(2) Data clustering based on the k-means clustering method
The k-means algorithm is an indirect clustering method based on similarity measures between samples, and is a partition-based clustering algorithm in the family of unsupervised learning methods. It uses distance as the measure of similarity between data objects: the smaller the distance between two data objects, the higher their similarity and the more likely they are to belong to the same cluster.
The basic formulas of the k-means clustering method are:
where $dist(x_i, x_j)$ is the Euclidean distance between data points $x_i$ and $x_j$; D is the number of attributes of a data object; $x_{i,d}$ and $x_{j,d}$ are the d-th components of data points $x_i$ and $x_j$; $C_k$ is the cluster center of the k-th cluster; $Center_k$ is the updated cluster center of the k-th cluster; J is the sum-of-squared-errors criterion function; and R is the defined cluster domain radius;
The data set obtained by principal component analysis is clustered with the k-means clustering method and normal data are rejected. The clustering procedure is as follows:
(1) input the initial data set X and set the number of clusters k;
(2) randomly select k points in data set X as the initial cluster centers;
(3) calculate the distance from each point to each cluster center using formula (12);
(4) assign each data point to the most similar cluster according to the distances;
(5) update the cluster centers using formula (13);
(6) repeat steps (3) to (5); when the criterion function (14) converges, stop clustering and output the clustering result; otherwise return to step (3) and continue.
The flow chart is shown in Fig. 2.
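The six clustering steps can be sketched as follows, assuming NumPy. The function name `kmeans` and the parameters `max_iter`, `tol` and `seed` are hypothetical; the convergence test (the criterion function J stops decreasing by more than `tol`) is one common reading of step (6), not a detail stated in the text.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-9, seed=0):
    """Plain k-means with Euclidean distance.

    Returns (labels, centers, j), where j is the sum of squared errors
    between each point and the center of its cluster.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # step (2): pick k distinct data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_j = np.inf
    for _ in range(max_iter):
        # step (3): distance from every point to every center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # step (4): assign each point to its nearest center
        labels = dist.argmin(axis=1)
        # step (5): move each center to the mean of its members
        for c in range(k):
            members = X[labels == c]
            if len(members):          # an empty cluster keeps its old center
                centers[c] = members.mean(axis=0)
        # step (6): stop when the criterion function no longer decreases
        j = float(((X - centers[labels]) ** 2).sum())
        if prev_j - j <= tol:
            break
        prev_j = j
    return labels, centers, j
```

Because the initial centers are random, different seeds can give different partitions on data without clear cluster structure; this is the parameter sensitivity of k-means that the combined method is meant to mitigate.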
(3) Precise data processing by combining the local outlier factor (LOF) detection algorithm with the k-means method;
$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le k\_dist(p) \,\}$ (16)
$reach\_dist_k(p, q) = \max\{\, k\_dist(q),\ d(p, q) \,\}$ (17)
where $N_k(p)$ is the set of all objects whose distance to p does not exceed the k-distance of p; $d(p, q)$ is the Euclidean distance between points p and q; $k\_dist(p)$ is the k-distance of object p; $reach\_dist_k(p, q)$ is the reachability distance of object p with respect to object q; $lrd_{MinPts}(p)$ is the local reachability density of object p; and $LOF_k(p)$ is the local outlier factor of point p;
The procedure is as follows:
(1) set the number of neighbors k;
(2) set the target number of outliers m;
(3) input the data set;
(4) calculate the distance matrix between objects;
(5) calculate the k-distance $k\_dist(p)$ of each point p;
(6) calculate the k-distance neighborhood $N_k(p)$ of each point p;
(7) calculate the local reachability density of each point p;
(8) calculate the local outlier factor LOF;
(9) sort the LOF values of all points and output the top m outliers;
The flow chart is shown in Fig. 3.
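The nine LOF steps can be sketched as follows, assuming NumPy. The function names `lof_scores` and `top_outliers` are hypothetical. Formulas (16) and (17) are used as given; the local reachability density and the LOF itself are not reproduced in the text, so the standard definitions (neighborhood size divided by the summed reachability distances, and the mean ratio of the neighbors' densities to the point's own density) are assumed here.

```python
import numpy as np

def lof_scores(X, k):
    """Local outlier factor of every point in X (standard definitions assumed).

    Duplicate points can produce a zero reachability sum and are not
    handled in this sketch.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # step (4): pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    # step (5): k-distance = distance to the k-th nearest neighbor
    k_dist = np.sort(d, axis=1)[:, k - 1]
    # step (6): k-distance neighborhood N_k(p), formula (16)
    neigh = [np.flatnonzero(d[p] <= k_dist[p]) for p in range(n)]
    # step (7): local reachability density from formula (17)
    lrd = np.empty(n)
    for p in range(n):
        reach = np.maximum(k_dist[neigh[p]], d[p, neigh[p]])
        lrd[p] = len(neigh[p]) / reach.sum()
    # step (8): LOF = mean density of the neighbors over own density
    return np.array([lrd[neigh[p]].mean() / lrd[p] for p in range(n)])

def top_outliers(X, k, m):
    """Step (9): indices of the m points with the largest LOF, plus all scores."""
    lof = lof_scores(X, k)
    return np.argsort(lof)[::-1][:m], lof
```

Points inside a dense cluster score close to 1, while an isolated point scores much higher, which is why ranking by LOF and keeping the top m isolates the most anomalous users.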
The method calls the k-means clustering algorithm to extract the candidate set, with the following rule: if the distance between an object and the center of its cluster is greater than or equal to the cluster radius R, the object is extracted into the outlier candidate set.
To improve detection accuracy, the proposed method judges a point to be an outlier only if it satisfies two conditions:
(1) Outlier screening condition
where $p_{ij}$ is the j-th element of the i-th cluster obtained by k-means clustering of the PCA-processed data set; $n_i$ is the number of data objects in the i-th cluster; $Center_k$ is the cluster center; and R is the cluster domain radius;
(2) Outlier factor restriction condition
$LOF(p_{ij}) \in LOF(p)_{top(m)}$ (21)
where m is the preset threshold on the number of outliers to detect;
Combining the two algorithms, the procedure is as follows:
(1) input the original data set and the preset minimum number of outliers m;
(2) perform PCA dimensionality reduction;
(3) perform k-means clustering on the reduced data set;
(4) count the number of data points $n_i$ in each cluster;
(5) if a cluster contains $n_i < m$ data points, retain the cluster directly, and denote the data set formed by the retained clusters as D; if $n_i > m$, judge according to formula (20) whether the distance from each point in the cluster to the cluster center $Center_k$ is greater than the cluster radius: if it is greater, the point is merged with data set D to form the outlier candidate data set D′; otherwise the point is judged to be normal data and rejected;
(6) calculate and sort the outlier factors of all data points with the local outlier factor detection algorithm, and detect non-technical losses according to the screening result of the outlier factors.
The flow chart is shown in Fig. 5.
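Step (5) of the combined procedure, the pruning of normal data, can be sketched as follows, assuming NumPy. The function name `outlier_candidates` is hypothetical. Formula (20) and the definition of the cluster radius R are not reproduced in the text, so the radius is taken here as the mean distance of a cluster's members to its center, which is an assumption; clusters with exactly m members are screened by radius, which is also an assumption.

```python
import numpy as np

def outlier_candidates(X, labels, centers, m):
    """Indices of the outlier candidate data set after k-means pruning.

    Clusters with fewer than m members are kept whole; in larger clusters
    only the points farther from the center than the cluster radius are kept.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    candidates = []
    for c in range(len(centers)):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        if len(members) < m:
            # small cluster: retain every member (data set D)
            candidates.extend(members.tolist())
            continue
        dist = np.linalg.norm(X[members] - centers[c], axis=1)
        radius = dist.mean()          # ASSUMPTION: R = mean member distance
        candidates.extend(members[dist > radius].tolist())
    return sorted(candidates)
```

The returned indices form the candidate set to which the outlier factor ranking of step (6) is then applied, so that the comparatively expensive LOF computation concentrates on a small part of the data.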
(4) Evaluation of detection accuracy
Detecting abnormal consumption patterns is essentially a binary classification problem: all users are divided into two classes, normal users and abnormal users. The confusion matrix is a basic tool for assessing the reliability of a classifier. For a binary classification problem, the confusion matrix shown in Fig. 4 lists all possible classification outcomes, where the rows (positive/negative) correspond to the actual class of an object and the columns (true/false) correspond to the class predicted by the classifier.
Here FP is a type I error and FN is a type II error. Several classifier evaluation indices can be derived from the confusion matrix:
Precision PRE = TP/(TP+FP): the proportion of samples predicted positive that are actually positive;
Miss rate (false negative rate) FNR = FN/(FN+TP): the probability that a positive sample is misclassified as negative;
True positive rate TPR = TP/(TP+FN): the proportion of all actually positive samples that are correctly judged positive;
False positive rate FPR = FP/(FP+TN): the proportion of all actually negative samples that are wrongly judged positive.
These indices measure the classification result from different angles, but they have two problems. First, they behave poorly when the positive and negative samples in the data set are imbalanced. Take the extreme case of a 99:1 ratio of positive to negative samples: a classifier only needs to label every sample positive to reach an accuracy of 99%, yet the evaluation index then has no reference value. Second, these are static indices, whereas the output of some classifiers is not simply 0 or 1 but a degree of membership in a class. Such classifiers produce different classification results at different thresholds, so a dynamic index is needed to measure their overall reliability.
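The four indices and the 99:1 example can be checked with a few lines of Python. The function name `classification_metrics` is hypothetical, the accuracy entry `ACC` is added only to reproduce the 99% figure mentioned above, and the counts are invented for illustration rather than taken from the simulation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Evaluation indices derived from a binary confusion matrix."""
    return {
        "PRE": tp / (tp + fp),                     # precision
        "FNR": fn / (fn + tp),                     # miss rate
        "TPR": tp / (tp + fn),                     # true positive rate
        "FPR": fp / (fp + tn),                     # false positive rate
        "ACC": (tp + tn) / (tp + fp + fn + tn),    # overall accuracy
    }

# Hypothetical data set with 99 positive and 1 negative sample, where the
# classifier labels every sample positive.
all_positive = classification_metrics(tp=99, fp=1, fn=0, tn=0)
```

In this case the accuracy is 0.99 and the true positive rate is 1, but the false positive rate is also 1: the single negative sample is misclassified, which the accuracy figure alone hides.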
The ROC (receiver operating characteristic) curve describes how the two confusion-matrix metrics FPR and TPR grow relative to each other. For the continuous values output by a binary classification model, samples above the threshold are assigned to the positive class and samples below it to the negative class. Lowering the threshold identifies more positives, which raises TPR, but it also assigns more negative samples to the positive class, which raises FPR. The ROC curve visualizes this trade-off: each point on the curve corresponds to the confusion matrix of the classification result at one threshold.
In ROC space, the point (0,1) represents the ideal classifier, and the closer the ROC curve lies to (0,1), the better the classification. The area under the curve (AUC) expresses classifier quality as a single number equal to the area beneath the ROC curve; a larger AUC indicates better performance, and AUC = 1 corresponds to the ideal classifier.
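The threshold sweep can be sketched in a few lines of Python. This is an illustration under stated assumptions: a sample counts as positive when its score is greater than or equal to the threshold, the area is computed with the trapezoidal rule, and the scores and labels are made-up values.

```python
def roc_points(scores, labels):
    """Return the ROC curve as a list of (FPR, TPR) points.

    ASSUMPTION: a sample is predicted positive when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest score downwards.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

Each entry of `curve` corresponds to one threshold and therefore to one confusion matrix, as described above.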
(5) Simulation analysis of an example using MATLAB;
(5-1) Determine the example and its essential features;
The initial data set used by the invention consists of 6 months of power load data for 3000 power users of a certain substation, sampled at 15-minute intervals. Power load and electricity consumption can be converted into each other and reflect user consumption patterns in substantially the same way, so electricity consumption can also serve as the characteristic index describing a user's consumption pattern. The simulation uses MATLAB 7.10. The 3000 power users comprise 2965 normal users and 35 abnormal users, an abnormal-user ratio of about 1.17% (35/3000).
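The conversion between load and consumption can be illustrated with a short Python sketch. It assumes each sample is the average power in kW over its 15-minute interval, so the energy of one interval is power times 0.25 h; the unit, the function names and the sample values are assumptions for illustration.

```python
INTERVAL_HOURS = 15 / 60  # 15-minute sampling interval stated in the text


def energy_kwh(load_kw):
    """Energy over the sampled period: sum of average power x interval length."""
    return sum(p * INTERVAL_HOURS for p in load_kw)


def daily_profiles(load_kw):
    """Split a load series into whole days of 96 samples each."""
    per_day = round(24 / INTERVAL_HOURS)
    last_start = len(load_kw) - per_day
    return [load_kw[i:i + per_day] for i in range(0, last_start + 1, per_day)]


# Hypothetical user drawing a constant 1.5 kW for one day.
one_day = [1.5] * 96
day_energy = energy_kwh(one_day)
```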
(5-2) Carry out the simulation analysis of the example using functions written in MATLAB.
The simulation shows that the model can quickly detect abnormal users, i.e. the sources of non-technical losses, and detects non-technical losses to the greatest possible extent while satisfying accuracy and economy requirements.
The embodiments described above merely describe preferred implementations of the invention and do not limit its scope. Without departing from the spirit of the design of the invention, all variations and improvements made to the technical solution of the invention by those of ordinary skill in the art shall fall within the scope of protection determined by the claims of the invention.

Claims (8)

1. A smart grid non-technical loss detection method based on unsupervised learning, characterized by comprising the following steps:
Step (1): since a single electricity consumption behavior can give rise to several kinds of electricity consumption data, select raw electricity consumption data characterizing several consumption behaviors as the original index data set, and reduce the dimension of the original data set using principal component analysis;
Step (2): using the k-means clustering method, cluster the data set obtained by principal component analysis in step (1), and reject normal data to obtain abnormal data;
Step (3): based on the local outlier factor detection algorithm, process the abnormal data of step (2) precisely, so that the abnormal data are accurately separated and the smart grid non-technical loss detection is completed.
2. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that: in step (1), the original index data set includes a trend index, a mobility index, a fluctuation index, the ratio of the average load of the last r months to the average load of all months, and the correlation coefficient between each user's load sequence and the median load sequence of all users.
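The last two indices of this claim can be sketched in Python as follows. The sketch assumes monthly load sequences of equal length for all users and uses the Pearson correlation coefficient, which the claim does not specify; the function names and the sample data are hypothetical.

```python
from statistics import mean, median


def late_ratio(monthly, r):
    """Ratio of the average load of the last r months to the overall average."""
    return mean(monthly[-r:]) / mean(monthly)


def median_sequence(users):
    """Month-by-month median of the load sequences of all users."""
    return [median(col) for col in zip(*users)]


def pearson(x, y):
    """Pearson correlation (ASSUMPTION: the claim only says 'correlation')."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / (var_x * var_y) ** 0.5


# Hypothetical 6-month loads; the third user drops sharply in later months.
users = [
    [10, 11, 12, 11, 10, 11],
    [20, 22, 24, 22, 20, 22],
    [10, 10, 9, 5, 4, 3],
]
med = median_sequence(users)
ratios = [late_ratio(u, 3) for u in users]
corrs = [pearson(u, med) for u in users]
```

A user whose recent load falls well below its own average and whose load shape diverges from the population median, like the third user here, stands out on both indices.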
3. The smart grid non-technical loss detection method based on unsupervised learning according to claim 2, characterized in that the trend index is calculated as follows:
1) input the monthly average load data set X of the power users;
2) compute the n-point simple moving average sequence F of each user's load time series A;
3) compare sequence A with sequence F at each time point: if A lies below F in u segments containing $a_1, a_2, \dots, a_u$ points respectively, and A lies above F in v segments containing $b_1, b_2, \dots, b_v$ points respectively, then the following indices are calculated:
4) compute the upward trend index tra and the downward trend index trb.
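Steps 2) and 3) can be sketched in Python as below. The sketch assumes a trailing n-point window and compares A with F only where F is defined; the function names are hypothetical. Because the formulas for tra and trb are not reproduced in this text, the sketch stops at the segment lengths $a_1, \dots, a_u$ and $b_1, \dots, b_v$.

```python
def moving_average(a, n):
    """Simple n-point trailing moving average; entry i averages a[i-n+1..i]."""
    return [sum(a[i - n + 1:i + 1]) / n for i in range(n - 1, len(a))]


def run_lengths(a, f, n):
    """Lengths of the runs where A is below F and where A is above F.

    Returns (below, above), i.e. the lists [a_1..a_u] and [b_1..b_v].
    ASSUMPTION: points where A equals F end the current run.
    """
    below, above = [], []
    prev, count = 0, 0
    for x, m in zip(a[n - 1:], f):
        sign = (x > m) - (x < m)  # +1 above, -1 below, 0 equal
        if sign == prev and sign != 0:
            count += 1
            continue
        if prev == -1:
            below.append(count)
        elif prev == 1:
            above.append(count)
        prev, count = sign, (1 if sign else 0)
    if prev == -1:
        below.append(count)
    elif prev == 1:
        above.append(count)
    return below, above


# Hypothetical load series that rises and then falls.
A = [1, 2, 3, 4, 5, 4, 3, 2, 1]
F = moving_average(A, 3)
below, above = run_lengths(A, F, 3)
```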
4. The smart grid non-technical loss detection method based on unsupervised learning according to claim 2, characterized in that the mobility index is a first-order difference measure of the user's consumption pattern and includes:
1) the difference between the average loads of the first r months and the last r months:
where $x_{n1}$ and $x_{n2}$ are the loads of the first r months and the last r months respectively;
2) the modulus of the difference between the discrete Fourier transform coefficient sequences of the first r months and the last r months:
where $y_{n1}$ and $y_{n2}$ are the discrete Fourier transform coefficient sequences of the first and the last r months respectively.
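A Python sketch of the two mobility measures follows. Since the formulas themselves are not reproduced in this text, the sign convention (first r months minus last r months) and the use of the Euclidean norm for the modulus are assumptions, as are the function names and the sample data.

```python
import cmath


def dft(x):
    """Discrete Fourier transform coefficients of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def mean_load_difference(x, r):
    """Average load of the first r months minus that of the last r months."""
    return sum(x[:r]) / r - sum(x[-r:]) / r


def dft_difference_norm(x, r):
    """Norm of the difference between the DFT coefficients of the first
    and last r months (ASSUMPTION: Euclidean norm)."""
    y1, y2 = dft(x[:r]), dft(x[-r:])
    return sum(abs(a - b) ** 2 for a, b in zip(y1, y2)) ** 0.5


# Hypothetical user whose monthly load drops from 10 to 4.
load = [10, 10, 10, 4, 4, 4]
diff = mean_load_difference(load, 3)
norm = dft_difference_norm(load, 3)
```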
5. The smart grid non-technical loss detection method based on unsupervised learning according to claim 2, characterized in that the fluctuation index comprises:
1) the standard deviation sd of each user's H-month load sequence;
2) the standard deviation bsd_r of the load sequence of the first r months;
3) the standard deviation esd_r of the load sequence of the last r months.
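The three fluctuation measures reduce to standard deviations over different windows, as this Python sketch shows. It assumes the sample (n-1) standard deviation, which the claim does not specify, and uses hypothetical data with H = 6 and r = 3.

```python
from statistics import stdev


def fluctuation_indices(load, r):
    """Return (sd, bsd_r, esd_r) for one user's monthly load sequence.

    ASSUMPTION: sample standard deviation; the claim does not say which form.
    """
    sd = stdev(load)          # all H months
    bsd_r = stdev(load[:r])   # first r months
    esd_r = stdev(load[-r:])  # last r months
    return sd, bsd_r, esd_r


# Hypothetical user: stable within each half, but with a step in between.
sd, bsd_r, esd_r = fluctuation_indices([10, 10, 10, 4, 4, 4], 3)
```

Here both half-period deviations are zero while the full-period deviation is large, which indicates a step change rather than ordinary variability.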
6. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that: in step (1), the dimension of the original index data set is reduced using principal component analysis, as follows:
(1) Compute the covariance matrix
Suppose there are n samples, each with p variables, forming an n × p data matrix:
Denote the original variables as:
$x_1, x_2, \dots, x_p$ (6)
Compute the covariance matrix:
$\Sigma = (s_{ij})_{p \times p}$ (7)
where
(2) Find the eigenvalues $\lambda_i$ of $\Sigma$ and the corresponding orthonormal eigenvectors $a_i$
The m largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m > 0$ of $\Sigma$ are the variances of the first m principal components, and the unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables. The i-th principal component $F_i$ of the original variables is then:
$F_i = a_i X$ (8)
(3) Select the principal components
The number of principal components finally retained, i.e. m in $F_1, F_2, \dots, F_m$, is determined by the cumulative variance contribution rate G(m):
When the cumulative contribution rate exceeds 85%, the retained components are considered to reflect the information of the original variables, and the corresponding m is the number of principal components extracted;
(4) Compute the principal component loadings
A principal component loading reflects the degree of correlation between principal component $F_i$ and original variable $X_j$. The loading $l_{ij}$ of original variable $X_j$ (j = 1, 2, …, p) on principal component $F_i$ (i = 1, 2, …, m) is:
$l_{ij} = l(F_i, X_j) = \sqrt{\lambda_i}\, a_{ij} \quad (i = 1, 2, \dots, m;\ j = 1, 2, \dots, p)$ (10)
If $F_1, F_2, \dots, F_m$ denote the m principal components of the original variables $X_1, X_2, \dots, X_p$, then:
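The PCA steps of this claim can be sketched in dependency-free Python. Several choices are assumptions because the corresponding formulas are not reproduced in this text: the covariance uses 1/(n-1) normalization, G(m) is taken as the sum of the first m eigenvalues divided by the sum of all eigenvalues, and the eigenpairs come from power iteration with deflation, chosen only to avoid external libraries. All function names are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (p x p) of an n x p data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # ASSUMPTION: 1/(n-1) normalization.
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def eigenpairs(s, iters=500, eps=1e-12):
    """Eigenvalues and unit eigenvectors of a symmetric PSD matrix, largest
    first, by power iteration with deflation. Components whose variance is
    numerically zero are dropped."""
    s = [row[:] for row in s]
    p = len(s)
    pairs = []
    for _ in range(p):
        v = [1.0 / (j + 1) for j in range(p)]  # fixed start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]
            lam = sum(x * x for x in w) ** 0.5
            if lam < eps:
                break
            v = [x / lam for x in w]
        if lam < eps:
            break
        pairs.append((lam, v))
        # Deflate: remove the component that was just found.
        s = [[s[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def select_m(eigenvalues, threshold=0.85):
    """Smallest m whose cumulative contribution rate exceeds the threshold."""
    total, acc = sum(eigenvalues), 0.0
    for m, lam in enumerate(eigenvalues, start=1):
        acc += lam
        if acc / total > threshold:
            return m
    return len(eigenvalues)


def project(rows, vectors):
    """Principal component scores F_i = a_i . (x - mean) for every sample."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum(a[j] * (r[j] - means[j]) for j in range(p)) for a in vectors]
            for r in rows]


# Hypothetical data lying exactly on the line x2 = 2 * x1.
rows = [[1, 2], [2, 4], [3, 6], [4, 8]]
pairs = eigenpairs(covariance_matrix(rows))
m = select_m([lam for lam, _ in pairs])
scores = project(rows, [v for _, v in pairs[:m]])
```

Power iteration converges slowly when two eigenvalues are nearly equal, so a library eigensolver would be the usual choice in practice.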
7. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that: in step (2), the basic formulas of the k-means clustering method are:
where $dist(x_i, x_j)$ is the Euclidean distance between data points $x_i$ and $x_j$; D is the number of attributes of a data object; $x_{i,d}$ and $x_{j,d}$ are the components of data points $x_i$ and $x_j$ respectively; $C_k$ is the cluster center of the k-th cluster; $Center_k$ is the updated cluster center of the k-th cluster; J is the sum-of-squared-errors criterion function; R is the defined cluster domain radius;
The data set obtained by principal component analysis is clustered with the k-means method and normal data are rejected. The clustering proceeds as follows:
(1) input the initial data set X and set the number of clusters k;
(2) randomly select k points in data set X as the initial cluster centers;
(3) compute the distance from each point to the cluster centers using formula (12);
(4) assign each data point to the most similar cluster according to distance;
(5) update the cluster centers using formula (13);
(6) repeat steps (3) to (5); when criterion function (14) converges, stop clustering and output the clustering result; otherwise return to step (3) and continue.
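Steps (1) to (6) can be sketched in Python as follows. Formulas (12) to (14) are not reproduced in this text, so the sketch assumes the usual forms: Euclidean distance, the cluster mean as the updated center, and the sum of squared distances to the assigned center as criterion J. The seed, tolerance and sample points are illustrative.

```python
import random


def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans(points, k, seed=0, max_iter=100, tol=1e-9):
    """Plain k-means; returns (labels, centers, J)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step (2): random initial centers
    labels, j, prev_j = [], 0.0, None
    for _ in range(max_iter):
        # Steps (3)-(4): assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Step (5): move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step (6): stop when the criterion J no longer changes.
        j = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if prev_j is not None and abs(prev_j - j) < tol:
            break
        prev_j = j
    return labels, centers, j


# Two hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, j = kmeans(points, k=2)
```

Because the initial centers are random, the cluster numbering can differ between seeds even when the grouping is the same.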
8. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that: in step (3), the local outlier factor detection algorithm is built on the following formulas:
$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le k\_dist(p) \,\}$ (16)
$reach\_dist_k(p, q) = \max\{\, k\_dist(q),\ d(p, q) \,\}$ (17)
where $N_k(p)$ is the set of all objects whose distance to p does not exceed the k-distance of p; $d(p, q)$ is the Euclidean distance between points p and q; $k\_dist(p)$ is the k-distance of object p; $reach\_dist_k(p, q)$ is the reachability distance of object p with respect to object q; $lrd_{MinPts}(p)$ is the local reachability density of object p; $LOF_k(p)$ is the local outlier factor of point p;
The procedure is as follows:
(1) set the number of neighbors k;
(2) set the target number of outliers m;
(3) input the data set;
(4) compute the distance matrix of the objects;
(5) compute the k-distance k_dist(p) of every point p;
(6) compute the k-distance neighborhood $N_k(p)$ of every point p;
(7) compute the local reachability density of every point p;
(8) compute the local outlier factor LOF;
(9) sort the LOF values of all points and output the top m outliers;
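Steps (1) to (9) can be sketched in Python as follows. Formulas (16) and (17) are used as given; the formulas for the local reachability density and for LOF are not reproduced in this text, so the sketch assumes the standard definitions (density as the inverse of the mean reachability distance to the neighbors, LOF as the mean ratio of the neighbors' densities to the point's own density). It also assumes no duplicate points, and the function names are hypothetical.

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def lof_scores(points, k):
    """Local outlier factor of every point (assumes no duplicate points)."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted(d[i][j] for j in range(n) if j != i)
        kd = others[k - 1]                  # step (5): k-distance
        k_dist.append(kd)
        neigh.append([j for j in range(n)   # step (6): formula (16)
                      if j != i and d[i][j] <= kd])

    def reach(i, j):
        """Reachability distance of point i w.r.t. point j, formula (17)."""
        return max(k_dist[j], d[i][j])

    # Step (7): local reachability density (ASSUMED standard definition).
    lrd = [len(neigh[i]) / sum(reach(i, j) for j in neigh[i])
           for i in range(n)]
    # Step (8): local outlier factor (ASSUMED standard definition).
    return [sum(lrd[j] / lrd[i] for j in neigh[i]) / len(neigh[i])
            for i in range(n)]


def top_outliers(points, k, m):
    """Step (9): indices of the m points with the largest LOF."""
    scores = lof_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:m]


# Hypothetical data: a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
outliers = top_outliers(points, k=2, m=1)
```

Points inside a uniformly dense region score close to 1, while isolated points score well above 1.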
Here the k-means clustering algorithm is called to extract the candidate set, with the following rule: if the distance between an object and the center of its cluster is greater than or equal to the radius R of that cluster, the corresponding data object is extracted into the outlier candidate set;
In addition, to improve the detection accuracy of the algorithm, two conditions must be met when judging outliers:
(1) Outlier screening condition
where $p_{ij}$ is the j-th item of the i-th cluster after k-means clustering of the data set processed by PCA; $n_i$ is the number of data objects in the i-th cluster; $Center_k$ is the cluster center; R is the cluster domain radius;
(2) Outlier factor restriction
$LOF(p_{ij}) \in LOF(p)_{top(m)}$ (21)
where m is the preset threshold on the number of outliers to detect;
Combining the two algorithms, the procedure is as follows:
(1) input the original data set and the preset minimum number of outliers m;
(2) reduce the dimension with PCA;
(3) perform k-means clustering on the reduced data set;
(4) compute the number of data points $n_i$ in each cluster;
(5) if the number of data points in a cluster satisfies $n_i < m$, retain the cluster directly, and denote the data set formed by the retained clusters as D; if $n_i > m$, use formula (20) to judge whether the distance from each point in the cluster to the cluster center $Center_k$ is greater than the cluster radius: if it is greater, the point is merged with data set D into the outlier candidate data set D'; if it is smaller, the point is judged to be normal data and rejected;
(6) compute and sort the outlier factors of all data points with the local outlier factor detection algorithm; the result of screening by outlier factor is the non-technical loss detection.
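The pruning in steps (4) and (5) can be sketched in Python as below, taking an existing clustering as input. Two points are assumptions: the radius formula and screening formula (20) are not reproduced in this text, so the radius R is taken here as the mean distance of the cluster's members to its center, and a cluster with exactly m members is treated like a larger one. The function names and sample data are hypothetical.

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def candidate_set(points, labels, centers, m):
    """Indices of the outlier candidate data set D'."""
    candidates = []
    for c, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        if len(members) < m:
            # Small cluster: keep it whole (data set D in the text).
            candidates.extend(members)
            continue
        d = {i: dist(points[i], center) for i in members}
        # ASSUMPTION: R is the mean member-to-center distance.
        radius = sum(d.values()) / len(members)
        # Keep points at or beyond the radius; the rest count as normal data.
        candidates.extend(i for i in members if d[i] >= radius)
    return sorted(candidates)


def cluster_centers(points, labels, k):
    """Mean of the members of each cluster."""
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


# Hypothetical clustering: one large cluster and one cluster of two points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (3, 3),
          (10, 10), (10, 11)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
centers = cluster_centers(points, labels, k=2)
candidates = candidate_set(points, labels, centers, m=3)
```

Only the points in `candidates` would then be passed to the local outlier factor ranking of step (6), which is what reduces the cost of the density computation.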

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910066167.0A CN109740694A (en) 2019-01-24 2019-01-24 A kind of smart grid inartful loss detection method based on unsupervised learning

Publications (1)

Publication Number Publication Date
CN109740694A true CN109740694A (en) 2019-05-10

Family

ID=66365880

Country Status (1)

Country Link
CN (1) CN109740694A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106707233A (en) * 2017-03-03 2017-05-24 广东工业大学 Multi-side positioning method and multi-side positioning device based on outlier detection
CN108593990A (en) * 2018-06-04 2018-09-28 国网天津市电力公司 A kind of stealing detection method and application based on electric power users electricity consumption behavior pattern
CN109146705A (en) * 2018-07-02 2019-01-04 昆明理工大学 A kind of method of electricity consumption characteristic index dimensionality reduction and the progress stealing detection of extreme learning machine algorithm
CN109255726A (en) * 2018-09-07 2019-01-22 中国电建集团华东勘测设计研究院有限公司 A kind of ultra-short term wind power prediction method of Hybrid Intelligent Technology

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
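刘广聪: "A positioning algorithm based on outlier detection", Computer Applications and Software *
孙毅 et al.: "An improved outlier detection method for power users' electricity consumption data based on a Gaussian kernel function", Power System Technology *
庄池杰 et al.: "Detection of abnormal electricity consumption patterns of power users based on unsupervised learning", Proceedings of the CSEE *
陶晶: "Outlier detection method based on clustering and density", China Master's Theses Full-text Database, Information Science and Technology *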

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288383A (en) * 2019-05-31 2019-09-27 国网上海市电力公司 Group behavior power distribution network multiplexing electric abnormality detection method based on user property label
CN110298552A (en) * 2019-05-31 2019-10-01 国网上海市电力公司 A kind of power distribution network individual power method for detecting abnormality of combination history electrical feature
CN110288383B (en) * 2019-05-31 2024-02-02 国网上海市电力公司 Group behavior power distribution network electricity utilization abnormality detection method based on user attribute tags
CN110298552B (en) * 2019-05-31 2023-12-01 国网上海市电力公司 Power distribution network individual power abnormality detection method combining historical electricity utilization characteristics
CN110264272A (en) * 2019-06-21 2019-09-20 山东师范大学 A kind of mobile Internet labor service crowdsourcing platform task optimal pricing prediction technique, apparatus and system
CN110309884A (en) * 2019-07-05 2019-10-08 国网四川省电力公司经济技术研究院 Electricity consumption data anomalous identification system based on ubiquitous electric power Internet of Things net system
CN110852384B (en) * 2019-11-12 2023-06-27 武汉联影医疗科技有限公司 Medical image quality detection method, device and storage medium
CN110852384A (en) * 2019-11-12 2020-02-28 武汉联影医疗科技有限公司 Medical image quality detection method, device and storage medium
CN111125470A (en) * 2019-12-25 2020-05-08 成都康赛信息技术有限公司 Method for improving abnormal data mining and screening
CN111175626A (en) * 2020-03-20 2020-05-19 广东电网有限责任公司 Abnormal detection method for insulation state of switch cabinet
CN112000655A (en) * 2020-08-26 2020-11-27 广东电网有限责任公司广州供电局 Transformer load data preprocessing method, device and equipment
CN112230056B (en) * 2020-09-07 2022-04-26 国网河南省电力公司电力科学研究院 Multi-harmonic-source contribution calculation method based on OFMMK-Means clustering and composite quantile regression
CN112230056A (en) * 2020-09-07 2021-01-15 国网河南省电力公司电力科学研究院 Multi-harmonic source contribution calculation method based on OFMMK-Means clustering and composite quantile regression
CN112101765A (en) * 2020-09-08 2020-12-18 国网山东省电力公司菏泽供电公司 Abnormal data processing method and system for operation index data of power distribution network
CN112380992B (en) * 2020-11-13 2022-12-20 上海交通大学 Method and device for evaluating and optimizing accuracy of monitoring data in machining process
CN112380992A (en) * 2020-11-13 2021-02-19 上海交通大学 Method and device for evaluating and optimizing accuracy of monitoring data in machining process
CN112464289A (en) * 2020-12-11 2021-03-09 广东工业大学 Method for cleaning private data
CN112966567A (en) * 2021-02-05 2021-06-15 深圳市品致信息科技有限公司 Coordinate positioning method, system, storage medium and terminal based on PCA (principal component analysis), clustering and K nearest neighbor
CN113723497A (en) * 2021-08-26 2021-11-30 广西大学 Abnormal electricity utilization detection method, device, equipment and storage medium based on mixed feature extraction and Stacking model
CN115511106A (en) * 2022-11-15 2022-12-23 阿里云计算有限公司 Method, device and readable storage medium for generating training data based on time sequence data
CN115511106B (en) * 2022-11-15 2023-04-07 阿里云计算有限公司 Method, device and readable storage medium for generating training data based on time sequence data
CN116910593B (en) * 2023-09-14 2023-11-17 北京豪迈生物工程股份有限公司 Signal noise suppression method and system for chemiluminescent instrument
CN116910593A (en) * 2023-09-14 2023-10-20 北京豪迈生物工程股份有限公司 Signal noise suppression method and system for chemiluminescent instrument
CN117808497A (en) * 2024-03-01 2024-04-02 清华四川能源互联网研究院 Electric power carbon emission abnormity detection module and method based on distance and direction characteristics
CN117808497B (en) * 2024-03-01 2024-05-14 清华四川能源互联网研究院 Electric power carbon emission abnormity detection module and method based on distance and direction characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190510