CN109740694A - Smart grid non-technical loss detection method based on unsupervised learning - Google Patents
Smart grid non-technical loss detection method based on unsupervised learning
- Publication number: CN109740694A
- Application number: CN201910066167.0A
- Authority: CN (China)
- Prior art keywords: data, cluster, principal component, data set, point
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a smart grid non-technical loss detection method based on unsupervised learning, relating to the field of smart grid advanced metering infrastructure. The invention uses principal component analysis to reduce the dimensionality of the raw data set; clusters the reduced data with the k-means method and prunes away most of the normal data; and then applies the local outlier factor (LOF) detection algorithm for fine-grained processing, finally achieving precise separation of abnormal data and thereby detecting non-technical losses. Detection accuracy is evaluated with ROC curves to verify the feasibility and accuracy of the method, and the method is simulated and analyzed with a simulation tool. Compared with existing techniques, the proposed detection method is simpler, more direct, more efficient and more practical; it can improve detection efficiency effectively and save substantial time and resources.
Description
Technical field
The present invention relates to the field of smart grid advanced metering infrastructure, and more particularly to a smart grid non-technical loss detection method based on unsupervised learning that can detect non-technical losses in a smart grid efficiently and quickly.
Background art
In recent years, the development of the smart grid has brought new vitality and hope to the power industry, while also posing new challenges to the traditional grid model. Growing global resource and environmental pressure, the advance of electricity market reform, and users' rising requirements for power quality and supply reliability confront the power industry with unprecedented challenges. Many countries and organizations have proposed building smart grids that are flexible, clean, secure, economical and friendly, and the smart grid is regarded as the development direction of the future power grid.
The foundation of the smart grid is distributed data transmission, computation and control technology, together with the effective transmission of data and control commands among multiple power supply units. On this basis, the grid needs more efficient communication and metering systems. To meet this need, the concept of the smart grid Advanced Metering Infrastructure (AMI) was introduced, and AMI plays an increasingly important role in the smart grid. It has achieved significant results in system operation, asset management and especially demand response, and has gradually become one of the most popular research and engineering construction topics in the whole power industry.
However, the security threats faced by such a complex metering and communication system should not be underestimated. Several key characteristics make AMI vulnerable to attack:
(1) The communication system is complex, and some communication links have limited bandwidth;
(2) A large number of devices with low computing power, low storage and weak protection capability are connected;
(3) A large amount of sensitive user data is stored.
Criminals often exploit the security weaknesses of AMI to attack the smart grid and commit illegal electricity use such as electricity theft and fraud, endangering the security of the smart grid. The energy losses associated with electricity theft by users on the distribution side and with a series of similar fraudulent consumption behaviors are collectively called non-technical losses (Nontechnical Loss, NTL). Such behavior not only causes large losses of electric energy and disrupts the normal order of power supply and use, but also poses serious hazards to the safe operation of the grid. According to incomplete statistics, the revenue lost to non-technical losses in China each year amounts to 0.5% to 3.5% of total revenue.
At present, the anti-theft measures adopted by State Grid power supply companies are mostly the following: using dedicated electric energy metering boxes and meter boxes; enclosing the conductors from the low-voltage outgoing line terminal to the metering device, which is the most widely used method in current anti-theft technology; installing anti-theft smart meters and enriching meter functions; and improving the utilization rate of the electricity information acquisition system. However, these methods mostly focus on anti-theft devices and lack adequate anti-theft algorithms for analyzing massive historical consumption data, so the consumption characteristics of users who steal electricity are difficult to discover.
In summary, AMI improves the data acquisition and processing capability of the smart grid and strengthens the link between the supply side and the demand side, but it also increases the grid's exposure to attack. Effective measures are therefore needed to detect non-technical losses. An effective non-technical loss detection method can provide a reference for the electricity inspection work of power supply companies, raise the hit rate of on-site inspections, reduce operating costs, and save considerable manpower and material resources; it is of great research significance for building a strong smart grid and improving grid security.
Summary of the invention
The object of the present invention is to provide a smart grid non-technical loss detection method based on unsupervised learning, which obtains abnormal data by cluster analysis of raw consumption data characterizing electricity consumption behavior, so as to judge whether consumption behavior is abnormal and thereby detect smart grid non-technical losses. The method is simple, efficient, comprehensive in the factors it considers, and highly practical.
To achieve the above object, the present invention is realized by the following technical solution: a smart grid non-technical loss detection method based on unsupervised learning, characterized by comprising the following steps:
Step (1): since one consumption behavior can trigger several kinds of consumption data, raw consumption data characterizing several consumption behaviors are selected as the original index data set, and principal component analysis is used to reduce the dimensionality of the raw data set;
Step (2): the data set obtained by principal component analysis in step (1) is clustered with the k-means clustering method, and normal data are rejected to obtain abnormal data;
Step (3): the local outlier factor detection algorithm is used to process the abnormal data from step (2) precisely, achieving precise separation of the abnormal data and completing smart grid non-technical loss detection.
In a further technical solution, in step (1), the original index data set includes a trend index, a mobility index, a volatility index, the ratio of the average load of the last r months to the average load of all months, and the correlation coefficient between each user's load sequence and the median load sequence of all users.
In a further technical solution, the trend index is calculated as follows:
1) Input the monthly average load data set X of the power users;
2) Compute the n-point simple moving average sequence F of each user's load time series A;
3) Compare sequence A with sequence F at each time point. If A lies below F in u segments containing a_1, a_2, …, a_u points respectively, and above F in v segments containing b_1, b_2, …, b_v points respectively, the following indices are computed:
4) Compute the upward trend index tra and the downward trend index trb.
In a further technical solution, the mobility index is a first-order difference measure of the user's consumption pattern and includes:
1) The difference between the average load of the first r months and that of the last r months,
where x_{n1} and x_{n2} are the loads of the first r months and of the last r months respectively;
2) The modulus of the difference sequence between the discrete Fourier transform coefficient sequences of the first r months and of the last r months,
where y_{n1} and y_{n2} are the discrete Fourier transform coefficient sequences of the first and last r months respectively.
In a further technical solution, the volatility index comprises:
1) The standard deviation sd of each user's H-month load sequence;
2) The standard deviation bsd_r of the load sequence of the first r months;
3) The standard deviation esd_r of the load sequence of the last r months.
In a further technical solution, in step (1), principal component analysis is used to reduce the dimensionality of the original index data set, as follows:
(1) Compute the covariance matrix
Suppose there are n samples, each with p variables, forming the n × p data matrix $X = (x_{ij})_{n \times p}$.
Denote the original variable indices as:
$x_1, x_2, \dots, x_p$ (6)
Compute the covariance matrix:
$\Sigma = (S_{ij})_{p \times p}$ (7)
where $S_{ij}$ is the sample covariance between variables $x_i$ and $x_j$.
(2) Compute the eigenvalues $\lambda_i$ of $\Sigma$ and the corresponding orthonormal unit eigenvectors $a_i$
The m largest eigenvalues of $\Sigma$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m > 0$, are the variances of the first m principal components, and the unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables. The i-th principal component of the original variables is then:
$F_i = a_i^{\mathsf T} X$ (8)
(3) Select the principal components
The number m of principal components $F_1, F_2, \dots, F_m$ finally retained is determined by the cumulative variance contribution rate $G(m)$:
$$G(m) = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k} \quad (9)$$
When the cumulative contribution rate exceeds 85%, the retained components are considered to reflect the information of the original variables, and the corresponding m is the number of leading principal components extracted;
(4) Compute the principal component loadings
A principal component loading reflects the degree of association between principal component $F_i$ and original variable $X_j$. The loading $l_{ij}$ of original variable $X_j$ ($j = 1, 2, \dots, p$) on principal component $F_i$ ($i = 1, 2, \dots, m$) is:
$l_{ij} = l(F_i, X_j) = \lambda_i a_{ij}$ ($i = 1, 2, \dots, m$; $j = 1, 2, \dots, p$) (10)
If $F_1, F_2, \dots, F_m$ denote the m principal components of the original variables $X_1, X_2, \dots, X_p$, then:
$$F_i = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{ip} X_p, \quad i = 1, 2, \dots, m \quad (11)$$
In a further technical solution, in step (2), the basic formulas of the k-means clustering method are:
$$\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{d=1}^{D} (x_{i,d} - x_{j,d})^2} \quad (12)$$
where $\mathrm{dist}(x_i, x_j)$ is the Euclidean distance between data points $x_i$ and $x_j$; D is the number of attributes of a data object; $x_{i,d}$ and $x_{j,d}$ are the d-th components of $x_i$ and $x_j$; $C_k$ is the cluster center of the k-th cluster; $Center_k$ is the updated center of the k-th cluster; J is the sum-of-squared-errors criterion function; and R is the defined cluster radius.
The k-means clustering method is used to cluster the data set obtained by principal component analysis and to reject normal data. The clustering process is as follows:
(1) Input the initial data set X and set the number of clusters k;
(2) Randomly select k points of X as the initial cluster centers;
(3) Compute the distance from each point to each cluster center using formula (12);
(4) Assign each data point to the most similar (nearest) cluster according to distance;
(5) Update the cluster centers using formula (13);
(6) Repeat steps (3) to (5); when the criterion function (14) converges, stop clustering and output the result; otherwise return to step (3) and continue.
In a further technical solution, in step (3), the local outlier factor detection algorithm is based on the following formulas:
$N_k(p) = \{ q \in D \setminus \{p\} \mid d(p, q) \le k\_dist(p) \}$ (16)
$reach\_dist_k(p, q) = \max\{ k\_dist(q),\ d(p, q) \}$ (17)
$$lrd_k(p) = \left( \frac{\sum_{o \in N_k(p)} reach\_dist_k(p, o)}{|N_k(p)|} \right)^{-1} \quad (18)$$
$$LOF_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{lrd_k(o)}{lrd_k(p)} \quad (19)$$
where $N_k(p)$ is the set of objects whose distance to p does not exceed the k-distance of p; $d(p, q)$ is the Euclidean distance between p and q; $k\_dist(p)$ is the k-distance of object p; $reach\_dist_k(p, q)$ is the reachability distance of object p with respect to object q; $lrd_k(p)$ is the local reachability density of object p; and $LOF_k(p)$ is the local outlier factor (LOF) of point p.
The detailed process is as follows:
(1) Set the number of neighbors k;
(2) Set the target number of outliers m;
(3) Input the data set;
(4) Compute the distance matrix of all objects;
(5) Compute the k-distance k_dist(p) of every point p;
(6) Compute the k-distance neighborhood $N_k(p)$ of every point p;
(7) Compute the local reachability density of every point p;
(8) Compute the local outlier factor LOF;
(9) Sort the LOF values of all points and output the top m outliers.
Here the k-means clustering algorithm is called to extract the candidate set, using the following rule: within each cluster, if the distance between an object and the cluster center is greater than or equal to the cluster radius R, the object is extracted into the outlier candidate set;
In addition, to improve the detection accuracy of the algorithm, a point must satisfy two conditions to be judged an outlier:
(1) Outlier screening condition
where $p_{ij}$ is the j-th element of the i-th cluster obtained by k-means clustering of the PCA-processed data set; $n_i$ is the number of data objects in the i-th cluster; $Center_k$ is the cluster center; and R is the cluster radius;
(2) Outlier factor restriction condition
$LOF(p_{ij}) \in LOF(p)_{top(m)}$ (21)
where m is the preset threshold for the number of outliers to detect.
Combining the two algorithms, the detailed process is as follows:
(1) Input the raw data set and the preset minimum number of outliers m;
(2) Apply PCA dimensionality reduction;
(3) Apply k-means clustering to the reduced data set;
(4) Count the number of data points $n_i$ in each cluster;
(5) If a cluster contains $n_i < m$ points, retain the whole cluster; the data contained in the retained clusters are denoted D. Otherwise, use formula (20) to judge whether the distance from each point of the cluster to the cluster center $Center_k$ is greater than or equal to the cluster radius; if it is, the point is merged with D to form the "outlier candidate data set" D'; if it is smaller, the point is judged normal and rejected;
(6) Use the local outlier factor detection algorithm to compute and sort the outlier factors of all data points; the outlier factors selected in this way realize non-technical loss detection.
Compared with the prior art, the detection method proposed by the present invention is more economical, convenient and practical. By integrating the two algorithms it avoids the weaknesses of each: the detection accuracy of the k-means method depends heavily on the choice of clustering parameters, and because outliers are only a "by-product" of the clustering process its detection accuracy is relatively low; the outlier detection algorithm judges the degree of outlierness by comparing the LOF values of all data points, which generates a large amount of unnecessary computation, making the time cost too high and wasting storage on intermediate results. In addition, the present invention reduces the dimensionality of the raw data set with principal component analysis, which improves the overall computing speed of the algorithm, and uses the ROC curve method to evaluate detection accuracy, which shows the accuracy of the detection method intuitively.
Description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the principal component analysis method in the method of the present invention;
Fig. 2 is a flow chart of the k-means detection algorithm in the method of the present invention;
Fig. 3 is a flow chart of the outlier detection method (LOF) in the method of the present invention;
Fig. 4 is the technology roadmap of the method of the present invention;
Fig. 5 is the overall flow chart of the method of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to Figs. 1-5.
As shown in Figs. 4-5, the smart grid non-technical loss detection method based on unsupervised learning illustrated by the present invention comprises the following specific steps:
(1) Since one consumption behavior can trigger several kinds of consumption data, raw consumption data characterizing several consumption behaviors are selected as the original index data set, and principal component analysis is used to reduce the dimensionality of the raw data set;
(1-1) Electricity consumption feature data set
The abnormal consumption information caused by electricity theft is usually not isolated: one behavior may trigger several kinds of anomaly. Detection based on a single index may therefore miss cases or misjudge them. Effective work against non-technical losses should instead extract comprehensive features from several kinds of abnormal data, producing characteristic quantities that can quantify the effects of the various theft methods.
The data set to be extracted contains H months of consumption data for N power users. A user's consumption pattern is represented by its monthly average load, so the load sequence of each user can be expressed as an H-dimensional vector $x_n = (x_{n,1}, x_{n,2}, \dots, x_{n,H})$, and all users can be expressed as the data set X = {x_n, n = 1, 2, …, N}.
Further characteristic quantities of the user consumption patterns can be extracted from the data set X.
1. Trend index
The trend index is computed from a moving average of the sequence. The moving average method is a common tool for time-series analysis and comes in simple, weighted and exponential variants, among others. A simple moving average is the arithmetic mean of the previous n values of a variable: if the time series is written {A_1, A_2, …}, the n-point moving average at time t is F_t = (A_{t-1} + A_{t-2} + … + A_{t-n})/n.
The trend index is calculated as follows:
1) Input the monthly average load data set X of the power users;
2) Compute the n-point simple moving average sequence F of each user's load time series A;
3) Compare sequence A with sequence F at each time point. If A lies below F in u segments containing a_1, a_2, …, a_u points respectively, and above F in v segments containing b_1, b_2, …, b_v points respectively, the following indices are computed:
4) Compute the upward trend index tra and the downward trend index trb.
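The formulas for tra and trb are not reproduced above, so the following Python sketch covers only the quantities the text fully defines: the n-point simple moving average F_t = (A_{t-1} + … + A_{t-n})/n and the run lengths a_1, …, a_u (A below F) and b_1, …, b_v (A above F). The function names are hypothetical, and treating a point where A equals F as a break between runs is an assumption.

```python
import numpy as np


def simple_moving_average(a, n):
    """n-point simple moving average F_t = (A_{t-1} + ... + A_{t-n}) / n.

    F_t averages the n values *before* t, so it exists only for t >= n
    (0-based); the result is aligned with a[n:].
    """
    a = np.asarray(a, dtype=float)
    if len(a) <= n:
        raise ValueError("series must be longer than the window n")
    c = np.concatenate(([0.0], np.cumsum(a)))  # c[i] = a[0] + ... + a[i-1]
    return (c[n:-1] - c[:-n - 1]) / n          # window sums of a[t-n:t]


def below_above_segments(a, f):
    """Run lengths where A < F (a_1..a_u) and where A > F (b_1..b_v)."""
    below, above = [], []
    sign, length = 0, 0

    def flush():
        # close the current run and record its length
        if sign == -1:
            below.append(length)
        elif sign == 1:
            above.append(length)

    for x, y in zip(a, f):
        s = -1 if x < y else (1 if x > y else 0)  # A == F breaks the run (assumption)
        if s != 0 and s == sign:
            length += 1
        else:
            flush()
            sign, length = s, (1 if s != 0 else 0)
    flush()
    return below, above
```

For example, with A = [1, 2, 3, 4, 5, 4, 3, 2, 1] and n = 2, comparing A[2:] with F gives one run of 3 points above F followed by one run of 4 points below it.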
2. Mobility index
The mobility index is a first-order difference measure of the user's consumption pattern. It includes:
1) The difference between the average load of the first r months and that of the last r months,
where x_{n1} and x_{n2} are the loads of the first r months and of the last r months respectively;
2) The modulus of the difference sequence between the discrete Fourier transform coefficient sequences of the first r months and of the last r months,
where y_{n1} and y_{n2} are the discrete Fourier transform coefficient sequences of the first and last r months respectively.
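The formulas for the two mobility indices are not reproduced, so this sketch rests on two labeled assumptions: the first index is the mean of the first r months minus the mean of the last r months, and the "modulus of the difference sequence" is taken as the Euclidean norm of the difference between the two DFT coefficient vectors (computed with numpy.fft.fft). The function name is hypothetical.

```python
import numpy as np


def mobility_indices(x, r):
    """Hypothetical sketch of the two mobility indices for one user.

    x: monthly average loads of one user (length H >= 2r).
    Assumptions: index 1 = mean(first r) - mean(last r);
    index 2 = Euclidean norm of DFT(first r) - DFT(last r).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * r:
        raise ValueError("need at least 2r months of data")
    first, last = x[:r], x[-r:]                    # x_{n1} and x_{n2}
    mean_diff = first.mean() - last.mean()
    y1, y2 = np.fft.fft(first), np.fft.fft(last)   # complex DFT coefficients y_{n1}, y_{n2}
    dft_diff_norm = float(np.linalg.norm(y1 - y2)) # modulus of the difference sequence
    return mean_diff, dft_diff_norm
```

A user whose load drops from 4 to 1 halfway through six months, mobility_indices([4, 4, 4, 1, 1, 1], 3), yields a mean difference of 3 and a DFT difference norm of 9, since only the zero-frequency coefficient (the sum) differs.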
3. Volatility index
1) The standard deviation sd of each user's H-month load sequence;
2) The standard deviation bsd_r of the load sequence of the first r months;
3) The standard deviation esd_r of the load sequence of the last r months.
4. Other indices
1) The ratio of the average load of the last r months to the average load of all months;
2) The correlation coefficient between each user's load sequence and the median load sequence of all users.
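The volatility indices and the two remaining indices can be computed together for all users from the N × H matrix X. Below is a minimal NumPy sketch. It makes two assumptions: population standard deviation (ddof=0), and "the median load sequence of all users" read as the month-wise median. The function name is hypothetical.

```python
import numpy as np


def volatility_and_other_indices(X, r):
    """X: N x H array of monthly average loads, one row per user.

    Returns per-user arrays. Assumptions: population standard deviation
    (ddof=0) and a month-wise median as the median sequence of all users.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=1)                                # whole H-month sequence
    bsd_r = X[:, :r].std(axis=1)                      # first r months
    esd_r = X[:, -r:].std(axis=1)                     # last r months
    ratio = X[:, -r:].mean(axis=1) / X.mean(axis=1)   # last-r mean / all-month mean
    median_seq = np.median(X, axis=0)                 # month-wise median over users
    # Pearson correlation of each user's sequence with the median sequence
    corr = np.array([np.corrcoef(row, median_seq)[0, 1] for row in X])
    return {"sd": sd, "bsd_r": bsd_r, "esd_r": esd_r, "ratio": ratio, "corr": corr}
```

A user whose load is constant over the H months has zero variance, so its correlation coefficient is undefined (NumPy returns nan with a warning); such users would need separate handling.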
(1-2) Dimensionality reduction of the raw data set based on principal component analysis (PCA)
Many features are extracted, and different features may contain overlapping information. To display each user's consumption pattern intuitively in a low-dimensional plane and to mine abnormal users efficiently, the dimensionality of the data set must be reduced. Dimensionality reduction transforms the data set so that as much of the original information as possible is represented by a smaller number of new attributes.
Principal component analysis (PCA) is a representative dimensionality reduction method. It is carried out as follows:
(1) Compute the covariance matrix
Suppose there are n samples, each with p variables, forming the n × p data matrix $X = (x_{ij})_{n \times p}$.
Denote the original variable indices as: $x_1, x_2, \dots, x_p$ (6)
Compute the covariance matrix:
$\Sigma = (S_{ij})_{p \times p}$ (7)
where $S_{ij}$ is the sample covariance between variables $x_i$ and $x_j$.
(2) Compute the eigenvalues $\lambda_i$ of $\Sigma$ and the corresponding orthonormal unit eigenvectors $a_i$
The m largest eigenvalues of $\Sigma$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m > 0$, are the variances of the first m principal components, and the unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables. The i-th principal component of the original variables is then:
$F_i = a_i^{\mathsf T} X$ (8)
(3) Select the principal components
The number m of principal components $F_1, F_2, \dots, F_m$ finally retained is determined by the cumulative variance contribution rate $G(m)$:
$$G(m) = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k} \quad (9)$$
When the cumulative contribution rate exceeds 85%, the retained components are considered to reflect the information of the original variables, and the corresponding m is the number of leading principal components extracted;
(4) Compute the principal component loadings
A principal component loading reflects the degree of association between principal component $F_i$ and original variable $X_j$. The loading $l_{ij}$ of original variable $X_j$ ($j = 1, 2, \dots, p$) on principal component $F_i$ ($i = 1, 2, \dots, m$) is:
$l_{ij} = l(F_i, X_j) = \lambda_i a_{ij}$ ($i = 1, 2, \dots, m$; $j = 1, 2, \dots, p$) (10)
If $F_1, F_2, \dots, F_m$ denote the m principal components of the original variables $X_1, X_2, \dots, X_p$, then:
$$F_i = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{ip} X_p, \quad i = 1, 2, \dots, m \quad (11)$$
The specific flow chart is shown in Fig. 1.
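Steps (1) to (4) map directly onto an eigendecomposition of the covariance matrix. This NumPy sketch (function name hypothetical) keeps the smallest m whose cumulative contribution G(m) exceeds 85%. It computes the loadings exactly as formula (10) is written, λ_i a_ij, which equals the covariance between F_i and X_j. It uses the 1/(n-1) normalization of numpy.cov, an assumption since the text does not state one.

```python
import numpy as np


def pca_reduce(X, threshold=0.85):
    """PCA via steps (1)-(4). X is n x p (samples x variables).

    Returns the scores F (n x m), the chosen m, the cumulative
    contribution rates G, and the loadings l_ij = lambda_i * a_ij (p x m).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    S = np.cov(Xc, rowvar=False)             # step (1): p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # step (2): symmetric, ascending order
    order = np.argsort(eigvals)[::-1]        # lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    G = np.cumsum(eigvals) / eigvals.sum()   # step (3): G(m), formula (9)
    m = int(np.searchsorted(G, threshold, side="right")) + 1  # smallest m with G(m) > threshold
    m = min(m, len(eigvals))
    A = eigvecs[:, :m]                       # columns a_1, ..., a_m
    F = Xc @ A                               # principal component scores, formula (8)
    loadings = A * eigvals[:m]               # step (4): formula (10) as written
    return F, m, G, loadings
```

Note that eigenvectors are defined only up to sign, so the signs of the scores F may differ between runs or libraries without changing the analysis.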
(2) Data clustering based on the k-means clustering method
The k-means algorithm is an indirect clustering method based on a similarity measure between samples. It is a partition-based clustering algorithm within unsupervised learning and uses distance as the similarity measure between data objects: the smaller the distance between two data objects, the higher their similarity and the more likely they are to belong to the same cluster.
The basic formulas of the k-means clustering method are:
$$\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{d=1}^{D} (x_{i,d} - x_{j,d})^2} \quad (12)$$
where $\mathrm{dist}(x_i, x_j)$ is the Euclidean distance between data points $x_i$ and $x_j$; D is the number of attributes of a data object; $x_{i,d}$ and $x_{j,d}$ are the d-th components of $x_i$ and $x_j$; $C_k$ is the cluster center of the k-th cluster; $Center_k$ is the updated center of the k-th cluster; J is the sum-of-squared-errors criterion function; and R is the defined cluster radius.
The k-means clustering method is used to cluster the data set obtained by principal component analysis and to reject normal data. The clustering process is as follows:
(1) Input the initial data set X and set the number of clusters k;
(2) Randomly select k points of X as the initial cluster centers;
(3) Compute the distance from each point to each cluster center using formula (12);
(4) Assign each data point to the most similar (nearest) cluster according to distance;
(5) Update the cluster centers using formula (13);
(6) Repeat steps (3) to (5); when the criterion function (14) converges, stop clustering and output the result; otherwise return to step (3) and continue.
The specific flow chart is shown in Fig. 2.
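The six clustering steps can be sketched in plain NumPy. Formulas (13) and (14) are not reproduced, so the sketch uses the textbook k-means choices: centers updated as member means, and convergence declared when the sum-of-squared-errors criterion J stops decreasing. The function name and the max_iter, tol and seed parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, max_iter=100, tol=1e-9, seed=0):
    """k-means following steps (1)-(6); returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # step (2): k distinct data points as initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    prev_J = np.inf
    for _ in range(max_iter):
        # step (3): Euclidean distance (12) from every point to every center, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                              # step (4)
        J = float((d[np.arange(len(X)), labels] ** 2).sum())   # SSE criterion J
        # step (5): move each center to the mean of its members (keep it if empty)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        if prev_J - J <= tol:                                  # step (6): J has converged
            break
        prev_J = J
    return labels, centers
```

Because the initial centers are random, different seeds can converge to different local optima; running several seeds and keeping the lowest J is a common remedy.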
(3) Precise data processing combining the local outlier factor (LOF) detection algorithm with the k-means method;
$N_k(p) = \{ q \in D \setminus \{p\} \mid d(p, q) \le k\_dist(p) \}$ (16)
$reach\_dist_k(p, q) = \max\{ k\_dist(q),\ d(p, q) \}$ (17)
$$lrd_k(p) = \left( \frac{\sum_{o \in N_k(p)} reach\_dist_k(p, o)}{|N_k(p)|} \right)^{-1} \quad (18)$$
$$LOF_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{lrd_k(o)}{lrd_k(p)} \quad (19)$$
where $N_k(p)$ is the set of objects whose distance to p does not exceed the k-distance of p; $d(p, q)$ is the Euclidean distance between p and q; $k\_dist(p)$ is the k-distance of object p; $reach\_dist_k(p, q)$ is the reachability distance of object p with respect to object q; $lrd_k(p)$ is the local reachability density of object p; and $LOF_k(p)$ is the local outlier factor (LOF) of point p.
The detailed process is as follows:
(1) Set the number of neighbors k;
(2) Set the target number of outliers m;
(3) Input the data set;
(4) Compute the distance matrix of all objects;
(5) Compute the k-distance k_dist(p) of every point p;
(6) Compute the k-distance neighborhood $N_k(p)$ of every point p;
(7) Compute the local reachability density of every point p;
(8) Compute the local outlier factor LOF;
(9) Sort the LOF values of all points and output the top m outliers.
The specific flow chart is shown in Fig. 3.
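Formulas (16) to (19) and steps (4) to (9) can be sketched with a brute-force distance matrix. This is a minimal O(n²) NumPy illustration (function name hypothetical). It keeps tied neighbors in N_k(p), as definition (16) allows. It does not guard against duplicate points, whose zero reachability distances make the density infinite.

```python
import numpy as np


def lof_scores(X, k):
    """Local outlier factor of every point, following formulas (16)-(19)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    # step (4): pairwise Euclidean distances; the diagonal is excluded (q in D \ {p})
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    k_dist = np.sort(D, axis=1)[:, k - 1]                               # step (5)
    neighbors = [np.flatnonzero(D[i] <= k_dist[i]) for i in range(n)]  # step (6), (16)
    lrd = np.empty(n)
    for i, nb in enumerate(neighbors):
        reach = np.maximum(k_dist[nb], D[i, nb])   # (17): reach_dist_k(p, o)
        lrd[i] = len(nb) / reach.sum()             # step (7), (18)
    # step (8), (19): mean density ratio of the neighbors to the point itself
    return np.array([lrd[nb].mean() / lrd[i] for i, nb in enumerate(neighbors)])
```

Step (9) is then a sort, for example `np.argsort(-scores)[:m]` for the indices of the top m outliers. Scores near 1 indicate a point about as dense as its neighbors; scores well above 1 indicate an outlier.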
This method calls the k-means clustering algorithm to extract the candidate set, using the following rule: within each cluster, if the distance between an object and the cluster center is greater than or equal to the cluster radius R, the object is extracted into the outlier candidate set.
To improve the detection accuracy of the algorithm, the proposed method requires a point to satisfy two conditions to be judged an outlier:
(1) Outlier screening condition
where $p_{ij}$ is the j-th element of the i-th cluster obtained by k-means clustering of the PCA-processed data set; $n_i$ is the number of data objects in the i-th cluster; $Center_k$ is the cluster center; and R is the cluster radius;
(2) Outlier factor restriction condition
$LOF(p_{ij}) \in LOF(p)_{top(m)}$ (21)
where m is the preset threshold for the number of outliers to detect.
Combining the two algorithms, the detailed process is as follows:
(1) Input the raw data set and the preset minimum number of outliers m;
(2) Apply PCA dimensionality reduction;
(3) Apply k-means clustering to the reduced data set;
(4) Count the number of data points $n_i$ in each cluster;
(5) If a cluster contains $n_i < m$ points, retain the whole cluster; the data contained in the retained clusters are denoted D. Otherwise, use formula (20) to judge whether the distance from each point of the cluster to the cluster center $Center_k$ is greater than or equal to the cluster radius; if it is, the point is merged with D to form the "outlier candidate data set" D'; if it is smaller, the point is judged normal and rejected;
(6) Use the local outlier factor detection algorithm to compute and sort the outlier factors of all data points; the outlier factors selected in this way realize non-technical loss detection.
The specific flow chart is shown in Fig. 5.
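A compact, self-contained NumPy sketch of the combined procedure (1) to (6) follows. Several details are assumptions, not taken from the text. Formula (15) is not reproduced, so the cluster radius R is taken as the mean distance of a cluster's members to its center. LOF is computed only on the candidate set D', which is where the pruning saves work. The function and parameter names (detect_ntl, n_clusters, k_neighbors) are hypothetical.

```python
import numpy as np


def _pca(X, thr=0.85):
    """Project onto the leading components whose cumulative contribution exceeds thr."""
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    w, V = w[::-1], V[:, ::-1]                        # descending eigenvalues
    m = int(np.searchsorted(np.cumsum(w) / w.sum(), thr, side="right")) + 1
    return Xc @ V[:, :min(m, len(w))]


def _kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return labels, C


def _lof(X, k):
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(D, np.inf)
    kd = np.sort(D, axis=1)[:, k - 1]
    nbrs = [np.flatnonzero(D[i] <= kd[i]) for i in range(len(X))]
    lrd = np.array([len(nb) / np.maximum(kd[nb], D[i, nb]).sum() for i, nb in enumerate(nbrs)])
    return np.array([lrd[nb].mean() / lrd[i] for i, nb in enumerate(nbrs)])


def detect_ntl(X, n_clusters, m, k_neighbors):
    """Return indices (into X) of the top-m LOF points among the candidates D'."""
    X = np.asarray(X, dtype=float)
    Z = _pca(X)                                       # step (2)
    labels, C = _kmeans(Z, n_clusters)                # step (3)
    cand = []
    for j in range(n_clusters):
        idx = np.flatnonzero(labels == j)             # step (4): n_i = len(idx)
        if len(idx) == 0:
            continue
        if len(idx) < m:                              # step (5): small cluster kept whole
            cand.extend(idx.tolist())
        else:
            dist = np.linalg.norm(Z[idx] - C[j], axis=1)
            R = dist.mean()                           # HYPOTHETICAL radius R
            cand.extend(idx[dist >= R].tolist())      # far points join D'
    cand = np.array(sorted(cand), dtype=int)
    if len(cand) <= k_neighbors:                      # too few candidates for LOF (assumption)
        return cand
    scores = _lof(Z[cand], k_neighbors)               # step (6)
    return cand[np.argsort(-scores)[:m]]
```

For example, with 200 normally distributed users plus three users placed far from the rest, detect_ntl(X, n_clusters=2, m=5, k_neighbors=5) should rank those three among its five reported points; the exact ranking of the remaining candidates depends on the random data.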
(4) Evaluation of detection accuracy
Abnormal consumption pattern detection is essentially a binary classification problem: all users are divided into two classes, normal users and abnormal users. The confusion matrix is a basic tool for assessing the reliability of a classifier. For binary classification, the confusion matrix shown in Fig. 4 lists all possible classification outcomes of the classifier, where the rows (positive/negative) correspond to the actual class of the object and the columns (true/false) represent the class predicted by the classifier.
Here FP is a Type I error and FN is a Type II error. Several classifier evaluation indices can be derived from the confusion matrix:
Precision PRE = TP/(TP+FP), the probability that a sample judged positive is actually positive;
False negative rate FNR = FN/(FN+TP), the probability that a positive sample is wrongly classified as negative;
True positive rate TPR = TP/(TP+FN), the proportion of actually positive samples that are correctly judged positive;
False positive rate FPR = FP/(FP+TN), the proportion of actually negative samples that are wrongly judged positive.
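The four indices follow directly from the confusion-matrix counts. A small sketch in plain Python, assuming label 1 marks an abnormal user (the positive class); the function name is hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Count the confusion matrix (1 = abnormal/positive, 0 = normal/negative)
    and derive PRE, FNR, TPR and FPR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II error
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num, den):
        # undefined (nan) when the denominator class is empty
        return num / den if den else float("nan")

    return {
        "PRE": ratio(tp, tp + fp),
        "FNR": ratio(fn, fn + tp),
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
    }
```

FNR and TPR always sum to 1 because both are normalized by the number of actually positive samples. The imbalance problem discussed next is easy to reproduce: with 99 positives and 1 negative, predicting every sample positive gives TPR = 1 and an accuracy of 99%, while FPR = 1 shows that the classifier cannot separate the classes at all.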
The above indices measure classification results from different aspects, but they have two problems. First, they are seriously flawed when the proportions of positive and negative samples in the data set are imbalanced. Take the extreme case of a 99:1 positive-to-negative ratio: a classifier that simply labels every sample positive already reaches 99% accuracy, yet such an evaluation result has no reference value. Second, these are static indices. Some classifiers output not a simple 0 or 1 but the degree to which an object belongs to a class; such classifiers produce different classification results at different thresholds, so a dynamic index is needed to measure their overall reliability.
The ROC (receiver operating characteristic) curve describes how the two confusion-matrix indices FPR and TPR grow relative to each other. For the real-valued scores output by a binary classification model, samples scoring above a threshold are assigned to the positive class and samples below it to the negative class. Lowering the threshold identifies more positives, which raises TPR, but it also assigns more negative samples to the positive class, which raises FPR. The ROC curve visualizes this trade-off: each point on the curve corresponds to the confusion matrix of the classification result at one particular threshold.
In ROC space, the point (0,1) represents an ideal classifier, and the closer the ROC curve comes to (0,1), the better the classification performance. The area under the curve (AUC) expresses the quality of a classifier as a single number: the area under the ROC curve. A larger AUC indicates better performance, and AUC = 1 corresponds to an ideal classifier.
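A known equivalent form of the AUC is the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counting one half. This gives the same value as the trapezoidal area under the ROC points. The following minimal Python sketch uses that pairwise form. The name `auc_pairwise` is hypothetical, and the O(P·N) loop is only meant for small data sets.

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_pairwise([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0]))  # 8/9
```

A classifier that ranks every positive above every negative gets AUC = 1, matching the ideal point (0,1).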
(5) Carry out a simulation analysis of an example using MATLAB;
(5-1) Determine the example and its essential features;
The initial data set used in the present invention consists of 6 months of power load data for 3000 power consumers of a certain substation, sampled at 15-minute intervals. Power load and electricity consumption can be converted into each other and reflect a user's consumption pattern in essentially the same way, so electricity consumption can also be used as the characteristic index describing user consumption patterns. The simulation is carried out in MATLAB 7.10. The 3000 power consumers comprise 2965 normal users and 35 abnormal users, so abnormal users make up about 1.17% (35/3000) of the total.
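The conversion between power load and electricity consumption mentioned above is an integration over the sampling interval. The following minimal Python sketch assumes each 15-minute reading is the average load in kW over its interval. Under that assumption, one reading corresponds to load × 0.25 h of energy, and a day has 96 readings. The names `daily_energy_kwh`, `SAMPLES_PER_DAY` and `INTERVAL_H` are hypothetical.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60 // 15  # 96 readings per day at a 15-minute interval
INTERVAL_H = 15 / 60             # length of one interval in hours (0.25 h)
ABNORMAL_RATIO = 35 / 3000       # share of abnormal users in the example


def daily_energy_kwh(load_kw: np.ndarray) -> np.ndarray:
    """Turn a (users, days*96) matrix of 15-min average loads in kW into a
    (users, days) matrix of daily consumption in kWh (assumed units)."""
    users, n = load_kw.shape
    if n % SAMPLES_PER_DAY:
        raise ValueError("series length must be a whole number of days")
    days = n // SAMPLES_PER_DAY
    per_day = load_kw.reshape(users, days, SAMPLES_PER_DAY)
    return per_day.sum(axis=2) * INTERVAL_H


# A constant 2 kW load for 2 days gives 2 kW * 24 h = 48 kWh per day.
print(daily_energy_kwh(np.full((3, 2 * SAMPLES_PER_DAY), 2.0)))
```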
(5-2) Carry out a simulation analysis of the example with functions programmed in MATLAB.
The simulation shows that the model can quickly detect abnormal users, which are the sources of non-technical loss. It detects non-technical losses to the greatest extent possible while meeting the requirements of accuracy and economy.
The embodiments described above only describe preferred implementations of the present invention and do not limit its scope. Any variations and improvements that those of ordinary skill in the art make to the technical solution of the present invention without departing from the spirit of its design shall fall within the protection scope determined by the claims of the present invention.
Claims (8)
1. A non-technical loss detection method for a smart grid based on unsupervised learning, characterized by comprising the following steps:
Step (1): a single electricity consumption behavior can give rise to multiple kinds of electricity consumption data. Select several kinds of consumption data that characterize consumption behavior as the original index data set, and reduce the dimension of this raw data set using principal component analysis;
Step (2): cluster the data set obtained by principal component analysis in step (1) using the k-means clustering method, and reject normal data to obtain abnormal data;
Step (3): process the abnormal data from step (2) precisely with the local outlier factor detection algorithm. This precisely separates the abnormal data and completes non-technical loss detection for the smart grid.
2. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 1, characterized in that in step (1), the original index data set includes the following indices:
- a trend index;
- a mobility index;
- a fluctuation index;
- a ratio index of the average load over the last r months to the average monthly load over all months;
- a correlation coefficient index between each user's load sequence and the sequence of median loads of all customers.
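The last two indices of claim 2 can be sketched in a few lines of numpy. The following sketch assumes the load sequences are the monthly average loads, that is, an (users, H) matrix X as in claim 3, that "median sequence" means the element-wise median over all users, and that the correlation is Pearson's coefficient. The function name `ratio_and_correlation` is hypothetical. A user with a constant load has zero variance, and this sketch would divide by zero for that user.

```python
import numpy as np


def ratio_and_correlation(X: np.ndarray, r: int):
    """X: (users, H) monthly average loads; r: length of the final window.

    Returns per user (1) mean load of the last r months divided by the mean
    over all H months and (2) the Pearson correlation between the user's
    series and the element-wise median series of all users.
    """
    ratio = X[:, -r:].mean(axis=1) / X.mean(axis=1)
    median_seq = np.median(X, axis=0)            # median load at each month
    Xc = X - X.mean(axis=1, keepdims=True)       # centre each user's series
    mc = median_seq - median_seq.mean()
    corr = (Xc @ mc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(mc))
    return ratio, corr
```

Users whose consumption drops late in the period, or whose profile departs from the population median, get a low ratio or a low correlation, respectively.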
3. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 2, characterized in that the trend index is calculated as follows:
1) input the monthly average load data set X of the power users;
2) calculate the n-point simple moving average sequence F of each customer's load time series A;
3) compare sequence A with sequence F at each time point. Suppose A lies below F in u segments, containing $a_1, a_2, \ldots, a_u$ points respectively, and above F in v segments, containing $b_1, b_2, \ldots, b_v$ points respectively. Then the following indices are calculated:
4) calculate the upward trend index tra and the downward trend index trb.
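The explicit expressions for tra and trb are not given in the text above. For that reason, this minimal numpy sketch only implements steps 2) and 3): the n-point simple moving average F and the segment lengths $a_1, \ldots, a_u$ (A below F) and $b_1, \ldots, b_v$ (A above F) that feed those indices. Two details are assumptions. First, F[t] is aligned with the last point of its averaging window. Second, points where A equals F end a segment without joining one. The function names are hypothetical.

```python
import numpy as np


def moving_average(a: np.ndarray, n: int) -> np.ndarray:
    """n-point simple moving average, valid part only (len(a) - n + 1 values)."""
    return np.convolve(a, np.ones(n) / n, mode="valid")


def below_above_runs(a: np.ndarray, n: int):
    """Lengths of consecutive segments where A is below / above F."""
    f = moving_average(a, n)
    a = a[n - 1:]                    # align A with the end of each window
    below, above = [], []
    run_sign, run_len = 0, 0
    for sign in np.sign(a - f).astype(int):
        if sign == run_sign and sign != 0:
            run_len += 1             # current segment continues
            continue
        if run_sign < 0:             # close the segment that just ended
            below.append(run_len)
        elif run_sign > 0:
            above.append(run_len)
        run_sign, run_len = sign, 1
    if run_sign < 0:                 # close the final segment
        below.append(run_len)
    elif run_sign > 0:
        above.append(run_len)
    return below, above              # [a_1..a_u], [b_1..b_v]


series = np.array([1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4], float)
print(below_above_runs(series, 3))  # ([4], [3, 3])
```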
4. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 2, characterized in that the mobility index measures the difference in a user's electricity consumption pattern between the earlier and later periods, and includes:
1) the difference between the average loads of the first r months and the last r months,
where $x_{n1}$ and $x_{n2}$ are the monthly loads of the first r months and the last r months, respectively;
2) the modulus of the difference sequence between the discrete Fourier transform coefficient sequences of the first r months and the last r months,
where $y_{n1}$ and $y_{n2}$ are the discrete Fourier transform coefficient sequences of the first and last r months, respectively.
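The exact formulas of the two mobility indices are not shown above. The following numpy sketch therefore assumes the simplest reading. The first index is the signed difference of the two window means. The second is the Euclidean norm of the difference between the unnormalized DFT coefficient sequences of two equal-length windows. The function name `mobility_indices` is hypothetical.

```python
import numpy as np


def mobility_indices(first: np.ndarray, last: np.ndarray):
    """first, last: equal-length load windows (first r and last r months).

    Returns (mean_diff, dft_diff) under the assumed forms:
    mean(first) - mean(last), and ||DFT(first) - DFT(last)||_2.
    """
    mean_diff = first.mean() - last.mean()
    dft_diff = np.linalg.norm(np.fft.fft(first) - np.fft.fft(last))
    return mean_diff, dft_diff


print(mobility_indices(np.array([1.0, 2, 3, 4]), np.array([1.0, 2, 3, 5])))
```

Because numpy's forward FFT is unnormalized, Parseval's theorem makes the second index equal to sqrt(N) times the Euclidean distance between the two windows. In this example, that is sqrt(4) × 1 = 2.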
5. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 2, characterized in that the fluctuation index comprises:
1) the standard deviation sd of each user's H-month load sequence;
2) the standard deviation bsd_r of the load sequence of the first r months;
3) the standard deviation esd_r of the load sequence of the last r months.
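The three fluctuation indices are plain standard deviations over different windows of the same series. A minimal numpy sketch follows. The function name is hypothetical, and the text does not say whether the population form (ddof=0, numpy's default, used here) or the sample form (ddof=1) is intended.

```python
import numpy as np


def fluctuation_indices(x: np.ndarray, r: int):
    """x: one user's H monthly loads; returns (sd, bsd_r, esd_r)."""
    sd = x.std()            # whole H-month sequence
    bsd_r = x[:r].std()     # first r months
    esd_r = x[-r:].std()    # last r months
    return sd, bsd_r, esd_r


print(fluctuation_indices(np.array([1.0, 1, 1, 3, 3, 3]), 3))
```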
6. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 1, characterized in that in step (1), the dimension of the original index data set is reduced by principal component analysis as follows:
(1) Calculate the covariance matrix
Suppose there are n samples, each with p variables, forming the n × p data matrix
$$X = (x_{ij})_{n\times p}$$
Denote the original variables by
$$x_1, x_2, \ldots, x_p \quad (6)$$
and calculate the covariance matrix
$$\Sigma = (S_{ij})_{p\times p} \quad (7)$$
where $S_{ij}$ is the covariance between variables $x_i$ and $x_j$.
(2) Find the eigenvalues $\lambda_i$ of $\Sigma$ and the corresponding orthonormal unit eigenvectors $a_i$
The m largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m > 0$ of $\Sigma$ are the variances of the first m principal components. The unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables, so the i-th principal component of the original variables is
$$F_i = a_i^{\mathsf T} X \quad (8)$$
(3) Select the principal components
The number m of principal components $F_1, F_2, \ldots, F_m$ to keep is determined by the cumulative contribution rate of the variance information, G(m):
$$G(m) = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k} \quad (9)$$
When the cumulative contribution rate exceeds 85%, the retained components are considered to reflect the information of the original variables. The corresponding m is the number of leading principal components to extract.
(4) Calculate the principal component loadings
A principal component loading reflects the degree of correlation between principal component $F_i$ and original variable $X_j$. The loading of original variable $X_j$ (j = 1, 2, …, p) on principal component $F_i$ (i = 1, 2, …, m) is
$$l_{ij} = l(F_i, X_j) = \sqrt{\lambda_i}\, a_{ij} \quad (i = 1, \ldots, m;\ j = 1, \ldots, p) \quad (10)$$
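If $F_1, F_2, \ldots, F_m$ denote the m principal components of the original variables $X_1, X_2, \ldots, X_p$, then:
$$F_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{ip} X_p, \quad i = 1, \ldots, m \quad (11)$$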
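Steps (1) to (4) of claim 6 map directly onto a symmetric eigendecomposition. The following is a minimal numpy sketch under stated assumptions. It uses the sample covariance (numpy's default ddof=1), keeps the smallest m whose cumulative contribution exceeds the 85% threshold, and computes loadings in the standard form $\sqrt{\lambda_i}\,a_{ij}$. These loadings equal the correlations between $F_i$ and $X_j$ only if X is standardized first. The function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X: np.ndarray, threshold: float = 0.85):
    """PCA of an (n, p) data matrix X (rows = samples, columns = variables).

    Returns the component scores (n, m), the retained eigenvalues and the
    (m, p) loading matrix l[i, j] = sqrt(lambda_i) * a_ij.
    """
    Xc = X - X.mean(axis=0)                           # centre each variable
    S = np.atleast_2d(np.cov(Xc, rowvar=False))       # (1) p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)              # (2) orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    G = np.cumsum(eigvals) / eigvals.sum()            # (3) cumulative contribution G(m)
    m = int(np.searchsorted(G, threshold, side="right")) + 1  # smallest m with G(m) > threshold
    A = eigvecs[:, :m]                                # columns are a_1 ... a_m
    scores = Xc @ A                                   # F_i = a_i^T x for every sample
    loadings = np.sqrt(eigvals[:m])[:, None] * A.T    # (4) loadings l_ij
    return scores, eigvals[:m], loadings
```

The sample variance of each score column equals the corresponding retained eigenvalue, which is a convenient sanity check on the decomposition.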
7. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 1, characterized in that in step (2), the basic formulas of the k-means clustering method are:
The symbols are defined as follows:
- $\mathrm{dist}(x_i, x_j)$ is the Euclidean distance between data points $x_i$ and $x_j$;
- D is the number of attributes of a data object;
- $x_{i,d}$ and $x_{j,d}$ are the d-th components of data points $x_i$ and $x_j$;
- $C_k$ is the center of the k-th cluster;
- $\mathrm{Center}_k$ is the updated center of the k-th cluster;
- J is the sum-of-squared-errors criterion function;
- R is the defined cluster radius.
The data set obtained by principal component analysis is clustered with the k-means method and normal data are rejected. The specific clustering process is as follows:
(1) Input the initial data set X and set the number of clusters k;
(2) Randomly select k points in X as the initial cluster centers;
(3) Calculate the distance from each point to each cluster center using formula (12);
(4) Assign each data point to the most similar (nearest) cluster according to distance;
(5) Update the cluster centers using formula (13);
(6) Repeat steps (3) to (5). When the criterion function (14) converges, stop clustering and output the result; otherwise, return to step (3) and continue.
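Steps (1) to (6) describe Lloyd's form of k-means. The following minimal numpy sketch follows the same steps: random initial centers drawn from the data, nearest-center assignment by Euclidean distance, mean update, and a stop once the sum-of-squared-errors criterion J no longer decreases. The function name, the tolerance, the iteration cap and the rule of keeping an emptied cluster's old center are assumptions of this sketch.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, max_iter: int = 100, tol: float = 1e-9, seed: int = 0):
    """Plain k-means on the rows of X; returns (labels, centers, J)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # step (2)
    prev_J = np.inf
    for _ in range(max_iter):
        # step (3): Euclidean distance from every point to every centre, (n, k)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                    # step (4): nearest centre
        for c in range(k):                              # step (5): mean update
            if np.any(labels == c):                     # keep old centre if empty
                centers[c] = X[labels == c].mean(axis=0)
        J = float(((X - centers[labels]) ** 2).sum())   # criterion J
        if prev_J - J <= tol:                           # step (6): J has converged
            break
        prev_J = J
    return labels, centers, J


pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
print(kmeans(pts, 2)[0])
```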
8. The non-technical loss detection method for a smart grid based on unsupervised learning according to claim 1, characterized in that in step (3), the local outlier factor detection algorithm is defined by the following formulas:
$$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p,q) \le k\_dist(p) \,\} \quad (16)$$
$$reach\_dist_k(p,q) = \max\{k\_dist(q),\, d(p,q)\} \quad (17)$$
$$lrd_k(p) = \left( \frac{\sum_{o \in N_k(p)} reach\_dist_k(p,o)}{|N_k(p)|} \right)^{-1} \quad (18)$$
$$LOF_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{lrd_k(o)}{lrd_k(p)} \quad (19)$$
The symbols are defined as follows:
- $N_k(p)$ is the k-distance neighborhood of p, the set of objects whose distance to p does not exceed $k\_dist(p)$;
- d(p, q) is the Euclidean distance between points p and q;
- $k\_dist(p)$ is the k-distance of object p;
- $reach\_dist_k(p, q)$ is the reachability distance of object p with respect to object q;
- $lrd_k(p)$ is the local reachability density of object p (the MinPts parameter equals k);
- $LOF_k(p)$ is the local outlier factor of point p.
The detailed procedure is as follows:
(1) Set the number of neighbors k;
(2) Set the target number of outliers m;
(3) Input the data set;
(4) Calculate the distance matrix of all objects;
(5) Calculate the k-distance k_dist(p) of every point p;
(6) Calculate the k-distance neighborhood N_k(p) of every point p;
(7) Calculate the local reachability density of every point p;
(8) Calculate the local outlier factor LOF;
(9) Sort the LOF values of all points and output the top-m outliers;
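Steps (4) to (9) can be sketched with a dense distance matrix, which is adequate for a few thousand users. The following numpy sketch uses the standard LOF definitions (local reachability density as the inverse mean reachability distance, LOF as the mean density ratio). It keeps every object within k_dist(p) in $N_k(p)$, so ties can make the neighborhood larger than k. The sketch assumes 1 ≤ k < n and no more than k identical points, because a zero mean reachability distance would make the density infinite. The function name `lof_top_m` is hypothetical.

```python
import numpy as np


def lof_top_m(X: np.ndarray, k: int, m: int):
    """LOF value of every row of X and the indices of the m largest."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # (4) distance matrix
    np.fill_diagonal(D, np.inf)                                 # q ranges over D \ {p}
    k_dist = np.sort(D, axis=1)[:, k - 1]                       # (5) k-distance
    neigh = [np.flatnonzero(D[p] <= k_dist[p]) for p in range(n)]  # (6) N_k(p)
    lrd = np.empty(n)
    for p in range(n):                                          # (7) reachability density
        reach = np.maximum(k_dist[neigh[p]], D[p, neigh[p]])    # reach_dist_k(p, o)
        lrd[p] = 1.0 / reach.mean()
    lof = np.array([lrd[neigh[p]].mean() / lrd[p] for p in range(n)])  # (8) LOF
    top = np.argsort(lof)[::-1][:m]                             # (9) top-m outliers
    return lof, top


pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], float)
print(lof_top_m(pts, k=3, m=1)[1])  # [5]
```

Points inside a uniform cluster get LOF values close to 1, while an isolated point gets a value well above 1.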
Here the k-means clustering algorithm is called to extract the candidate set, using the following rule: within each cluster, compute the distance between each object and the cluster center. If this distance is greater than or equal to the cluster radius R, the object is extracted into the outlier candidate set.
In addition, to improve the detection accuracy of the algorithm, a point must satisfy two conditions to be judged an outlier:
(1) Outlier screening condition
where $p_{ij}$ is the j-th element of the i-th cluster obtained by k-means clustering of the PCA-processed data set; $n_i$ is the number of data objects in the i-th cluster; $\mathrm{Center}_k$ is the cluster center; and R is the cluster radius;
(2) Outlier factor restriction
$$LOF(p_{ij}) \in LOF(p)_{top(m)} \quad (21)$$
where m is the preset threshold on the number of detected outliers. In other words, $p_{ij}$ must be among the m points with the largest LOF values.
Combining the two algorithms, the detailed procedure is as follows:
(1) Input the raw data set and the preset minimum number of outliers m;
(2) Apply PCA dimension reduction;
(3) Perform k-means clustering on the reduced data set;
(4) Calculate the number of data points $n_i$ in each cluster;
(5) If a cluster contains $n_i < m$ data points, retain the whole cluster directly, and denote the data contained in the retained clusters as D. If $n_i > m$, use formula (20) to judge whether the distance from each point in the cluster to the cluster center $\mathrm{Center}_k$ is greater than the cluster radius. If it is, the point is merged with D to form the "outlier candidate data set" D'; if it is not, the point is judged to be normal data and rejected;
(6) Use the local outlier factor detection algorithm to calculate and sort the outlier factors of all data points. The points selected by their outlier factors are the result of non-technical loss detection.
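The combined procedure can be sketched end to end as follows. This is not the patented implementation. It is a self-contained numpy sketch under several loudly labeled assumptions:
- the cluster radius R, whose definition is not reproduced above, is taken as the mean distance of a cluster's members to its center;
- clusters with $n_i < m$ are kept whole, and all other clusters (including the unstated $n_i = m$ case) are pruned by radius;
- LOF is computed within the candidate set D' only.

The names `detect_ntl`, `pca_scores`, `kmeans` and `lof_scores` are hypothetical.

```python
import numpy as np


def pca_scores(X, threshold=0.85):
    """Scores on the leading components whose cumulative contribution > threshold."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.atleast_2d(np.cov(Xc, rowvar=False)))
    vals, vecs = vals[::-1], vecs[:, ::-1]                    # descending order
    m = int(np.searchsorted(np.cumsum(vals) / vals.sum(), threshold, side="right")) + 1
    return Xc @ vecs[:, :m]


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def lof_scores(X, k):
    """Standard LOF values for the rows of X (assumes k < len(X))."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(D, np.inf)
    kd = np.sort(D, axis=1)[:, k - 1]
    nb = [np.flatnonzero(D[p] <= kd[p]) for p in range(len(X))]
    lrd = np.array([1 / np.maximum(kd[o], D[p, o]).mean() for p, o in enumerate(nb)])
    return np.array([lrd[o].mean() / lrd[p] for p, o in enumerate(nb)])


def detect_ntl(X, n_clusters, m, k_lof):
    """Indices (rows of X) of the m most outlying users."""
    Z = pca_scores(X)                                         # step (2)
    labels, centers = kmeans(Z, n_clusters)                   # step (3)
    cand = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)                     # step (4): n_i = len(idx)
        if len(idx) < m:                                      # step (5): keep small clusters
            cand.extend(idx)
            continue
        d = np.linalg.norm(Z[idx] - centers[c], axis=1)
        R = d.mean()                                          # ASSUMED radius definition
        cand.extend(idx[d > R])                               # far points join D'
    cand = np.array(sorted(cand))
    scores = lof_scores(Z[cand], min(k_lof, len(cand) - 1))   # step (6)
    return cand[np.argsort(scores)[::-1][:m]]
```

Pruning points near their cluster centers before running LOF keeps the quadratic distance matrix small, which is the efficiency argument the method makes for combining clustering with LOF.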
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910066167.0A CN109740694A (en) | 2019-01-24 | 2019-01-24 | A kind of smart grid inartful loss detection method based on unsupervised learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910066167.0A CN109740694A (en) | 2019-01-24 | 2019-01-24 | A kind of smart grid inartful loss detection method based on unsupervised learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109740694A true CN109740694A (en) | 2019-05-10 |
Family
ID=66365880
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910066167.0A Pending CN109740694A (en) | 2019-01-24 | 2019-01-24 | A kind of smart grid inartful loss detection method based on unsupervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109740694A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110264272A (en) * | 2019-06-21 | 2019-09-20 | Shandong Normal University | A kind of mobile Internet labor service crowdsourcing platform task optimal pricing prediction technique, apparatus and system |
CN110288383A (en) * | 2019-05-31 | 2019-09-27 | State Grid Shanghai Municipal Electric Power Company | Group behavior power distribution network multiplexing electric abnormality detection method based on user property label |
CN110298552A (en) * | 2019-05-31 | 2019-10-01 | State Grid Shanghai Municipal Electric Power Company | A kind of power distribution network individual power method for detecting abnormality of combination history electrical feature |
CN110309884A (en) * | 2019-07-05 | 2019-10-08 | Economic and Technological Research Institute of State Grid Sichuan Electric Power Company | Electricity consumption data anomalous identification system based on ubiquitous electric power Internet of Things net system |
CN110852384A (en) * | 2019-11-12 | 2020-02-28 | Wuhan United Imaging Healthcare Co., Ltd. | Medical image quality detection method, device and storage medium |
CN111125470A (en) * | 2019-12-25 | 2020-05-08 | Chengdu Kangsai Information Technology Co., Ltd. | Method for improving abnormal data mining and screening |
CN111175626A (en) * | 2020-03-20 | 2020-05-19 | Guangdong Power Grid Co., Ltd. | Abnormal detection method for insulation state of switch cabinet |
CN112000655A (en) * | 2020-08-26 | 2020-11-27 | Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd. | Transformer load data preprocessing method, device and equipment |
CN112101765A (en) * | 2020-09-08 | 2020-12-18 | Heze Power Supply Company of State Grid Shandong Electric Power Company | Abnormal data processing method and system for operation index data of power distribution network |
CN112230056A (en) * | 2020-09-07 | 2021-01-15 | Electric Power Research Institute of State Grid Henan Electric Power Company | Multi-harmonic source contribution calculation method based on OFMMK-Means clustering and composite quantile regression |
CN112380992A (en) * | 2020-11-13 | 2021-02-19 | Shanghai Jiao Tong University | Method and device for evaluating and optimizing accuracy of monitoring data in machining process |
CN112464289A (en) * | 2020-12-11 | 2021-03-09 | Guangdong University of Technology | Method for cleaning private data |
CN112966567A (en) * | 2021-02-05 | 2021-06-15 | Shenzhen Pinzhi Information Technology Co., Ltd. | Coordinate positioning method, system, storage medium and terminal based on PCA (principal component analysis), clustering and K nearest neighbor |
CN113723497A (en) * | 2021-08-26 | 2021-11-30 | Guangxi University | Abnormal electricity utilization detection method, device, equipment and storage medium based on mixed feature extraction and Stacking model |
CN115511106A (en) * | 2022-11-15 | 2022-12-23 | Alibaba Cloud Computing Co., Ltd. | Method, device and readable storage medium for generating training data based on time sequence data |
CN116910593A (en) * | 2023-09-14 | 2023-10-20 | Beijing Haomai Biological Engineering Co., Ltd. | Signal noise suppression method and system for chemiluminescent instrument |
CN117808497A (en) * | 2024-03-01 | 2024-04-02 | Tsinghua Sichuan Energy Internet Research Institute | Electric power carbon emission abnormity detection module and method based on distance and direction characteristics |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106707233A (en) * | 2017-03-03 | 2017-05-24 | Guangdong University of Technology | Multi-side positioning method and multi-side positioning device based on outlier detection |
CN108593990A (en) * | 2018-06-04 | 2018-09-28 | State Grid Tianjin Electric Power Company | A kind of stealing detection method and application based on electric power users electricity consumption behavior pattern |
CN109146705A (en) * | 2018-07-02 | 2019-01-04 | Kunming University of Science and Technology | A kind of method of electricity consumption characteristic index dimensionality reduction and the progress stealing detection of extreme learning machine algorithm |
CN109255726A (en) * | 2018-09-07 | 2019-01-22 | PowerChina Huadong Engineering Corporation Limited | A kind of ultra-short term wind power prediction method of Hybrid Intelligent Technology |
2019-01-24: CN201910066167.0A filed in CN (CN109740694A), status: active, Pending
Non-Patent Citations (4)
Title |
---|
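Liu Guangcong: "A localization algorithm based on outlier detection", Computer Applications and Software * |
Sun Yi et al.: "An outlier detection method for power users' electricity consumption data improved with a Gaussian kernel function", Power System Technology * |
Zhuang Chijie et al.: "Detection of abnormal electricity consumption patterns of power users based on unsupervised learning", Proceedings of the CSEE * |
Tao Jing: "Outlier detection methods based on clustering and density", China Master's Theses Full-text Database, Information Science and Technology * |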
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288383A (en) * | 2019-05-31 | 2019-09-27 | State Grid Shanghai Municipal Electric Power Company | Group behavior power distribution network multiplexing electric abnormality detection method based on user property label |
CN110298552A (en) * | 2019-05-31 | 2019-10-01 | State Grid Shanghai Municipal Electric Power Company | A kind of power distribution network individual power method for detecting abnormality of combination history electrical feature |
CN110288383B (en) * | 2019-05-31 | 2024-02-02 | State Grid Shanghai Municipal Electric Power Company | Group behavior power distribution network electricity utilization abnormality detection method based on user attribute tags |
CN110298552B (en) * | 2019-05-31 | 2023-12-01 | State Grid Shanghai Municipal Electric Power Company | Power distribution network individual power abnormality detection method combining historical electricity utilization characteristics |
CN110264272A (en) * | 2019-06-21 | 2019-09-20 | Shandong Normal University | A kind of mobile Internet labor service crowdsourcing platform task optimal pricing prediction technique, apparatus and system |
CN110309884A (en) * | 2019-07-05 | 2019-10-08 | Economic and Technological Research Institute of State Grid Sichuan Electric Power Company | Electricity consumption data anomalous identification system based on ubiquitous electric power Internet of Things net system |
CN110852384B (en) * | 2019-11-12 | 2023-06-27 | Wuhan United Imaging Healthcare Co., Ltd. | Medical image quality detection method, device and storage medium |
CN110852384A (en) * | 2019-11-12 | 2020-02-28 | Wuhan United Imaging Healthcare Co., Ltd. | Medical image quality detection method, device and storage medium |
CN111125470A (en) * | 2019-12-25 | 2020-05-08 | Chengdu Kangsai Information Technology Co., Ltd. | Method for improving abnormal data mining and screening |
CN111175626A (en) * | 2020-03-20 | 2020-05-19 | Guangdong Power Grid Co., Ltd. | Abnormal detection method for insulation state of switch cabinet |
CN112000655A (en) * | 2020-08-26 | 2020-11-27 | Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd. | Transformer load data preprocessing method, device and equipment |
CN112230056B (en) * | 2020-09-07 | 2022-04-26 | Electric Power Research Institute of State Grid Henan Electric Power Company | Multi-harmonic-source contribution calculation method based on OFMMK-Means clustering and composite quantile regression |
CN112230056A (en) * | 2020-09-07 | 2021-01-15 | Electric Power Research Institute of State Grid Henan Electric Power Company | Multi-harmonic source contribution calculation method based on OFMMK-Means clustering and composite quantile regression |
CN112101765A (en) * | 2020-09-08 | 2020-12-18 | Heze Power Supply Company of State Grid Shandong Electric Power Company | Abnormal data processing method and system for operation index data of power distribution network |
CN112380992B (en) * | 2020-11-13 | 2022-12-20 | Shanghai Jiao Tong University | Method and device for evaluating and optimizing accuracy of monitoring data in machining process |
CN112380992A (en) * | 2020-11-13 | 2021-02-19 | Shanghai Jiao Tong University | Method and device for evaluating and optimizing accuracy of monitoring data in machining process |
CN112464289A (en) * | 2020-12-11 | 2021-03-09 | Guangdong University of Technology | Method for cleaning private data |
CN112966567A (en) * | 2021-02-05 | 2021-06-15 | Shenzhen Pinzhi Information Technology Co., Ltd. | Coordinate positioning method, system, storage medium and terminal based on PCA (principal component analysis), clustering and K nearest neighbor |
CN113723497A (en) * | 2021-08-26 | 2021-11-30 | Guangxi University | Abnormal electricity utilization detection method, device, equipment and storage medium based on mixed feature extraction and Stacking model |
CN115511106A (en) * | 2022-11-15 | 2022-12-23 | Alibaba Cloud Computing Co., Ltd. | Method, device and readable storage medium for generating training data based on time sequence data |
CN115511106B (en) * | 2022-11-15 | 2023-04-07 | Alibaba Cloud Computing Co., Ltd. | Method, device and readable storage medium for generating training data based on time sequence data |
CN116910593B (en) * | 2023-09-14 | 2023-11-17 | Beijing Haomai Biological Engineering Co., Ltd. | Signal noise suppression method and system for chemiluminescent instrument |
CN116910593A (en) * | 2023-09-14 | 2023-10-20 | Beijing Haomai Biological Engineering Co., Ltd. | Signal noise suppression method and system for chemiluminescent instrument |
CN117808497A (en) * | 2024-03-01 | 2024-04-02 | Tsinghua Sichuan Energy Internet Research Institute | Electric power carbon emission abnormity detection module and method based on distance and direction characteristics |
CN117808497B (en) * | 2024-03-01 | 2024-05-14 | Tsinghua Sichuan Energy Internet Research Institute | Electric power carbon emission abnormity detection module and method based on distance and direction characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740694A (en) | A kind of smart grid inartful loss detection method based on unsupervised learning | |
Wang et al. | Detection of power grid disturbances and cyber-attacks based on machine learning | |
CN104809658B (en) | A kind of rapid analysis method of low-voltage distribution network taiwan area line loss | |
CN108133225A (en) | A kind of icing flashover fault early warning method based on support vector machines | |
CN102955902B (en) | Method and system for evaluating reliability of radar simulation equipment | |
CN106154163B (en) | Battery life state identification method | |
CN106485089B (en) | The interval parameter acquisition methods of harmonic wave user's typical condition | |
CN109446812A (en) | A kind of embedded system firmware safety analytical method and system | |
CN109039503A (en) | A kind of frequency spectrum sensing method, device, equipment and computer readable storage medium | |
CN110826618A (en) | Personal credit risk assessment method based on random forest | |
CN109787979A (en) | A kind of detection method of electric power networks event and invasion | |
CN112735097A (en) | Regional landslide early warning method and system | |
CN110569876A (en) | Non-invasive load identification method and device and computing equipment | |
CN108805193A (en) | A kind of power loss data filling method based on mixed strategy | |
CN111242161A (en) | Non-invasive non-resident user load identification method based on intelligent learning | |
CN111562541B (en) | Software platform for realizing electric energy meter detection data management by applying CART algorithm | |
CN115081933B (en) | Low-voltage user topology construction method and system based on improved spectral clustering | |
CN112463848A (en) | Method, system, device and storage medium for detecting abnormal user behavior | |
Cao et al. | Density-based fuzzy C-means multi-center re-clustering radar signal sorting algorithm | |
Frank et al. | Extracting operating modes from building electrical load data | |
Zhou et al. | Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm | |
CN114240041A (en) | Lean line loss analysis method and system for distribution network distribution area | |
CN113033898A (en) | Electrical load prediction method and system based on K-means clustering and BI-LSTM neural network | |
Li et al. | Hierarchical clustering driven by cognitive features | |
Wang et al. | Power quality disturbance recognition method in park distribution network based on one-dimensional vggnet and multi-label classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190510 |