CN109740694A

CN109740694A - A smart grid non-technical loss detection method based on unsupervised learning

Info

Publication number: CN109740694A
Application number: CN201910066167.0A
Authority: CN
Inventors: 曲正伟; 李弘文; 王云静; 田亚静
Original assignee: Yanshan University
Current assignee: Yanshan University
Priority date: 2019-01-24
Filing date: 2019-01-24
Publication date: 2019-05-10

Abstract

The invention discloses a smart grid non-technical loss detection method based on unsupervised learning, relating to the field of advanced metering infrastructure for smart grids. The invention first reduces the dimensionality of the raw data set with principal component analysis (PCA). It then clusters the reduced data with the k-means method, pruning away most of the normal data, and finally refines the result with the local outlier factor (LOF) detection algorithm, which accurately separates the abnormal data and so detects non-technical losses. Detection accuracy is evaluated with ROC curves to verify the feasibility and accuracy of the method, and the method is analysed by simulation with a simulation tool. Compared with existing techniques, the proposed detection method is more efficient, simpler and more practical; it improves detection efficiency and can save considerable time and resources.

Description

A smart grid non-technical loss detection method based on unsupervised learning

Technical field

The present invention relates to the field of advanced metering infrastructure for smart grids, and in particular to a smart grid non-technical loss detection method based on unsupervised learning that can detect non-technical losses in a smart grid quickly and efficiently.

Background art

In recent years, the development of the smart grid has brought new vitality and promise to the power industry, while also posing new challenges to the traditional grid model. Growing global resource and environmental pressures, progress in electricity market reform, and users' rising requirements for power quality and supply reliability confront the power industry with unprecedented challenges. Many countries and organizations have proposed building smart grids that are flexible, clean, safe, economical and user-friendly, and the smart grid is regarded as the direction of development of the future power grid.

The foundation of the smart grid consists of distributed data transmission, computation and control technologies, together with technologies for the effective transmission of data and control commands among multiple power supply units. On this basis, the grid needs more efficient communication and metering systems. To meet this need, the concept of smart grid Advanced Metering Infrastructure (AMI) was introduced, and AMI plays an increasingly important role in the smart grid. It has achieved significant results in system operation, asset management and especially demand response, and has become one of the most popular research topics and engineering projects in the entire power industry.

However, the security threats faced by such a complex metering and communication system should not be underestimated. Several key features of an AMI system make it vulnerable to attack:

(1) communication system is complicated, and section communication link bandwidth is limited；

(2) it connects a large number of devices with low computing power, low storage capacity and weak protection;

(3) it stores a large amount of sensitive user data.

Criminals often exploit the security weaknesses of the AMI system to attack the smart grid and commit illegal acts such as electricity theft and fraud, endangering the security of the smart grid. The energy losses on the distribution side associated with user electricity theft and a range of similar fraudulent consumption behaviours are collectively called non-technical losses (Non-technical Loss, NTL). These acts not only cause large losses of electric energy and disrupt the normal order of power supply and consumption, but also pose serious hidden dangers to the safe operation of the grid. According to incomplete statistics, revenue losses caused by non-technical losses in China amount to 0.5% to 3.5% of total revenue every year.

At present, the anti-theft measures taken by State Grid power supply companies are mostly the following: using dedicated electric energy metering boxes and meter boxes; sealing the conductors from the low-voltage line outlet to the metering device, which is the most widely used technique in current anti-theft technology; installing anti-theft smart meters with richer meter functions; and improving the utilization of the electricity consumption information acquisition system. However, these methods focus mostly on anti-theft devices and lack adequate anti-theft algorithms for analysing massive amounts of historical consumption data, so the consumption characteristics of electricity-stealing users are hard to discover.

In summary, the AMI system improves the data acquisition and processing capability of the smart grid and strengthens the link between the supply side and the demand side, but it also increases the risk of the grid being attacked. Effective measures are therefore needed to detect non-technical losses. An effective non-technical loss detection method can provide a reference for the electricity inspection work of power supply companies, raise the hit rate of on-site inspections, reduce operating costs and save considerable manpower and material resources; it is of great research significance for building a strong smart grid and improving grid security.

Summary of the invention

The object of the present invention is to provide a smart grid non-technical loss detection method based on unsupervised learning, which obtains abnormal data by clustering the raw consumption data that characterize users' consumption behaviour, so as to judge whether consumption behaviour is abnormal and thereby detect non-technical losses in the smart grid. The method is simple, efficient, comprehensive in the factors it considers and highly practical.

To achieve the above object, the present invention adopts the following technical solution: a smart grid non-technical loss detection method based on unsupervised learning, comprising the following steps:

Step (1): since a single consumption behaviour can give rise to several kinds of consumption data, raw consumption data characterizing several aspects of consumption behaviour are selected as the original index data set, and the dimensionality of this raw data set is reduced with principal component analysis;

Step (2): the data set obtained by principal component analysis in step (1) is clustered with the k-means clustering method, and normal data are removed to obtain abnormal data;

Step (3): the abnormal data from step (2) are refined with the local outlier factor detection algorithm, which accurately separates the abnormal data and completes the detection of non-technical losses in the smart grid.

A further technical solution is that, in step (1), the original index data set comprises: a trend index; a mobility index; a fluctuation index; the ratio of the average load over the last r months to the average load over all months; and the correlation coefficient between each user's load sequence and the sequence of median loads over all users.

A further technical solution is that the trend index is calculated as follows:

1) Input the monthly average load data set X of the power users;

2) Calculate the n-point simple moving average sequence F of each user's load time series A;

3) Compare the magnitudes of sequence A and sequence F at each time point. If A lies below F in u segments, containing a₁, a₂, …, a_u points respectively, and above F in v segments, containing b₁, b₂, …, b_v points respectively, the following quantities are calculated:

4) Calculate the upward trend index tra and the downward trend index trb.
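The segment statistics in steps 2) and 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the formulas for tra and trb are not reproduced in this text, so the sketch stops at the run lengths a₁…a_u and b₁…b_v. The function names are hypothetical, F_t follows the moving-average definition F_t = (A_{t-1} + … + A_{t-n})/n given in the embodiment (so the first n months have no F_t and are skipped), and closing a run at any point where A_t equals F_t is an assumption.

```python
def simple_moving_average(a, n):
    """n-point simple moving average F_t = (A_{t-1} + ... + A_{t-n}) / n.

    Returns {t: F_t} for every 0-based index t that has n predecessors.
    """
    return {t: sum(a[t - n:t]) / n for t in range(n, len(a))}


def below_above_runs(a, f):
    """Lengths of the maximal runs where A lies below / above F.

    Returns (below, above), i.e. [a_1, ..., a_u] and [b_1, ..., b_v].
    A point with A_t == F_t closes the current run (assumption).
    """
    below, above = [], []
    state, length = None, 0
    for t in sorted(f):
        if a[t] < f[t]:
            s = "below"
        elif a[t] > f[t]:
            s = "above"
        else:
            s = None
        if s != state:
            # the previous run has ended: record its length
            if state == "below":
                below.append(length)
            elif state == "above":
                above.append(length)
            state, length = s, 0
        if s is not None:
            length += 1
    # record the final run
    if state == "below":
        below.append(length)
    elif state == "above":
        above.append(length)
    return below, above


# Hypothetical monthly average loads for one user
A = [1, 2, 3, 4, 5, 4, 3, 2, 1]
F = simple_moving_average(A, 2)
print(below_above_runs(A, F))  # ([4], [3])
```

Each user in X would be processed independently; tra and trb would then be computed from these run lengths by the patent's own formulas.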

A further technical solution is that the mobility index is a first-order difference measure of the user's consumption pattern, comprising:

1) the difference between the average load over the first r months and over the last r months,

where x_n1 and x_n2 are the loads of the first r months and of the last r months respectively;

2) the modulus of the difference between the discrete Fourier transform (DFT) coefficient sequences of the first r months and of the last r months,

where y_n1 and y_n2 are the DFT coefficient sequences of the first and last r months respectively.
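A minimal Python sketch of the two mobility quantities for one user is given below. Because the formulas themselves are not reproduced in this text, two choices are assumptions: the mean difference is taken as (last r months) minus (first r months), and the "modulus" of the DFT difference sequence is taken as its Euclidean norm. The DFT is computed directly from its definition with the standard library, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform from the definition: Y_k = sum_t x_t e^(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def mobility_indices(load, r):
    """Return (mean difference, DFT-difference modulus) for one user.

    ASSUMPTIONS: the mean difference is (last r months) - (first r months),
    and the modulus of the coefficient difference sequence is its Euclidean norm.
    """
    if not 0 < r <= len(load):
        raise ValueError("r must satisfy 0 < r <= H")
    first, last = load[:r], load[-r:]
    mean_diff = sum(last) / r - sum(first) / r
    y1, y2 = dft(first), dft(last)  # y_n1 and y_n2 in the text
    dft_modulus = math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(y1, y2)))
    return mean_diff, dft_modulus


# Hypothetical user whose load drops sharply after month 3
print(mobility_indices([10, 10, 10, 4, 4, 4], 3))  # approximately (-6.0, 18.0)
```

A direct DFT is O(r²), which is negligible for month-level sequences; a fast Fourier transform would only matter for much longer series.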

A further technical solution is that the fluctuation index comprises:

1) the standard deviation sd of each user's H-month load sequence;

2) the standard deviation bsd_r of the load sequence over the first r months;

3) the standard deviation esd_r of the load sequence over the last r months.
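The three fluctuation quantities can be sketched as below. Whether the original uses the population or the sample standard deviation is not stated; the sketch assumes the population form (`statistics.pstdev`), and `statistics.stdev` could be substituted. The function name is hypothetical.

```python
from statistics import pstdev


def fluctuation_indices(load, r):
    """Return (sd, bsd_r, esd_r) for one user's H-month load sequence.

    Population standard deviations are assumed; use statistics.stdev for the
    sample version (which needs at least two values per sequence).
    """
    if not 0 < r <= len(load):
        raise ValueError("r must satisfy 0 < r <= H")
    return pstdev(load), pstdev(load[:r]), pstdev(load[-r:])


# Hypothetical 8-month load sequence, r = 4
sd, bsd, esd = fluctuation_indices([2, 4, 4, 4, 5, 5, 7, 9], 4)
print(sd)  # 2.0
```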

A further technical solution is that, in step (1), the dimensionality of the original index data set is reduced with principal component analysis as follows:

(1) Calculate the covariance matrix

Suppose there are n samples, each with p variables, forming an n × p data matrix X.

Denote the original variables by:

x₁,x₂,…,x_p (6)

Calculate the covariance matrix:

$$ \Sigma = (S_{ij})_{p \times p} \qquad (7) $$

where S_ij is the covariance between variables x_i and x_j.

(2) Find the eigenvalues λ_i of Σ and the corresponding orthonormal unit eigenvectors a_i

The m largest eigenvalues of Σ, λ₁ ≥ λ₂ ≥ … ≥ λ_m > 0, are the variances of the first m principal components, and the unit eigenvector a_i corresponding to λ_i gives the coefficients of principal component F_i with respect to the original variables. The i-th principal component F_i of the original variables is then:

F_i=a_iX (8)

(3) Select the principal components

The number m of principal components F₁, F₂, …, F_m finally retained is determined by the cumulative contribution rate G(m) of the covariance information:

$$ G(m) = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k} \qquad (9) $$

When the cumulative contribution rate exceeds 85%, the retained components are considered to reflect the information of the original variables, and the corresponding m is the number of leading principal components extracted;

(4) Calculate the principal component loadings

The principal component loading reflects the degree of correlation between principal component F_i and original variable X_j. The loading l_ij (i = 1, 2, …, m; j = 1, 2, …, p) of original variable X_j (j = 1, 2, …, p) on principal component F_i (i = 1, 2, …, m) is:

$$ l_{ij} = l(F_i, X_j) = \sqrt{\lambda_i}\, a_{ij}, \quad i = 1, 2, \dots, m;\ j = 1, 2, \dots, p \qquad (10) $$

If F₁, F₂, …, F_m denote the m principal components of the original variables X₁, X₂, …, X_p, then:

$$ F_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{ip} X_p, \quad i = 1, 2, \dots, m \qquad (11) $$

A further technical solution is that, in step (2), the basic formulas of the k-means clustering method are:

$$ \mathrm{dist}(x_i, x_j) = \sqrt{\sum_{d=1}^{D} (x_{i,d} - x_{j,d})^2} \qquad (12) $$

where dist(x_i, x_j) is the Euclidean distance between data points x_i and x_j; D is the number of attributes of a data object; x_{i,d} and x_{j,d} are the d-th components of x_i and x_j; C_k is the center of the k-th cluster; Center_k is the updated center of the k-th cluster, computed by formula (13) as the mean of the points assigned to that cluster; J is the sum-of-squared-errors criterion function of formula (14), i.e. the sum of squared distances between the points and their cluster centers; and R is the defined radius of a cluster;

The data set obtained by principal component analysis is clustered with the k-means method and normal data are removed. The clustering process is as follows:

(1) Input the initial data set X and set the number of clusters k;

(2) Randomly select k points in X as the initial cluster centers;

(3) Calculate the distance from each point to each cluster center with formula (12);

(4) Assign each data point to the nearest (most similar) cluster according to distance;

(5) Update the cluster centers with formula (13);

(6) Repeat steps (3) to (5); when criterion function (14) converges, stop clustering and output the result; otherwise return to step (3) and continue.
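Steps (1) to (6) can be sketched in standard-library Python as below. The random seed, the tolerance used to decide that J has converged, the iteration cap and keeping an old center when a cluster becomes empty are assumptions added to make the sketch runnable; the function names are hypothetical, and this is not the patent's MATLAB code.

```python
import math
import random


def dist(x, y):
    """Euclidean distance between two data points, formula (12)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(X, k, tol=1e-9, max_iter=100, seed=0):
    """Cluster the points in X (a list) into k clusters.

    Returns (labels, centers, J): cluster index per point, cluster centers, and
    the final value of the sum-of-squared-errors criterion J.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]  # step (2)
    prev_j = math.inf
    for _ in range(max_iter):
        # steps (3)-(4): assign every point to its nearest center
        labels = [min(range(k), key=lambda c: dist(x, centers[c])) for x in X]
        # step (5): move each center to the mean of its members
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # step (6): stop once J no longer changes
        j = sum(dist(x, centers[lab]) ** 2 for x, lab in zip(X, labels))
        if abs(prev_j - j) < tol:
            break
        prev_j = j
    return labels, centers, j


# Hypothetical 2-D points (e.g. two principal-component scores per user)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, J = kmeans(points, 2)
print(labels, centers)
```

Because the initial centers are random, results can vary between seeds; running several seeds and keeping the lowest J is a common remedy.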

A further technical solution is that, in step (3), the local outlier factor detection algorithm is based on the following formulas:

$$ N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le \mathrm{k\_dist}(p) \,\} \qquad (16) $$

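$$ \mathrm{reach\_dist}_k(p, q) = \max\{\, \mathrm{k\_dist}(q),\ d(p, q) \,\} \qquad (17) $$

$$ \mathrm{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \mathrm{reach\_dist}_k(p, o) \right)^{-1} \qquad (18) $$

$$ \mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)} \qquad (19) $$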

where N_k(p) is the set of objects whose distance to p does not exceed the k-distance of p; d(p, q) is the Euclidean distance between points p and q; k_dist(p) is the k-distance of object p, i.e. the distance from p to its k-th nearest neighbour; reach_dist_k(p, q) is the reachability distance of object p with respect to object q; lrd_k(p) is the local reachability density of object p; and LOF_k(p) is the local outlier factor of point p;

The detailed procedure is as follows:

(1) Set the number of neighbours k;

(2) Set the target number of outliers m;

(3) Input the data set;

(4) Calculate the distance matrix of all objects;

(5) Calculate the k-distance k_dist(p) of every point p;

(6) Calculate the k-distance neighbourhood N_k(p) of every point p;

(7) Calculate the local reachability density of every point p;

(8) Calculate the local outlier factor LOF;

(9) Sort the LOF values of all points and output the top m outliers;
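Steps (3) to (9) can be sketched in standard-library Python as follows, using the standard definitions of k-distance, reachability distance, local reachability density and LOF. Neighbourhoods include ties at the k-distance, as formula (16) implies. The sketch assumes k is smaller than the number of points and that no point has more than k exact duplicates (otherwise the reachability distances sum to zero); the function name is hypothetical.

```python
import math


def lof_top_m(X, k, m):
    """Local outlier factor of every point in X and the indices of the top-m outliers.

    Assumes 0 < k < len(X) and that no point has more than k exact duplicates.
    """
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    # step (4): distance matrix
    d = [[math.dist(a, b) for b in X] for a in X]
    # steps (5)-(6): k-distance and k-distance neighbourhood (ties included)
    k_dist, neigh = [], []
    for p in range(n):
        kd = sorted(d[p][q] for q in range(n) if q != p)[k - 1]
        k_dist.append(kd)
        neigh.append([q for q in range(n) if q != p and d[p][q] <= kd])
    # step (7): local reachability density, reach_dist_k(p, o) = max{k_dist(o), d(p, o)}
    lrd = []
    for p in range(n):
        reach = [max(k_dist[o], d[p][o]) for o in neigh[p]]
        lrd.append(len(reach) / sum(reach))
    # step (8): LOF = mean of lrd(o) / lrd(p) over the neighbourhood of p
    lof = [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p]) for p in range(n)]
    # step (9): indices of the m largest LOF values
    top = sorted(range(n), key=lambda p: lof[p], reverse=True)[:m]
    return lof, top


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
lof, top = lof_top_m(points, k=2, m=1)
print(top)  # [4]: the isolated point (10, 10)
```

Points inside a uniform cluster get LOF values close to 1, while isolated points get values well above 1, which is why ranking by LOF and keeping the top m isolates the outliers.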

Here the k-means clustering algorithm is called to extract the candidate set, using the following rule: within each cluster, any object whose distance to the cluster center is greater than or equal to the cluster radius R is extracted into the outlier candidate set;

In addition, to improve the detection accuracy of the algorithm, a point must satisfy two conditions to be judged an outlier:

(1) Outlier screening condition

$$ \mathrm{dist}(p_{ij}, \mathrm{Center}_k) \ge R, \quad j = 1, 2, \dots, n_i \qquad (20) $$

where p_ij is the j-th element of the i-th cluster obtained by k-means clustering of the PCA-processed data set; n_i is the number of data objects in the i-th cluster; Center_k is the cluster center; and R is the cluster radius;

(2) Local outlier factor constraint

LOF(p_ij)∈LOF(p)_top(m) (21)

where m is the preset threshold on the number of outliers to detect;

Combining the two algorithms, the detailed procedure is as follows:

(1) Input the raw data set and the preset minimum number of outliers m;

(2) Reduce the dimensionality with PCA;

(3) Apply k-means clustering to the reduced data set;

(4) Calculate the number of data points n_i in each cluster;

(5) If a cluster contains n_i < m data points, retain the whole cluster directly and denote the data it contains by D. If n_i > m, use formula (20) to judge whether the distance from each point in the cluster to the cluster center Center_k exceeds the cluster radius: if it does, the point is merged with D to form the outlier candidate data set D'; if not, the point is judged to be normal data and removed;

(6) Use the local outlier factor detection algorithm to calculate and sort the local outlier factors of all data points in D'; the outliers selected by their LOF values identify the non-technical losses.
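The candidate-set logic of steps (4) and (5), together with the top-m selection of condition (21), can be sketched as below. The formula for the cluster radius R is not reproduced in this text, so the radii are passed in as parameters; clusters with exactly n_i = m points, which the text does not cover, are treated like the n_i > m case (an assumption). The LOF scores of the candidates are assumed to come from a separate LOF computation, and the function names are hypothetical.

```python
import math


def build_candidate_set(clusters, centers, radii, m):
    """Steps (4)-(5): build the outlier candidate set D' from k-means output.

    clusters: list of clusters, each a list of points
    centers:  Center_k of each cluster
    radii:    radius R of each cluster (its formula is not given in this text)
    m:        preset minimum number of outliers
    """
    candidates = []
    for points, center, radius in zip(clusters, centers, radii):
        if len(points) < m:
            # small cluster: kept whole (the set D)
            candidates.extend(points)
        else:
            # condition (20): keep points at least R away from the center
            candidates.extend(p for p in points if math.dist(p, center) >= radius)
    return candidates


def select_top_m(candidates, lof_scores, m):
    """Condition (21): keep the m candidates with the largest LOF values."""
    order = sorted(range(len(candidates)), key=lambda i: lof_scores[i], reverse=True)
    return [candidates[i] for i in order[:m]]


clusters = [[(0, 0), (0, 1), (1, 0), (3, 3)], [(20, 20)]]
candidates = build_candidate_set(clusters, [(1, 1), (20, 20)], [2.0, 1.0], m=2)
print(candidates)  # [(3, 3), (20, 20)]
```

Only the candidates in D' need LOF scores, which is where the combination saves computation compared with scoring every point.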

Compared with the prior art, the detection method proposed by the present invention is more economical, convenient and practical. Integrating the two algorithms avoids their individual weaknesses. Used alone, the detection accuracy of the k-means method depends heavily on the choice of clustering parameters, and outliers are only a by-product of the clustering process, so its detection accuracy is relatively low. Used alone, the outlier detection algorithm judges the degree of outlierness by comparing the LOF values of all data points, which generates a large amount of unnecessary computation, making the time cost too high, and wastes storage space on intermediate results. In addition, the invention reduces the dimensionality of the raw data set with principal component analysis, which improves the overall computing speed of the algorithm, and evaluates detection accuracy with the ROC curve method, which shows the accuracy of the detection method intuitively.

Description of the drawings

In order to explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the principal component analysis method in the method of the invention;

Fig. 2 is a flow chart of the k-means detection algorithm in the method of the invention;

Fig. 3 is a flow chart of the local outlier factor (LOF) detection method in the method of the invention;

Fig. 4 is the technical roadmap of the method of the invention;

Fig. 5 is the overall flow chart of the method of the invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to Figs. 1 to 5.

As shown in Figs. 4 and 5, the smart grid non-technical loss detection method based on unsupervised learning of the present invention comprises the following specific steps:

(1) Since a single consumption behaviour can give rise to several kinds of consumption data, raw consumption data characterizing several aspects of consumption behaviour are selected as the original index data set, and the dimensionality of this raw data set is reduced with principal component analysis;

(1-1) Consumption characteristic data set

Abnormal consumption information rarely occurs in isolation: a single behaviour may trigger several kinds of anomaly, so detection based on a single index may miss cases or misjudge them. Efficient work against non-technical losses should instead extract comprehensive features from multiple kinds of abnormal data and rely on characteristic quantities that can quantify the effects of the various theft means.

The data set to be extracted contains H months of consumption data for N power users. Each user's consumption pattern is represented by the monthly average load, so the load sequence of user n can be expressed as an H-dimensional vector x_n = [x_n(1), x_n(2), …, x_n(H)], and all users together form the data set X = {x_n, n = 1, 2, …, N}.

Characteristic quantities of each user's consumption pattern can then be extracted from data set X.

1. Trend index

The trend index is built on the moving average of the sequence. The moving average method is a common tool for time series analysis and includes the simple moving average, the weighted moving average and the exponential moving average, among others. The simple moving average is the arithmetic mean of the previous n values of a variable. If the time series is denoted {A₁, A₂, …, A_n}, the n-point moving average at time t is

$$ F_t = \frac{A_{t-1} + A_{t-2} + \cdots + A_{t-n}}{n} $$

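The trend index is calculated as follows:

1) Input the monthly average load data set X of the power users;

2) Calculate the n-point simple moving average sequence F of each user's load time series A;

3) Compare the magnitudes of A and F at each time point, counting the u segments in which A lies below F (containing a₁, a₂, …, a_u points) and the v segments in which A lies above F (containing b₁, b₂, …, b_v points);

4) Calculate the upward trend index tra and the downward trend index trb.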

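2. Mobility index

The mobility index is a first-order difference measure of the user's consumption pattern, comprising:

1) the difference between the average load over the first r months and over the last r months;

2) the modulus of the difference between the DFT coefficient sequences of the first r months and of the last r months.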

3. Fluctuation index

1) the standard deviation sd of each user's H-month load sequence;

2) the standard deviation bsd_r of the load sequence over the first r months;

3) the standard deviation esd_r of the load sequence over the last r months.

4. Other indices

1) the ratio of the average load over the last r months to the average load over all months;

2) the correlation coefficient between each user's load sequence and the sequence of median loads over all users.
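The two remaining indices can be sketched for a whole data set X as follows, with X given as a list of N users' H-month monthly average load sequences. The per-month median across users forms the "median load sequence", and the Pearson correlation coefficient is assumed as the correlation measure; the function names are hypothetical, and constant load sequences (zero variance) are not handled.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def other_indices(X, r):
    """Return [(ratio, corr), ...], one pair per user in X.

    ratio: average load of the last r months / average load of all months
    corr:  correlation of the user's load with the monthly median over all users
    """
    H = len(X[0])
    median_seq = [median(user[h] for user in X) for h in range(H)]
    return [(mean(load[-r:]) / mean(load), pearson(load, median_seq)) for load in X]


# Hypothetical data set: 3 users, 4 months
X = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
for ratio, corr in other_indices(X, 2):
    print(round(ratio, 3), round(corr, 3))
```

A user whose load pattern moves against the population median (the third user here) gets a negative correlation, which is the kind of deviation these indices are meant to expose.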

(1-2) Dimensionality reduction of the raw data set based on principal component analysis (PCA)

Many features are extracted, and different features may contain overlapping information. To display each user's consumption pattern intuitively in a low-dimensional plane and to mine abnormal users efficiently, the dimensionality of the data set must be reduced. Dimensionality reduction transforms the data set so that a smaller number of new attributes represents as much of the original information as possible. Principal component analysis (PCA) is a representative dimensionality reduction method, implemented as follows:

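(1) Calculate the covariance matrix

Denote the original variables by x₁, x₂, …, x_p (6) and calculate the covariance matrix Σ = (S_ij)_{p×p} (7), where S_ij is the covariance between variables x_i and x_j.

(2) Find the eigenvalues λ_i of Σ and the corresponding orthonormal unit eigenvectors a_i; the i-th principal component is

F_i = a_iX (8)

(3) Select the principal components, keeping the first m components whose cumulative contribution rate exceeds 85%

(4) Calculate the principal component loadings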

$$ l_{ij} = l(F_i, X_j) = \sqrt{\lambda_i}\, a_{ij}, \quad i = 1, 2, \dots, m;\ j = 1, 2, \dots, p \qquad (10) $$

The specific flow chart is shown in Fig. 1.
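The PCA steps (1) to (4) can be sketched in standard-library Python as follows. It is an illustrative sketch under stated assumptions, not the patent's MATLAB code: the sample covariance (divisor n − 1) is assumed, the data are centred before projection, the eigen-decomposition uses the classical Jacobi rotation method for symmetric matrices, and loadings are computed as √λ_i·a_ij. The 85% threshold follows the text; the function names are hypothetical.

```python
import math


def jacobi_eigen(S, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix by Jacobi rotations.

    Returns (values, vectors) where vectors[i] is the unit eigenvector for values[i].
    """
    n = len(S)
    A = [list(map(float, row)) for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation chosen so that the new A[p][q] becomes zero
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [A[i][i] for i in range(n)]
    vectors = [[V[k][i] for k in range(n)] for i in range(n)]
    return values, vectors


def pca(data, threshold=0.85):
    """PCA of an n x p data matrix given as a list of rows.

    Returns (eigenvalues sorted descending, m, scores of the first m components,
    loadings l_ij = sqrt(lambda_i) * a_ij for the first m components).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    # (1) sample covariance matrix S_ij (divisor n - 1 is an assumption)
    S = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
         for i in range(p)]
    # (2) eigenvalues/eigenvectors, sorted so that lambda_1 >= lambda_2 >= ...
    values, vectors = jacobi_eigen(S)
    order = sorted(range(p), key=lambda i: values[i], reverse=True)
    values = [values[i] for i in order]
    vectors = [vectors[i] for i in order]
    # (3) smallest m whose cumulative contribution G(m) exceeds the threshold
    total, cumulative, m = sum(values), 0.0, 0
    for lam in values:
        cumulative += lam
        m += 1
        if cumulative / total > threshold:
            break
    # F_i = a_i^T x for each centred sample, i = 1..m
    scores = [[sum(a * x for a, x in zip(vectors[i], row)) for i in range(m)]
              for row in centred]
    # (4) loadings; max() guards against tiny negative round-off
    loadings = [[math.sqrt(max(values[i], 0.0)) * vectors[i][j] for j in range(p)]
                for i in range(m)]
    return values, m, scores, loadings


values, m, scores, loadings = pca([[1, 1], [2, 2], [3, 3], [4, 4]])
print(m)  # 1: the first component carries essentially all the variance
```

In practice a numerical library's symmetric eigensolver would normally replace the hand-written Jacobi routine; it is spelled out here only to keep the sketch self-contained.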

(2) Data clustering based on the k-means clustering method

The k-means algorithm is an indirect clustering method based on similarity measures between samples. It is a partition-based clustering algorithm of the unsupervised learning family and uses distance as the measure of similarity between data objects: the smaller the distance between two objects, the higher their similarity and the more likely they are to belong to the same cluster.

The basic formulas of the k-means clustering method and the symbols used are as described in the Summary of the invention; the clustering process is as follows:

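(1) Input the initial data set X and set the number of clusters k;

(2) Randomly select k points in X as the initial cluster centers;

(3) Calculate the distance from each point to each cluster center with formula (12);

(4) Assign each data point to the nearest (most similar) cluster according to distance;

(5) Update the cluster centers with formula (13);

(6) Repeat steps (3) to (5) until criterion function (14) converges, then output the clustering result.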

The specific flow chart is shown in Fig. 2.

(3) Refined data processing combining the local outlier factor (LOF) detection algorithm with the k-means method;

$$ N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le \mathrm{k\_dist}(p) \,\} \qquad (16) $$

$$ \mathrm{reach\_dist}_k(p, q) = \max\{\, \mathrm{k\_dist}(q),\ d(p, q) \,\} \qquad (17) $$

The detailed procedure is as follows:

(1) Set the number of neighbours k;

(2) Set the target number of outliers m;

(3) Input the data set;

(4) Calculate the distance matrix of all objects;

(5) Calculate the k-distance k_dist(p) of every point p;

(6) Calculate the k-distance neighbourhood N_k(p) of every point p;

(7) Calculate the local reachability density of every point p;

(8) Calculate the local outlier factor LOF;

(9) Sort the LOF values of all points and output the top m outliers;

The specific flow chart is shown in Fig. 3.

The method calls the k-means clustering algorithm to extract the candidate set, using the following rule: within each cluster, any object whose distance to the cluster center is greater than or equal to the cluster radius R is extracted into the outlier candidate set.

To improve the detection accuracy of the algorithm, the proposed method requires a point to satisfy two conditions to be judged an outlier:

(1) Outlier screening condition

(2) Local outlier factor constraint

LOF(p_ij)∈LOF(p)_top(m) (21)

where m is the preset threshold on the number of outliers to detect;

Combining the two algorithms, the detailed procedure is as follows:

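(1) Input the raw data set and the preset minimum number of outliers m;

(2) Reduce the dimensionality with PCA;

(3) Apply k-means clustering to the reduced data set;

(4) Calculate the number of data points n_i in each cluster;

(5) Form the outlier candidate set D' from the clusters with n_i < m and, in the remaining clusters, from the points satisfying the outlier screening condition;

(6) Calculate and sort the LOF values of the points in D' and output the top m as the detected non-technical losses.

The specific flow chart is shown in Fig. 5.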

(4) Evaluation of detection accuracy

Abnormal consumption pattern detection is essentially a binary classification problem: all users are divided into two classes, normal users and abnormal users. The confusion matrix is a basic tool for assessing the reliability of a classifier. For binary classification, the confusion matrix shown in Fig. 4 lists all possible classification results of the classifier, where the rows (positive/negative) correspond to the actual class of the object and the columns (true/false) to the class predicted by the classifier.

Here FP is the Type I error and FN is the Type II error. Several classifier evaluation indices can be derived from the confusion matrix:

Precision PRE = TP/(TP + FP): the probability that a sample classified as positive is actually positive;

False negative rate FNR = FN/(FN + TP): the probability that a positive sample is wrongly classified as negative;

True positive rate TPR = TP/(TP + FN): the proportion of all actually positive samples that are correctly judged positive;

False positive rate FPR = FP/(FP + TN): the proportion of all actually negative samples that are wrongly judged positive.
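The four indices can be computed from actual and predicted labels as in the sketch below, which assumes abnormal users are the positive class and returns NaN when a denominator is zero; the function name is hypothetical.

```python
def confusion_metrics(actual, predicted):
    """PRE, FNR, TPR and FPR from paired labels (True = abnormal = positive)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif p:
            fp += 1  # Type I error
        elif a:
            fn += 1  # Type II error
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "PRE": ratio(tp, tp + fp),
        "FNR": ratio(fn, fn + tp),
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
    }


actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
print(confusion_metrics(actual, predicted))
```

Note that TPR + FNR = 1 by construction, so the two carry the same information.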

These indices measure classification results from different angles, but they have two problems. First, they become seriously misleading when the proportions of positive and negative samples in the data set are imbalanced. In the extreme case of a 99:1 ratio of positive to negative samples, a classifier that simply labels every sample positive achieves a classification accuracy of 99%, yet this figure has no reference value. Second, they are static indices, whereas some classifiers output not a simple 0 or 1 but the degree to which an object belongs to a class; such classifiers give different classification results at different thresholds, so a dynamic index is needed to measure their overall reliability.

The ROC (receiver operating characteristic) curve describes how the two confusion-matrix indices FPR and TPR grow relative to each other. For the continuous scores output by a binary classification model, samples above a threshold are assigned to the positive class and samples below it to the negative class. Lowering the threshold identifies more positives, raising TPR, but also assigns more negative samples to the positive class, raising FPR. The ROC curve visualizes this trade-off: each point on the curve corresponds to the confusion matrix of the classification results at one threshold.

In ROC space, the point (0, 1) represents the ideal classifier, and the closer the ROC curve is to (0, 1), the better the classifier. The area under the curve (AUC) summarizes classifier quality in a single number: it is the area below the ROC curve, a larger AUC indicates better performance, and AUC = 1 corresponds to the ideal classifier.
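A threshold sweep producing ROC points and a trapezoidal AUC can be sketched as follows. Higher scores are assumed to mean "more likely abnormal" (for example LOF values), samples with tied scores are moved across the threshold together, and both classes must be present; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """ROC curve points (FPR, TPR), from (0, 0) to (1, 1).

    scores: higher means more likely abnormal; labels: True for actual positives.
    """
    pos = sum(1 for lab in labels if lab)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # lower the threshold to this score: all tied samples turn positive together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(auc(pts))  # 0.75
```

The AUC of 0.75 here equals the fraction of positive/negative pairs in which the positive sample scores higher (3 of 4), which is the usual probabilistic reading of AUC.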

(5) Simulation analysis of an example with MATLAB

(5-1) Determine the example and its basic characteristics;

The initial data set used by the invention consists of 6 months of load data for 3000 power users of a substation, sampled at 15-minute intervals. Power load and electricity consumption can be converted into each other and are essentially equivalent in reflecting users' consumption patterns, so electricity consumption can also be used as the characteristic index describing a user's consumption pattern. The simulation was carried out in MATLAB 7.10. The 3000 users comprise 2965 normal users and 35 abnormal users, so abnormal users account for about 1.17% (35/3000).

(5-2) Simulation analysis of the example with functions programmed in MATLAB

The simulation shows that the model can quickly detect abnormal users, i.e. sources of non-technical loss, detecting non-technical losses as fully as possible while meeting the requirements of accuracy and economy.

The embodiment described above only describes a preferred implementation of the present invention and does not limit its scope. Without departing from the spirit of the design of the invention, all variations and improvements made to the technical solution of the invention by persons of ordinary skill in the art shall fall within the scope of protection defined by the claims of the invention.

Claims

1. A smart grid non-technical loss detection method based on unsupervised learning, characterized by comprising the following steps:

Step (1): since a single consumption behaviour can give rise to several kinds of consumption data, raw consumption data characterizing several aspects of consumption behaviour are selected as the original index data set, and the dimensionality of this raw data set is reduced with principal component analysis;

Step (2): the data set obtained by principal component analysis in step (1) is clustered with the k-means clustering method, and normal data are removed to obtain abnormal data;

Step (3): the abnormal data from step (2) are refined with the local outlier factor detection algorithm, which accurately separates the abnormal data and completes the detection of non-technical losses in the smart grid.

2. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that, in step (1), the original index data set comprises: a trend index; a mobility index; a fluctuation index; the ratio of the average load over the last r months to the average load over all months; and the correlation coefficient between each user's load sequence and the sequence of median loads over all users.

3. The smart grid non-technical loss detection method based on unsupervised learning according to claim 2, characterized in that the trend index is calculated as follows:

1) Input the monthly average load data set X of the power users;

2) Calculate the n-point simple moving average sequence F of each user's load time series A;

3) Compare the magnitudes of sequence A and sequence F at each time point. If A lies below F in u segments, containing a₁, a₂, …, a_u points respectively, and above F in v segments, containing b₁, b₂, …, b_v points respectively, the following quantities are calculated:

4) Calculate the upward trend index tra and the downward trend index trb.

4. The smart grid non-technical loss detection method based on unsupervised learning according to claim 2, characterized in that the mobility index is a first-order difference measure of the user's consumption pattern, comprising:

1) the difference between the average load over the first r months and over the last r months;

2) the modulus of the difference between the DFT coefficient sequences of the first r months and of the last r months.

5. The smart grid non-technical loss detection method based on unsupervised learning according to claim 2, characterized in that the fluctuation indices are:

1) the standard deviation sd of each user's H-month load sequence;

2) the standard deviation bsd_r of the load sequence over the first r months;

3) the standard deviation esd_r of the load sequence over the last r months.
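The three fluctuation indices can be computed together as sketched here. The claim does not say whether the population or the sample standard deviation is meant, so the `ddof` parameter (0 for population, 1 for sample) is an assumption exposed for the caller to choose.

```python
import numpy as np


def fluctuation_indices(load, r, ddof=0):
    """Return (sd, bsd_r, esd_r) for one user's H-month load sequence.

    ddof=0 gives the population standard deviation; whether the population
    or sample form is intended is an assumption of this sketch.
    """
    load = np.asarray(load, dtype=float)
    sd = load.std(ddof=ddof)          # whole H-month sequence
    bsd_r = load[:r].std(ddof=ddof)   # first r months
    esd_r = load[-r:].std(ddof=ddof)  # last r months
    return sd, bsd_r, esd_r


sd, bsd_r, esd_r = fluctuation_indices([1, 1, 1, 1, 5, 9], r=2)
```

Comparing bsd_r with esd_r shows whether the consumption pattern became more or less volatile over time; in this example the first two months are flat while the last two vary.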

6. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that, in step (1), the dimensionality of the original index data set is reduced using principal component analysis, as follows:

(1) Compute the covariance matrix

Denote the original index variables as:

$x_1, x_2, \dots, x_p$  (6)

Compute the covariance matrix:

$\Sigma = (S_{ij})_{p \times p}$  (7)

where $S_{ij}$ is the covariance between $x_i$ and $x_j$.

The first $m$ largest eigenvalues of $\Sigma$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m > 0$, are the variances of the first $m$ principal components, and the unit eigenvector $a_i$ corresponding to $\lambda_i$ gives the coefficients of principal component $F_i$ with respect to the original variables. The $i$-th principal component $F_i$ of the original variables is therefore:

$F_i = a_i^{\top} X$  (8)

(3) Select the principal components

The number $m$ of principal components $F_1, F_2, \dots, F_m$ finally retained is determined by the cumulative variance contribution rate $G(m)$:

$G(m) = \dfrac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k}$  (9)

When the cumulative contribution rate exceeds 85%, the retained components are considered to reflect the information of the original variables, and the corresponding $m$ is the number of leading principal components extracted;

(4) Compute the principal component loadings

A principal component loading reflects the degree of correlation between principal component $F_i$ and original variable $X_j$. The loading $l_{ij}$ of original variable $X_j$ $(j = 1, 2, \dots, p)$ on principal component $F_i$ $(i = 1, 2, \dots, m)$ is:

$l_{ij} = l(F_i, X_j) = \lambda_i a_{ij}, \quad i = 1, 2, \dots, m;\ j = 1, 2, \dots, p$  (10)
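Steps (1) to (3) of claim 6 can be sketched with NumPy as follows. This is an illustrative sketch, not the patent's implementation: it assumes at least two original indices (columns), applies formula (8) to mean-centered data (which only shifts the scores by a constant), chooses the smallest $m$ whose cumulative contribution rate exceeds the threshold, and does not compute the loadings of formula (10). The function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, threshold=0.85):
    """PCA with m chosen by cumulative contribution rate.

    X: array of shape (n_samples, p), one column per original index (p >= 2 assumed).
    Returns (scores, eigenvalues, eigenvectors, m).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each original variable
    S = np.cov(Xc, rowvar=False)             # covariance matrix Sigma, p x p
    eigvals, eigvecs = np.linalg.eigh(S)     # symmetric matrix: real eigenpairs, ascending
    order = np.argsort(eigvals)[::-1]        # reorder so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    G = np.cumsum(eigvals) / eigvals.sum()   # cumulative contribution rate G(m)
    # Smallest m with G(m) > threshold (capped at p)
    m = min(int(np.searchsorted(G, threshold, side="right")) + 1, len(G))
    scores = Xc @ eigvecs[:, :m]             # F_i = a_i^T x for the retained components
    return scores, eigvals, eigvecs, m


# Two nearly collinear indices: one component carries almost all the variance
X = np.array([[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8]])
scores, eigvals, eigvecs, m = pca_reduce(X)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because a covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.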

7. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that, in step (2), the basic formulas of the k-means clustering method are:

where $\mathrm{dist}(x_i, x_j)$ denotes the Euclidean distance between data points $x_i$ and $x_j$; $D$ denotes the number of attributes of a data object; $x_{i,d}$ and $x_{j,d}$ denote the $d$-th components of $x_i$ and $x_j$; $C_k$ denotes the center of the $k$-th cluster; $Center_k$ denotes the updated center of the $k$-th cluster; $J$ denotes the sum-of-squared-errors criterion function; and $R$ is the defined cluster domain radius;

The k-means clustering method is used to cluster the data set obtained by principal component analysis and to reject the normal data. The specific clustering process is as follows:

(1) Input the initial data set X and set the number of clusters k;

(2) Randomly select k points of X as the initial cluster centers;

(4) Assign each data point to the most similar cluster according to distance;

(5) Update the cluster centers using formula (13);

(6) Repeat steps (3) to (5); when criterion function (14) converges, stop clustering and output the clustering result; otherwise, return to step (3) and continue.
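The clustering loop above corresponds to the textbook k-means iteration, sketched here in NumPy. Because formulas (13) and (14) are not reproduced in this section, the sketch assumes the usual forms: each center is updated to the mean of its members, and $J$ is the sum of squared Euclidean distances of points to their cluster centers. The function name, the `tol` stopping tolerance, and the fixed `seed` are hypothetical choices; the cluster radius $R$ is not computed.

```python
import numpy as np


def kmeans(X, k, max_iter=100, tol=1e-9, seed=0):
    """Plain k-means following steps (1)-(6); returns (labels, centers, J)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (2): pick k distinct data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_J = np.inf
    for _ in range(max_iter):
        # Euclidean distance from every point to every center, shape (n, k)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Step (4): assign each point to its nearest center
        labels = dist.argmin(axis=1)
        # Step (5): new center = mean of its members (keep old center if cluster is empty)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Criterion J: sum of squared distances of points to their cluster centers
        J = float(sum(np.sum((X[labels == j] - centers[j]) ** 2) for j in range(k)))
        # Step (6): stop once J no longer changes
        if abs(prev_J - J) <= tol:
            break
        prev_J = J
    return labels, centers, J


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers, J = kmeans(X, k=2)
```

Since the result depends on the random initial centers, practical use would typically run several seeds and keep the solution with the smallest $J$.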

8. The smart grid non-technical loss detection method based on unsupervised learning according to claim 1, characterized in that, in step (3), the local outlier factor detection algorithm is formally defined by the following formulas:

$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le \mathrm{k\_dist}(p) \,\}$  (16)

$\mathrm{reach\_dist}_k(p, q) = \max\{\mathrm{k\_dist}(q),\ d(p, q)\}$  (17)

where $N_k(p)$ is the set of objects whose distance to $p$ does not exceed the k-distance of $p$; $d(p, q)$ is the Euclidean distance between points $p$ and $q$; $\mathrm{k\_dist}(p)$ is the k-distance of object $p$, i.e. the distance from $p$ to its $k$-th nearest neighbor; $\mathrm{reach\_dist}_k(p, q)$ is the reachability distance of object $p$ with respect to object $q$; $\mathrm{lrd}_{MinPts}(p)$ is the local reachability density of object $p$; and $\mathrm{LOF}_k(p)$ is the local outlier factor of point $p$;

The detailed process is as follows:

(1) Set the number of neighbors k;

(2) Set the target number of outliers m;

(3) Input the data set;

(4) Compute the distance matrix of all objects;

(5) Compute the k-distance k_dist(p) of every point p;

(6) Compute the k-distance neighborhood N_k(p) of every point p;

(7) Compute the local reachability density of every point p;

(8) Compute the local outlier factor LOF;

(9) Sort the LOF values of all points and output the top m outliers;
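Steps (1) to (9) can be sketched directly from formulas (16) and (17). For the local reachability density and the LOF itself, this sketch assumes the standard definitions of Breunig et al.: $\mathrm{lrd}(p)$ is the reciprocal of the mean reachability distance from $p$ to its neighbors, and $\mathrm{LOF}(p)$ is the mean ratio $\mathrm{lrd}(o)/\mathrm{lrd}(p)$ over $o \in N_k(p)$. It also assumes no point has $k$ or more exact duplicates (which would make a density infinite). The function name `lof_top_m` is hypothetical.

```python
import numpy as np


def lof_top_m(X, k, m):
    """LOF score of every point and the indices of the top-m outliers.

    Assumes the standard LOF definitions and no point with k or more exact duplicates.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Step (4): pairwise Euclidean distances; the diagonal is excluded (q in D \ {p})
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    # Step (5): k_dist(p) = distance to the k-th nearest neighbor
    k_dist = np.sort(D, axis=1)[:, k - 1]
    # Step (6): N_k(p) per formula (16); ties can make it larger than k
    neighbors = [np.flatnonzero(D[i] <= k_dist[i]) for i in range(n)]
    # Step (7): lrd(p) = 1 / mean reach_dist_k(p, o) over o in N_k(p), formula (17)
    lrd = np.array([
        1.0 / np.mean(np.maximum(k_dist[nb], D[i, nb])) for i, nb in enumerate(neighbors)
    ])
    # Step (8): LOF_k(p) = mean of lrd(o) / lrd(p) over o in N_k(p)
    lof = np.array([lrd[nb].mean() / lrd[i] for i, nb in enumerate(neighbors)])
    # Step (9): sort by LOF in descending order and keep the top m
    top = np.argsort(lof)[::-1][:m]
    return lof, top


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]])
lof, top = lof_top_m(X, k=2, m=1)
```

Points inside a uniformly dense region get LOF values close to 1, while the isolated point (10, 10) gets a much larger value and is returned as the single outlier. The full distance matrix costs $O(n^2)$ memory, which is one motivation for pruning normal data with k-means before running LOF.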

Here, the k-means clustering algorithm is invoked to extract a candidate set, using the following rule: within each cluster, if the distance between an object and the cluster center is greater than or equal to the cluster radius R, the object is extracted into the outlier candidate set;

(1) Outlier screening condition

$\mathrm{dist}(p_{ij}, Center_k) \ge R, \quad j = 1, 2, \dots, n_i$  (20)

where $p_{ij}$ is the $j$-th element of the $i$-th cluster obtained by k-means clustering of the PCA-processed data set; $n_i$ is the number of data objects in the $i$-th cluster; $Center_k$ is the cluster center; and $R$ is the cluster domain radius;

(2) Outlier factor restriction condition

$\mathrm{LOF}(p_{ij}) \in \mathrm{LOF}(p)_{\mathrm{top}(m)}$  (21)

where $m$ is the preset threshold on the number of detected outliers, i.e. the LOF value of $p_{ij}$ must be among the $m$ largest LOF values;

Combining the two algorithms, the detailed procedure is as follows:

(1) Input the raw data set and the preset minimum number of outliers m;

(2) Reduce the dimensionality with PCA;

(3) Perform k-means clustering on the reduced data set;

(4) Compute the number of data points n_i in each cluster;

(5) If a cluster's size n_i < m, retain the whole cluster directly, and denote the data contained in the retained clusters as data set D. If n_i > m, use formula (20) to determine whether the distance from each point in the cluster to the cluster center Center_k is greater than the cluster radius; if it is, merge the point with data set D to form the "outlier candidate data set" D'; otherwise, judge it as normal data and reject it;

(6) Use the local outlier factor detection algorithm to compute and rank the outlier factors of all candidate data points; the resulting selection of outliers constitutes the non-technical loss detection result.