CN111046913A

CN111046913A - Load abnormal value identification method

Info

Publication number: CN111046913A
Application number: CN201911125386.8A
Authority: CN
Inventors: 李静; 程波; 王亮; 张世桃
Original assignee: Hangzhou Haixing Zeke Information Technology Co ltd; Hangzhou Renhe Information Technology Co Ltd; Nanjing Haixing Power Grid Technology Co Ltd; Hangzhou Hexing Electrical Co Ltd; Ningbo Henglida Technology Co Ltd
Current assignee: Hangzhou Haixing Zeke Information Technology Co ltd; Hangzhou Renhe Information Technology Co Ltd; Nanjing Haixing Power Grid Technology Co Ltd; Hangzhou Hexing Electrical Co Ltd; Ningbo Henglida Technology Co Ltd
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2020-04-21
Anticipated expiration: 2039-11-18
Also published as: CN111046913B

Abstract

The invention relates to the technical field of power load data mining, and in particular to a load abnormal value identification method. The method comprises the following steps: classifying load curves by power consumption mode; classifying the load level of the load curves belonging to the normal power consumption mode; constructing an abnormal load data domain and identifying the abnormal values of the load curves belonging to the normal power consumption mode; and, using the maximum upper bound and the minimum lower bound of that abnormal load data domain, constructing an abnormal load data domain for the abnormal power consumption mode and identifying the load abnormal values of the load curves belonging to the abnormal power consumption mode. Based on the spatial density clustering method and the K-center clustering method, the method classifies load data reasonably and conveniently handles different power consumption modes and different load levels within the same power consumption mode. Constructing the abnormal load data domain from the central limit theorem and the interquartile range allows the range of the domain to be adjusted flexibly according to the required confidence level.

Description

Load abnormal value identification method

Technical Field

The invention relates to the technical field of power load data mining, in particular to a load abnormal value identification method.

Background

With the installation of smart meters, the load record of a power consumer changes from a single electricity quantity to a time-series load curve. Compared with the electricity quantity, the time-series load curve contains the user's power consumption behavior and load level at each moment. The load curve is an important basis for energy suppliers to provide energy management services and for users to carry out demand-side response, so abnormal load values in the curve often have a significant impact on these decisions. Because the collected load curves contain massive amounts of data, abnormal load values are difficult to identify accurately by hand, and an abnormal load value identification method suited to large-scale data is needed.

Because the distribution of the power load is unknown, the box plot is widely used for abnormal value identification. It analyzes the maximum, minimum, upper quartile, median and lower quartile of the data and constructs an abnormal data domain based on the quartiles, thereby identifying abnormal load values. The method is simple to use and does not become more complicated as the data scale grows. Its main disadvantage is that the constructed abnormal data domain does not take into account the local density characteristics of the load values at each moment. To address this, the distribution followed by the load data is usually assumed in advance, and abnormal load data domains at different confidence levels are then constructed from the corresponding cumulative distribution function; alternatively, the distribution of the load data is verified with a distribution test, and once the hypothesis is accepted, abnormal load data domains at different confidence levels are constructed from the cumulative distribution function.
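As a minimal sketch of the box plot rule described above, the following computes the quartile-based fences and flags values outside them. The function name `boxplot_fences`, the `whisker` parameter and the sample values are illustrative assumptions, not part of the invention.

```python
import statistics

def boxplot_fences(values, whisker=1.5):
    """Return the (lower, upper) fences of the box plot rule.

    Values outside [Q1 - whisker * IQR, Q3 + whisker * IQR] fall in the
    abnormal data domain.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr

# Hypothetical load values observed at one time point
loads = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 9.0]
low, high = boxplot_fences(loads)
outliers = [x for x in loads if x < low or x > high]
```

Note that the fences depend only on the quartiles of the whole sample, which is the limitation the paragraph above points out.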

These solutions have the following problems. When the load data distribution is assumed directly, the distribution is difficult to specify accurately, because user electricity consumption is highly random and changes over time and with the external environment. When a distribution is assumed and then tested, the load data may already contain abnormal values, so the distribution test is affected by abnormal data before abnormal value identification is performed, and the test result is unreliable.

Disclosure of Invention

The object of the present invention is to provide a load abnormal value identification method that solves the problems described in the background art.

In order to achieve the above object, in one aspect, the present invention provides a load abnormal value identification method, including the steps of:

step 1: based on a spatial density clustering method, classify the load curves by power consumption mode into a normal power consumption mode and an abnormal power consumption mode;

step 2: based on a K-center clustering method, classify the load level of the normal power consumption mode;

step 3: at each load level, considering the random distribution of the load value at each moment, construct an abnormal load data domain based on the central limit theorem and the interquartile range of the deviation of the load values from the cluster center load;

step 4: identify abnormal load values that may exist in the normal power consumption mode by using the abnormal load data domain constructed in step 3;

step 5: using the maximum upper bound and the minimum lower bound of the abnormal load data domain constructed in step 3, construct an abnormal load data domain for the abnormal power consumption mode, and identify the load abnormal values in the abnormal power consumption mode.

Preferably, in the step 1, the specific steps of classifying the user electricity utilization patterns based on the space density clustering method are as follows:

step 1.1: set the minimum number of data points N_min contained within a neighborhood;

step 1.2: determine the scan radius ε. After N_min is selected, calculate, for each load curve, the degree of difference in power consumption pattern between it and its N_min-th nearest load curve in the neighborhood

Wherein

Is calculated by the formula

In the formula, y_a and y_b represent two different daily load curves, each of which can be expressed in vector form as y = (x₁, x₂, …, x_n)^T, where x_i represents the load value at the i-th time;

step 1.3: prepare the historical load curve set D = {y₁, y₂, …, y_M}, where M is the total number of load curves;

step 1.4: initialize the core object set Ω = ∅, the number of classes c = 0, the set of unclassified samples Λ = D, and the set of class partitions S = ∅;

step 1.5: for each load curve y_i (i = 1, 2, …, M), find the core objects;

step 1.6: if the core object set Ω = ∅, stop classification and go to step 1.10; otherwise, go to step 1.7;

step 1.7: update the class index c = c + 1, randomly select a core object o from the set Ω, initialize the current core object queue Ω_c = {o}, initialize the current class set S_c = {o}, and update the unclassified sample set Λ = Λ - {o};

step 1.8: if Ω_c = ∅, the current class S_c has been generated; update S = {S₁, S₂, …, S_c} and Ω = Ω - S_c, and go to step 1.6; otherwise, go to step 1.9;

step 1.9: take a core object o' out of Ω_c, form its ε-neighborhood sample set Z_ε(o'), obtain the set of samples that are unclassified and belong to Z_ε(o'), Δ = Z_ε(o') ∩ Λ, update Ω_c = Ω_c ∪ (Δ ∩ Ω) - {o'}, S_c = S_c ∪ Δ and Λ = Λ - Δ, and go to step 1.8;

step 1.10: output S = {S₁, S₂, …, S_c} and Λ.

Preferably, in step 1.5, the step of finding the core object is as follows:

① calculate the power consumption pattern difference degrees to find the ε-neighborhood sample set Z_ε(y_i) of y_i;

② if Z_ε(y_i) contains more than N_min samples, add sample y_i to the core object set: Ω = Ω ∪ {y_i}.
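Steps 1.3 to 1.10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Euclidean distance stands in for the power consumption pattern difference degree (whose formula is not reproduced in this text), a curve counts as a core object when its neighborhood holds at least `n_min` curves including itself, and the names `difference` and `density_cluster` are hypothetical.

```python
import math
import random

def difference(ya, yb):
    # Assumption: Euclidean distance replaces the patent's difference degree.
    return math.dist(ya, yb)

def density_cluster(curves, eps, n_min, seed=0):
    """Return (classes, unclassified): index sets S_1..S_c and the leftover set Lambda."""
    rng = random.Random(seed)
    idx = range(len(curves))
    # eps-neighborhood Z_eps of every curve (each curve is in its own neighborhood)
    neigh = {i: {j for j in idx if difference(curves[i], curves[j]) <= eps}
             for i in idx}
    core = {i for i in idx if len(neigh[i]) >= n_min}   # step 1.5 (assumed ">=")
    unclassified = set(idx)                              # Lambda = D
    classes = []                                         # S
    while core:                                          # step 1.6
        o = rng.choice(sorted(core))                     # step 1.7
        queue, current = {o}, {o}
        unclassified.discard(o)
        while queue:                                     # steps 1.8 and 1.9
            o2 = queue.pop()
            delta = neigh[o2] & unclassified
            queue |= delta & core
            current |= delta
            unclassified -= delta
        classes.append(current)
        core -= current                                  # Omega = Omega - S_c
    return classes, unclassified                         # step 1.10

# Hypothetical two-point "curves": two dense groups and one isolated curve
curves = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20)]
classes, abnormal = density_cluster(curves, eps=0.5, n_min=3)
```

In this toy input the isolated curve is left in the unclassified set, which corresponds to the abnormal power consumption mode.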

Preferably, in step 2, the specific steps of classifying the load level of the normal power consumption mode are as follows:

step 2.1: determine the optimal number of clusters K. With the aim that the load levels of data points within a sub data set are highly similar while the load levels of different sub data sets differ substantially, use the comprehensive evaluation index

Within the range K ∈ [1, K_max], select the K for which

attains its minimum value; this K is taken as the expected number of clusters of the sub data set,

represents the distance between the load curve y_j contained in the k-th sub data set and the cluster center

and N_k is the number of load curves contained in that sub data set;

represents the degree of difference in load level between the K sub data sets, where

and

each representing a cluster center of a different sub data set;

step 2.2: using the determined number of load levels K, apply the K-center clustering method again to classify the load level of the normal power consumption mode.

Preferably, in step 2.1, the detailed steps of K-center clustering are as follows:

step 2.1.1: from the load curve set S_i (i = 1, 2, …, c) to be classified by load level, randomly select K load curves as the initial center points, and set the maximum number of iterations;

step 2.1.2: assign each load curve in the normal power consumption mode set S_i to be classified to the nearest center point;

step 2.1.3: iterate so that the sum of Euclidean distances d_K corresponding to the classification result is minimized, where d_K is calculated by the formula

Preferably, in step 2.1.3, the iteration is performed as follows:

step 2.1.3.1: calculate d_K from the assignment result: if this is the first assignment, calculate d_K and store it directly in the variable

which holds the minimum d_K; if this is not the first assignment, calculate d_K and store it in the variable

then continue with step 2.1.3.2;

step 2.1.3.2: randomly selecting a non-central point;

step 2.1.3.3: create a set C to store the result of this round of assignment. If this is the first assignment, store C directly in the set

which holds the optimal classification result; if not, compare

and

if

then store the set C in the set

Then modify the result in the set C according to the randomly selected non-center point: exchange the randomly selected non-center point with the corresponding center point to prepare the data for the next round of assignment;

step 2.1.3.4: check whether the specified maximum number of iterations has been reached. If so, terminate the calculation and output the optimal classification result

otherwise, perform the next round of iterative calculation and go to step 2.1.3.1.
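The iteration in steps 2.1.1 to 2.1.3.4 can be sketched as a random-swap K-center (K-medoids style) search. This is a hedged illustration: `total_distance` and `k_medoids` are hypothetical names, and the "corresponding center point" is assumed here to be the center nearest to the randomly selected non-center point.

```python
import math
import random

def total_distance(curves, medoids):
    """d_K: sum of Euclidean distances from each curve to its nearest center."""
    return sum(min(math.dist(y, curves[m]) for m in medoids) for y in curves)

def k_medoids(curves, k, max_iter=200, seed=0):
    rng = random.Random(seed)
    n = len(curves)
    medoids = rng.sample(range(n), k)                 # step 2.1.1: random centers
    best = total_distance(curves, medoids)            # first assignment
    for _ in range(max_iter):                         # step 2.1.3.4: iteration cap
        others = [i for i in range(n) if i not in medoids]
        if not others:
            break
        candidate = rng.choice(others)                # step 2.1.3.2
        # Assumption: swap the candidate with the center it is closest to
        nearest = min(medoids,
                      key=lambda m: math.dist(curves[candidate], curves[m]))
        trial = [candidate if m == nearest else m for m in medoids]
        d_k = total_distance(curves, trial)
        if d_k < best:                                # keep the better result
            medoids, best = trial, d_k
    labels = [min(range(k), key=lambda c: math.dist(y, curves[medoids[c]]))
              for y in curves]
    return medoids, labels, best

# Hypothetical one-point "curves" with two clearly separated load levels
curves = [(1.0,), (1.1,), (1.2,), (5.0,), (5.1,), (5.2,)]
medoids, labels, d_k = k_medoids(curves, k=2)
```

Because the centers are always actual load curves, the resulting cluster centers can be read directly as representative daily curves for each load level.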

Preferably, in step 3, the abnormal load data domain is constructed as follows: for the k-th sub data set at time t, based on the confidence interval of the expected load at confidence level 1 - α and the interquartile range of the deviation of the load values from the cluster center load

the abnormal data domain is formed as

In the formula, X_t is the load random variable,

is the sample mean of the load at time t,

is the corrected sample variance at time t, N_k is the number of samples, and t_α/2(N_k - 1) is the upper α/2 quantile of the t-distribution with N_k - 1 degrees of freedom.
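The confidence-interval part of this construction can be illustrated with the standard t-based interval for a mean. The sketch below covers only that part; the function name `mean_confidence_interval` and the sample loads are assumptions, and the t quantile is passed in as a number looked up from a table rather than computed.

```python
import math
import statistics

def mean_confidence_interval(samples, t_quantile):
    """Confidence interval of the expected load at one time point.

    t_quantile is t_{alpha/2}(N_k - 1), taken from a t-distribution table.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)      # corrected (n - 1) sample standard deviation
    half_width = t_quantile * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical loads of N_k = 10 curves at time t
loads_t = [2.0, 2.2, 1.9, 2.1, 2.3, 2.0, 1.8, 2.1, 2.2, 2.4]
# t_{0.025}(9) = 2.262 for alpha = 0.05 (standard table value)
low, high = mean_confidence_interval(loads_t, 2.262)
```

Choosing a smaller α widens the interval, which is how the required confidence level adjusts the range of the domain.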

Compared with the prior art, the invention has the following beneficial effects. It overcomes two shortcomings of the box plot identification method: the local distribution characteristics of the load values are not considered, and applicability is poor because the load data distribution is unknown. Compared with constructing the abnormal load data domain by a box plot, the method, being based on the spatial density clustering method and the K-center clustering method, classifies load data reasonably and conveniently handles different power consumption modes and different load levels within the same power consumption mode. Constructing the abnormal load data domain from the central limit theorem and the interquartile range allows the range of the domain to be adjusted flexibly according to the required confidence level. On the one hand, the distribution of the load values does not need to be determined in advance, which avoids assuming the load data distribution on the basis of expert experience; on the other hand, the information provided by the load data is fully used, which addresses the poor adaptability to load data sets of different types of users caused by uncertain load data distribution.

The algorithm provided by the invention matches the practical situation of power load data processing. As smart meters become more widespread, data servers store massive amounts of load data, and external interference or faults in the acquisition equipment inevitably corrupt some recorded load values. If load data containing abnormal load values is used directly, related decisions may be wrong and cause losses. When the distribution of the load data is unknown, traditional methods identify abnormal load values with poor accuracy and adapt poorly to different types of data.

When facing load data sets of different types of power users, the algorithm provided by the invention takes the local distribution characteristics of the load data into account and constructs a flexible and reliable abnormal load data domain; it adapts to the load data sets of different types of users and meets the need to process massive data efficiently.

Drawings

FIG. 1 is a schematic diagram illustrating the classification result of the load power consumption mode of the residential community according to the present invention;

FIG. 2 is a diagram illustrating the classification result of the load level in the normal power consumption mode according to the present invention;

FIG. 3 is a schematic diagram illustrating an abnormal load value recognition result in a normal power consumption mode according to the present invention;

FIG. 4 is a schematic diagram of an abnormal load value identification result in the abnormal power consumption mode according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1-4, the present invention provides a technical solution:

the invention provides a load abnormal value identification method, which comprises the following steps:

In this embodiment, in step 1, the specific steps of classifying the user power consumption modes based on the spatial density clustering method are as follows:

step 1.1: set the minimum number of data points N_min contained within a neighborhood, usually taken as the dimension of the load sample in the load curve plus 1;

Wherein

Is calculated by the formula

In the formula, y_a and y_b represent two different daily load curves, each of which can be expressed in vector form as y = (x₁, x₂, ..., x_n)^T, where x_i represents the load value at the i-th time. The power consumption pattern difference degree at which the first step change occurs

is selected as the scanning radius ε, and the load data set is divided into a normal power consumption mode and an abnormal power consumption mode using the above parameters. For the load of a residential community, the 24-hour load of each day is regarded as one load data point, so N_min = 25 and ε = 0.0108;

step 1.3: prepare the historical load curve set D = {y₁, y₂, ..., y_M}, where M is the total number of load curves;

step 1.4: initializing a set of core objects

step 1.5: for each load curve y_i (i = 1, 2, …, M), find the core objects as follows:

② if Z_ε(y_i) contains more than N_min samples, add sample y_i to the core object set: Ω = Ω ∪ {y_i};

step 1.6: if the core object set Ω = ∅, stop classification and go to step 1.10; otherwise, go to step 1.7;

step 1.8: if Ω_c = ∅, the current class S_c has been generated; update S = {S₁, S₂, …, S_c} and Ω = Ω - S_c, and go to step 1.6; otherwise, go to step 1.9;

step 1.9: take a core object o' out of Ω_c, form its ε-neighborhood sample set Z_ε(o'), obtain the set of samples that are unclassified and belong to Z_ε(o'), Δ = Z_ε(o') ∩ Λ, update Ω_c = Ω_c ∪ (Δ ∩ Ω) - {o'}, S_c = S_c ∪ Δ and Λ = Λ - Δ, and go to step 1.8;

step 1.10: output S = {S₁, S₂, …, S_c} and Λ.

Spatial density clustering is suitable for clustering samples of any dimensionality, and it finds abnormal samples while completing the clustering. When the method is used to classify power consumption modes, the unclassified load curves in the set Λ are regarded as abnormal power consumption mode curves, and the load curves in the set S belong to the normal power consumption mode. The classification result of the power consumption modes is shown in FIG. 1.
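The choice of the scanning radius from the N_min-th nearest neighbor difference degrees can be explored with a sorted distance list, as in the sketch below. It assumes the Euclidean distance in place of the difference degree and excludes each curve from its own neighbor list; the name `k_distances` and the sample data are hypothetical, and reading off the first step change is left to inspection.

```python
import math

def k_distances(curves, n_min):
    """Sorted distance from each curve to its n_min-th nearest other curve."""
    result = []
    for i, y in enumerate(curves):
        dists = sorted(math.dist(y, other)
                       for j, other in enumerate(curves) if j != i)
        result.append(dists[n_min - 1])
    return sorted(result)

# Hypothetical one-point "curves": three close together and one far away
curves = [(0.0,), (1.0,), (2.0,), (10.0,)]
kd = k_distances(curves, n_min=1)
```

In this toy input the sorted values jump sharply at the last entry, so a radius chosen just below that jump would leave the far curve unclassified.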

Further, in step 2, the specific steps of classifying the load level of the normal power consumption mode are as follows:

step 2.1: determine the optimal number of clusters K. With the aim that the load levels of data points within a sub data set are highly similar while the load levels of different sub data sets differ substantially, use the comprehensive evaluation index

Within the range K ∈ [1, K_max], select the K for which

represents the distance between the load curve y_j contained in the k-th sub data set and the cluster center

and N_k is the number of load curves contained in that sub data set;

and

each represent a cluster center of a different sub data set. The detailed steps of K-center clustering are as follows:

step 2.1.2: assign each load curve in the normal power consumption mode set S_i to be classified to the nearest center point;

Continuing to execute the step 2.1.3.2;

step 2.1.3.2: randomly selecting a non-central point;

which holds the optimal classification result; if not, compare

and

if

then store the set C in the set

Then modify the result in the set C according to the randomly selected non-center point: exchange the randomly selected non-center point with the corresponding center point to prepare the data for the next round of assignment;

otherwise, perform the next round of iterative calculation and go to step 2.1.3.1;

step 2.2: using the determined number of load levels K, apply the K-center clustering method again to classify the load level of the normal power consumption mode. The load level classification result of the normal power consumption mode for the residential community load is shown in FIG. 2.

Still further, in step 3, the abnormal load data domain is constructed as follows: for the k-th sub data set at time t, based on the confidence interval of the expected load at confidence level 1 - α and the interquartile range of the deviation of the load values from the cluster center load

the abnormal data domain is formed as

In the formula, X_t is the load random variable,

is the sample mean of the load at time t,

is the corrected sample variance at time t, N_k is the number of samples, and t_α/2(N_k - 1) is the upper α/2 quantile of the t-distribution with N_k - 1 degrees of freedom, which can be obtained by looking up a t-distribution quantile table. ρ is a significance indicator: following the definition of outlier cut-off points in the box plot, an abnormal load value identified with ρ = 1.5 is called a mild outlier, and one identified with ρ = 3 is called an extreme outlier,

is the interquartile range of the deviations of the load values of all load data points at time t from the cluster center load value, calculated as the difference between the third quartile of the load samples at time t

and the first quartile

Using sampling with replacement, the load samples at time t are first resampled and their mean is computed as one load mean sample; this is repeated 100 times (a number settled on through testing), and the load mean samples are then averaged again to obtain the load sample mean
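The resampled mean and the interquartile range of deviations described above can be sketched as follows. The way the two pieces are combined in `normal_band` is an assumption made for illustration (the confidence interval widened by ρ times the interquartile range on each side), since the formula itself is not reproduced in this text; all function names and sample values are hypothetical.

```python
import random
import statistics

def bootstrap_mean(samples, repeats=100, seed=0):
    """Average of `repeats` resampled means (sampling with replacement)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(samples, k=len(samples)))
             for _ in range(repeats)]
    return statistics.fmean(means)

def deviation_iqr(samples, center):
    """Interquartile range of the deviations of the samples from the cluster center load."""
    deviations = [x - center for x in samples]
    q1, _, q3 = statistics.quantiles(deviations, n=4, method="inclusive")
    return q3 - q1

def normal_band(ci_low, ci_high, iqr, rho=1.5):
    # Assumption: the confidence interval is widened by rho * IQR on each side;
    # values outside the returned band are treated as the abnormal data domain.
    return ci_low - rho * iqr, ci_high + rho * iqr

loads_t = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical loads at time t
center_t = 5                                 # hypothetical cluster center load at time t
mean_t = bootstrap_mean(loads_t)
iqr_t = deviation_iqr(loads_t, center_t)
band = normal_band(4.0, 6.0, iqr_t)          # hypothetical confidence interval (4.0, 6.0)
```

Under this reading, setting `rho=3` instead of 1.5 widens the band so that only extreme outliers are flagged.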

Specifically, in step 4, the constructed abnormal load data domain is used to identify abnormal load values in the load data of the normal power consumption mode: a load value that falls within the abnormal load data domain is considered abnormal; otherwise it is considered normal. The identification result of abnormal values in the normal power consumption mode load curves of the residential community is shown in FIG. 3.
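The identification in step 4 amounts to a per-time-point membership check. The sketch below assumes the abnormal load data domain at each time is everything outside a lower and an upper bound supplied by step 3; the function name `flag_abnormal` and all numbers are hypothetical.

```python
def flag_abnormal(curve, lower, upper):
    """Return the time indices whose load value lies outside [lower[t], upper[t]]."""
    return [t for t, (x, lo, hi) in enumerate(zip(curve, lower, upper))
            if x < lo or x > hi]

lower = [0.5, 0.6, 0.8, 0.7]     # hypothetical per-time lower bounds
upper = [1.5, 1.6, 2.0, 1.8]     # hypothetical per-time upper bounds
curve = [1.0, 2.4, 1.2, 0.1]     # hypothetical load curve with 4 time points
abnormal_times = flag_abnormal(curve, lower, upper)
```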

In addition, in step 5, for the abnormal power consumption mode, the corresponding abnormal load data domain is constructed by selecting the maximum upper bound and the minimum lower bound of the abnormal data domains of the normal power consumption mode. Likewise, the constructed abnormal load data domain serves as the basis for identifying abnormal load values in the load data of the abnormal power consumption mode: a load value that falls within the domain is considered abnormal; otherwise it is considered normal. The identification result of abnormal load values in the abnormal power consumption mode load curves of the residential community is shown in FIG. 4.
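One way to read step 5 is as an envelope over the bounds of all load levels of the normal power consumption mode. The sketch below assumes the maximum and minimum are taken separately at each time point, which the text does not state explicitly; the name `envelope` and the numbers are hypothetical.

```python
def envelope(bands):
    """Combine per-level bounds into one pair of per-time bounds.

    bands: list of (lower, upper) pairs, one per load level, where lower and
    upper are sequences with one value per time point.
    """
    lowers = [min(values) for values in zip(*(band[0] for band in bands))]
    uppers = [max(values) for values in zip(*(band[1] for band in bands))]
    return lowers, uppers

# Hypothetical bounds for two load levels over two time points
bands = [([1, 2], [3, 4]), ([0, 3], [2, 6])]
lowers, uppers = envelope(bands)
```

A load value of an abnormal-mode curve would then be flagged when it lies below `lowers[t]` or above `uppers[t]`.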

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A load abnormal value identification method comprises the following steps:

2. The load abnormal value identification method according to claim 1, wherein: in the step 1, the specific steps of classifying the user electricity utilization modes based on the space density clustering method are as follows:

step 1.2: determine the scanning radius ε: after N_min is selected, calculate, for each load curve, the degree of difference in power consumption pattern between it and its N_min-th nearest load curve in the neighborhood

Wherein

Is calculated by the formula

In the formula, y_a and y_b represent two different daily load curves, each of which can be expressed in vector form as y = (x₁, x₂, ..., x_n)^T, where x_i represents the load value at the i-th time;

step 1.4: initializing a set of core objects

step 1.5: for each load curve y_i (i = 1, 2, …, M), find the core objects;

step 1.6: if the core object set Ω = ∅, stop classification and go to step 1.10; otherwise, go to step 1.7;

step 1.7: update the class index c = c + 1, randomly select a core object o from the set Ω, initialize the current core object queue Ω_c = {o}, initialize the current class set S_c = {o}, and update the unclassified sample set Λ = Λ - {o};

step 1.8: if Ω_c = ∅, the current class S_c has been generated; update S = {S₁, S₂, …, S_c} and Ω = Ω - S_c, and go to step 1.6; otherwise, go to step 1.9;

step 1.10: output S = {S₁, S₂, …, S_c} and Λ.

3. The load abnormal value identification method according to claim 2, wherein: in step 1.5, the step of searching for a core object is as follows:

① calculate the power consumption pattern difference degrees to find the ε-neighborhood sample set Z_ε(y_i) of y_i;

② if Z_ε(y_i) contains more than N_min samples, add sample y_i to the core object set: Ω = Ω ∪ {y_i}.

4. The load abnormal value identification method according to claim 1, wherein: in step 2, the specific steps of classifying the load level of the normal electricity utilization mode are as follows:

step 2.1: determine the optimal number of clusters K. With the aim that the load levels of data points within a sub data set are highly similar while the load levels of different sub data sets differ substantially, use the comprehensive evaluation index

Within the range K ∈ [1, K_max], select the K for which

represents the distance between the load curve y_j contained in the k-th sub data set and the cluster center

and N_k is the number of load curves contained in that sub data set;

and

each representing a cluster center of a different sub data set;

5. The load abnormal value identification method according to claim 4, wherein: in step 2.1, the detailed steps of K center clustering are as follows:

6. The load abnormal value identification method according to claim 5, wherein: in step 2.1.3, the specific steps of performing iteration are as follows:

Continuing to execute the step 2.1.3.2;

step 2.1.3.2: randomly selecting a non-central point;

which holds the optimal classification result; if not, compare

and

if

then store the set C in the set

Then modify the result in the set C according to the randomly selected non-center point: exchange the randomly selected non-center point with the corresponding center point to prepare the data for the next round of assignment;

7. The load abnormal value identification method according to claim 1, wherein in step 3 the abnormal load data domain is constructed as follows: for the k-th sub data set at time t, based on the confidence interval of the expected load at confidence level 1 - α and the interquartile range of the deviation of the load values from the cluster center load

the abnormal data domain is formed as

In the formula, X_t is the load random variable,

is the sample mean of the load at time t,