CN111046913A - Load abnormal value identification method - Google Patents

Load abnormal value identification method

Info

Publication number
CN111046913A
Authority
CN
China
Prior art keywords
load
abnormal
data
value
power utilization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911125386.8A
Other languages
Chinese (zh)
Other versions
CN111046913B (en)
Inventor
李静
程波
王亮
张世桃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Haixing Zeke Information Technology Co ltd
Hangzhou Renhe Information Technology Co Ltd
Nanjing Haixing Power Grid Technology Co Ltd
Hangzhou Hexing Electrical Co Ltd
Ningbo Henglida Technology Co Ltd
Original Assignee
Hangzhou Haixing Zeke Information Technology Co ltd
Hangzhou Renhe Information Technology Co Ltd
Nanjing Haixing Power Grid Technology Co Ltd
Hangzhou Hexing Electrical Co Ltd
Ningbo Henglida Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Haixing Zeke Information Technology Co ltd, Hangzhou Renhe Information Technology Co Ltd, Nanjing Haixing Power Grid Technology Co Ltd, Hangzhou Hexing Electrical Co Ltd, Ningbo Henglida Technology Co Ltd
Priority to CN201911125386.8A
Publication of CN111046913A
Application granted
Publication of CN111046913B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Probability & Statistics with Applications (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of power load data mining, and in particular to a load abnormal value identification method. The method comprises the following steps: classifying load curves by power utilization mode; classifying the load level of the load curves belonging to the normal power utilization mode; constructing an abnormal load data domain and identifying abnormal values in the load curves belonging to the normal power utilization mode; and, using the maximum upper limit and the minimum lower limit of that abnormal load data domain, constructing an abnormal load data domain for the abnormal power utilization mode and identifying abnormal load values in the load curves belonging to the abnormal power utilization mode. Because it is based on a spatial density clustering method and a K-center clustering method, the method can classify load data reasonably and handles both different power utilization modes and different load levels within the same power utilization mode. Because the abnormal load data domain is constructed from the central limit theorem and the interquartile range, its extent can be adjusted flexibly according to the required confidence level.

Description

Load abnormal value identification method
Technical Field
The invention relates to the technical field of power load data mining, in particular to a load abnormal value identification method.
Background
The installation of smart meters has changed the load record of a power consumer from a single energy quantity to a time-series load curve. Compared with the energy quantity, the time-series load curve contains the user's power consumption behavior and load level at each moment. Load curves are an important basis for energy suppliers providing energy management services and for users performing demand-side response, so abnormal load values in a power load curve often have a significant impact on these decisions. Because the collected power load curves contain massive amounts of data, abnormal load values are difficult to identify accurately by hand, and an abnormal load value identification method suited to large-scale data needs to be developed.
Because the distribution of the power load is unknown, the box plot is widely used for abnormal value identification. It analyzes the maximum, minimum, upper quartile, median and lower quartile of the data and constructs an abnormal data domain based on the quartiles, thereby identifying abnormal load values. The method is simple to use and does not become more complicated as the data scale grows. Its main disadvantage is that the constructed abnormal data domain does not take into account the local density characteristics of the load values at each moment. To address this, one usually either assumes in advance the distribution that the load data follow and then constructs abnormal load data domains at different confidence levels from the corresponding cumulative distribution function, or verifies the distribution of the load data with a distribution test and, once the hypothesis is accepted, constructs abnormal load data domains at different confidence levels from the cumulative distribution function.
These approaches have the following problems. When the load data distribution is assumed directly, the distribution is difficult to specify accurately, because users' electricity consumption behavior is highly random and changes over time and with the external environment. When a distribution is assumed first and then tested, the load data may already contain abnormal values, so the distribution test is affected by abnormal data before any abnormal value identification has been performed, and the test result is unreliable.
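For reference, the box-plot rule discussed in the background can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the sample data and the choice of the "inclusive" quartile method are assumptions, and the multiplier `rho` corresponds to the usual box-plot cut-off factor (1.5 for mild outliers).

```python
import statistics

def boxplot_fences(values, rho=1.5):
    """Return (lower, upper) box-plot fences: Q1 - rho*IQR and Q3 + rho*IQR."""
    # The quartile convention ("inclusive") is an assumption of this sketch.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - rho * iqr, q3 + rho * iqr

def boxplot_outliers(values, rho=1.5):
    """Return the values lying outside the box-plot fences."""
    lower, upper = boxplot_fences(values, rho)
    return [v for v in values if v < lower or v > upper]

# Hypothetical load values observed at one time of day.
loads = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 9.0]
print(boxplot_outliers(loads))
```

The fences depend only on the quartiles of the pooled values, which is why the rule cannot reflect local density differences between load levels.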
Disclosure of Invention
The present invention is directed to a load abnormal value identification method, so as to solve the problems in the background art.
In order to achieve the above object, in one aspect, the present invention provides a load abnormal value identification method, including the steps of:
Step 1: based on a spatial density clustering method, classify the power utilization modes of the load curves into a normal power utilization mode and an abnormal power utilization mode;
Step 2: based on a K-center clustering method, classify the load levels of the normal power utilization mode;
Step 3: at each load level, considering the random distribution of the load values at each moment, construct an abnormal load data domain based on the central limit theorem and the interquartile range of the deviation of the load values from the cluster-center load;
Step 4: identify abnormal load values that may exist in the normal power utilization mode using the abnormal load data domain constructed in step 3;
Step 5: using the maximum upper limit and the minimum lower limit of the abnormal load data domain formed in step 3, construct an abnormal load data domain for identifying abnormal load values in the abnormal power utilization mode, and identify the abnormal load values in the abnormal power utilization mode.
Preferably, in the step 1, the specific steps of classifying the user electricity utilization patterns based on the space density clustering method are as follows:
Step 1.1: set the minimum number of data points $N_{\min}$ contained within a neighborhood;
Step 1.2: determine the scan radius $\varepsilon$. With $N_{\min}$ selected, calculate the degree of difference in power utilization mode between each load curve and the $N_{\min}$-th nearest load curve in its neighborhood [formula image BDA0002276664590000021], where [formula image BDA0002276664590000022] is calculated by the formula
[formula image BDA0002276664590000023]
In the formula, $y_a$ and $y_b$ represent two different daily load curves, each of which can be expressed in vector form as $y = (x_1, x_2, \ldots, x_n)^T$, where $x_i$ represents the load value at the $i$-th time;
Step 1.3: prepare a historical load curve set $D = \{y_1, y_2, \ldots, y_M\}$, where $M$ is the total number of load curves;
Step 1.4: initialize the core object set $\Omega = \emptyset$, the number of classes $c = 0$, the unclassified sample set $\Lambda = D$, and the class partition set $S = \emptyset$;
Step 1.5: for each load curve $y_i$ ($i = 1, 2, \ldots, M$), find the core objects;
Step 1.6: if the core object set $\Omega = \emptyset$, stop the classification and go to step 1.10; otherwise go to step 1.7;
Step 1.7: update the class serial number $c = c + 1$, randomly select a core object $o$ from the set $\Omega$, initialize the current core object queue $\Omega_c = \{o\}$, initialize the current classification set $S_c = \{o\}$, and update the unclassified sample set $\Lambda = \Lambda - \{o\}$;
Step 1.8: if $\Omega_c = \emptyset$, the current class $S_c$ is complete; update $S = \{S_1, S_2, \ldots, S_c\}$ and $\Omega = \Omega - S_c$, and go to step 1.6; otherwise go to step 1.9;
Step 1.9: take a core object $o'$ out of $\Omega_c$, form its $\varepsilon$-neighborhood sample set $Z_\varepsilon(o')$, obtain the set of samples that are unclassified and belong to $Z_\varepsilon(o')$, $\Delta = Z_\varepsilon(o') \cap \Lambda$, update $\Omega_c = \Omega_c \cup (\Delta \cap \Omega) - \{o'\}$, $S_c = S_c \cup \Delta$ and $\Lambda = \Lambda - \Delta$, and go to step 1.8;
Step 1.10: output $S = \{S_1, S_2, \ldots, S_c\}$ and $\Lambda$.
Preferably, in step 1.5, the core objects are found as follows:
① calculate the degree of difference in power utilization mode to find the $\varepsilon$-neighborhood sample set $Z_\varepsilon(y_i)$ of $y_i$;
② if $Z_\varepsilon(y_i)$ contains more than $N_{\min}$ samples, add sample $y_i$ to the core object set: $\Omega = \Omega \cup \{y_i\}$.
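The procedure in steps 1.3 to 1.10, together with the core-object test above, appears to follow the familiar density-based clustering scheme (as in DBSCAN). The sketch below is one possible reading of those steps, not the patent's implementation. Two points are assumptions: the patent's "degree of difference" formula survives only as an image, so plain Euclidean distance stands in for it, and the core-object test uses the conventional "at least `n_min` neighbors, counting the point itself" rule. All function and variable names are hypothetical.

```python
import math
import random

def euclidean(a, b):
    # Stand-in distance: the patent's own difference measure is not
    # recoverable from the text, so Euclidean distance is an assumption.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_cluster(curves, eps, n_min, dist=euclidean, seed=0):
    """Return (classes, unclassified) as lists of indices into `curves`."""
    rng = random.Random(seed)
    m = len(curves)
    # eps-neighborhood Z_eps(y_i) of every curve (includes the curve itself).
    neigh = [{j for j in range(m) if dist(curves[i], curves[j]) <= eps}
             for i in range(m)]
    # Step 1.5: core objects; ">=" is the usual convention (assumption).
    omega = {i for i in range(m) if len(neigh[i]) >= n_min}
    unclassified = set(range(m))          # step 1.4: Lambda = D
    classes = []                          # step 1.4: S = empty set
    while omega:                          # step 1.6
        o = rng.choice(sorted(omega))     # step 1.7: random core object
        queue, current = [o], {o}
        unclassified.discard(o)
        while queue:                      # step 1.8: until the queue is empty
            o2 = queue.pop(0)             # step 1.9: expand from o2
            delta = neigh[o2] & unclassified
            queue.extend(sorted(delta & omega))
            current |= delta
            unclassified -= delta
        classes.append(sorted(current))
        omega -= current
    return classes, sorted(unclassified)  # step 1.10

# Hypothetical 2-point "curves": two dense groups and one isolated curve.
curves = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
          (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
classes, abnormal = density_cluster(curves, eps=0.2, n_min=3)
print(classes, abnormal)
```

Curves left in the unclassified set correspond to the set Λ, which the method treats as the abnormal power utilization mode.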
Preferably, in step 2, the specific steps of classifying the load level of the normal power consumption mode are as follows:
Step 2.1: determine the optimal number of clusters $K$. Aiming for load levels that are highly similar within each sub-dataset and markedly different between sub-datasets, use the comprehensive evaluation index [formula image BDA0002276664590000032]: over the candidate cluster numbers $K \in [1, K_{\max}]$, select the $K$ for which [formula image BDA0002276664590000033] attains its minimum value as the desired number of clusters. Here [formula image BDA0002276664590000034] represents the distance between the load curves $y_j$ contained in the $k$-th sub-dataset and the cluster center [formula image BDA0002276664590000036], and $N_k$ is the number of load curves contained in that sub-dataset; [formula image BDA0002276664590000035] represents the degree of difference in load level between the $K$ sub-datasets, where [formula image BDA0002276664590000041] and [formula image BDA0002276664590000042] each represent the cluster center of a different sub-dataset;
Step 2.2: using the determined number of load level classes $K$, apply the K-center clustering method again to classify the load levels of the normal power utilization mode.
Preferably, in step 2.1, the detailed steps of K-center clustering are as follows:
Step 2.1.1: from the load curve set $S_i$ ($i = 1, 2, \ldots, c$) to be classified by load level, randomly select $K$ load curves as the initial center points, and set the maximum number of iterations;
Step 2.1.2: assign each load curve in the normal power utilization mode load curve set $S_i$ to be classified to its nearest center point;
Step 2.1.3: start iterating so as to minimize the sum $d_K$ of the Euclidean distances corresponding to the classification result, which is calculated by the formula
[formula image BDA0002276664590000043]
Preferably, in step 2.1.3, the iteration is performed as follows:
Step 2.1.3.1: calculate $d_K$ from the assignment result: if this is the first assignment, calculate $d_K$ and store it directly in the variable [formula image BDA00022766645900000410], which holds the minimum $d_K$; if it is not the first assignment, calculate $d_K$, store it in the variable [formula image BDA00022766645900000411], and continue with step 2.1.3.2;
Step 2.1.3.2: randomly select a non-center point;
Step 2.1.3.3: create a set $C$ to store the result of this iteration's assignment. If this is the first assignment, store $C$ directly in the set [formula image BDA0002276664590000044], which holds the optimal classification result. Otherwise, compare [formula image BDA0002276664590000045] and [formula image BDA0002276664590000046]; if [formula image BDA0002276664590000047], store $C$ in the set [formula image BDA0002276664590000048]. Then modify the result in $C$ according to the randomly selected non-center point by exchanging that non-center point with its corresponding center point, preparing the data for the next round of assignment;
Step 2.1.3.4: determine whether the specified maximum number of iterations has been reached; if so, terminate the calculation and output the optimal classification result [formula image BDA0002276664590000049]; otherwise, execute the next round of iteration and go to step 2.1.3.1.
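Steps 2.1.1 to 2.1.3.4 read like a randomized-swap variant of K-center (k-medoids) clustering. The sketch below illustrates that reading under stated assumptions: the "corresponding center point" is taken to be the center of the cluster that the randomly chosen non-center currently belongs to, and a swap is kept only when it lowers $d_K$. The names `assign` and `k_center`, the default iteration count and the sample data are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(curves, centers):
    """Assign each curve to its nearest center; return (labels, d_K)."""
    labels, total = [], 0.0
    for y in curves:
        dists = [euclidean(y, curves[c]) for c in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        labels.append(k)
        total += dists[k]
    return labels, total

def k_center(curves, k, max_iter=200, seed=0):
    """Return (center indices, labels, d_K) of the best partition found."""
    rng = random.Random(seed)
    best_centers = rng.sample(range(len(curves)), k)        # step 2.1.1
    best_labels, best_d = assign(curves, best_centers)      # step 2.1.2
    for _ in range(max_iter):                               # step 2.1.3.4
        non_centers = [i for i in range(len(curves)) if i not in best_centers]
        if not non_centers:
            break
        p = rng.choice(non_centers)                         # step 2.1.3.2
        trial = list(best_centers)
        trial[best_labels[p]] = p      # swap with p's own center (assumption)
        labels, d = assign(curves, trial)                   # step 2.1.3.1
        if d < best_d:                                      # step 2.1.3.3
            best_centers, best_labels, best_d = trial, labels, d
    return best_centers, best_labels, best_d

# Hypothetical one-dimensional "curves" at two load levels.
curves = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
centers, labels, d_k = k_center(curves, k=2)
print(sorted(centers), labels, d_k)
```

Because the centers are always actual load curves rather than averages, a single extreme curve shifts a center less than it would shift a K-means centroid.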
Preferably, in step 3, the abnormal load data domain is constructed as follows: for the $k$-th sub-dataset, at time $t$, based on the confidence interval of the expected load value at confidence level $1-\alpha$ and the interquartile range of the deviation of the load values from the cluster-center load [formula image BDA0002276664590000051], form the abnormal data domain
[formula image BDA0002276664590000052]
In the formula, $X_t$ is the load random variable, [formula image BDA0002276664590000053] is the sample mean of the load at time $t$, [formula image BDA0002276664590000054] is the corrected sample variance at time $t$, $N_k$ is the number of samples, and $t_{\alpha/2}(N_k-1)$ is the upper $\alpha/2$ quantile of the t-distribution with $N_k-1$ degrees of freedom.
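Since the formula itself survives only as an image, the following is a sketch of one plausible form of the abnormal data domain, not a reconstruction of the patent's equation. It assumes the normal band is the t-based confidence interval for the mean, widened on each side by a multiple `rho` of the interquartile range, so that values outside the band fall in the abnormal domain. The multiplier `rho`, the function name and the sample data are assumptions; the critical value `t_crit` is passed in, as if read from a t-distribution table.

```python
import math
import statistics

def abnormal_domain(loads_t, deviations_t, t_crit, rho=1.5):
    """Return (lower, upper); values outside [lower, upper] are flagged.

    loads_t      -- load values of one sub-dataset at time t
    deviations_t -- deviations of those values from the cluster-center load
    t_crit       -- tabulated upper alpha/2 quantile of t(N_k - 1)
    rho          -- assumed multiplier on the interquartile range
    """
    n = len(loads_t)
    mean = statistics.fmean(loads_t)
    s = statistics.stdev(loads_t)            # corrected (n - 1) sample std
    half_width = t_crit * s / math.sqrt(n)   # confidence-interval half-width
    q1, _, q3 = statistics.quantiles(deviations_t, n=4, method="inclusive")
    iqr = q3 - q1
    return mean - half_width - rho * iqr, mean + half_width + rho * iqr

# Hypothetical loads at one time step; the cluster-center load is taken as 2.0.
loads_t = [2.0, 2.1, 1.9, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.0]
deviations_t = [x - 2.0 for x in loads_t]
lower, upper = abnormal_domain(loads_t, deviations_t, t_crit=2.262)  # t_0.025(9)
print(lower, upper)
```

A smaller $\alpha$ gives a larger critical value and therefore a wider normal band, which is how the confidence level adjusts the extent of the abnormal domain.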
Compared with the prior art, the invention has the following beneficial effects. It overcomes two shortcomings: the box plot identification method ignores the local distribution characteristics of the load values, and methods that rely on a known distribution apply poorly because the load data distribution is unknown. Compared with constructing the abnormal load data domain by the box plot method alone, the method, being based on a spatial density clustering method and a K-center clustering method, can classify load data reasonably and handles both different power utilization modes and different load levels within the same power utilization mode. Because the abnormal load data domain is constructed from the central limit theorem and the interquartile range, its extent can be adjusted flexibly according to the required confidence level. On the one hand, the distribution of the load values does not need to be determined in advance, which avoids assuming a load data distribution on the basis of expert experience; on the other hand, the information provided by the load data is fully used, which addresses the poor adaptability of existing methods to the load data sets of different types of users caused by the uncertain load data distribution.
The algorithm provided by the invention matches the practical situation of power load data processing. As smart meters become more widespread, data servers store massive amounts of load data, and external interference or the condition of the acquisition equipment inevitably corrupts some recorded load values. If load data containing abnormal load values are used directly, related decisions may be wrong and losses may result. When the load data distribution is unknown, traditional methods identify abnormal load values with poor accuracy and apply poorly to different types of data.
When facing load data sets from different types of power users, the algorithm provided by the invention takes the local distribution characteristics of the load data into account and constructs a flexible and reliable abnormal load data domain, so it can adapt to the load data sets of different types of users and meet the need to process massive data efficiently.
Drawings
FIG. 1 is a schematic diagram illustrating the classification result of the load power consumption mode of the residential community according to the present invention;
FIG. 2 is a diagram illustrating the classification result of the load level in the normal power consumption mode according to the present invention;
FIG. 3 is a schematic diagram illustrating an abnormal load value recognition result in a normal power consumption mode according to the present invention;
FIG. 4 is a schematic diagram illustrating the abnormal load value identification result in the abnormal power consumption mode according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, the present invention provides a technical solution:
the invention provides a load abnormal value identification method, which comprises the following steps:
Step 1: based on a spatial density clustering method, classify the power utilization modes of the load curves into a normal power utilization mode and an abnormal power utilization mode;
Step 2: based on a K-center clustering method, classify the load levels of the normal power utilization mode;
Step 3: at each load level, considering the random distribution of the load values at each moment, construct an abnormal load data domain based on the central limit theorem and the interquartile range of the deviation of the load values from the cluster-center load;
Step 4: identify abnormal load values that may exist in the normal power utilization mode using the abnormal load data domain constructed in step 3;
Step 5: using the maximum upper limit and the minimum lower limit of the abnormal load data domain formed in step 3, construct an abnormal load data domain for identifying abnormal load values in the abnormal power utilization mode, and identify the abnormal load values in the abnormal power utilization mode.
In this embodiment, in step 1, the specific steps of classifying the user power consumption modes based on the spatial density clustering method are as follows:
Step 1.1: set the minimum number of data points $N_{\min}$ contained within a neighborhood, usually taken as the dimension of a load sample in the load curve plus 1;
Step 1.2: determine the scan radius $\varepsilon$. With $N_{\min}$ selected, calculate the degree of difference in power utilization mode between each load curve and the $N_{\min}$-th nearest load curve in its neighborhood [formula image BDA0002276664590000061], where [formula image BDA0002276664590000062] is calculated by the formula
[formula image BDA0002276664590000071]
In the formula, $y_a$ and $y_b$ represent two different daily load curves, each of which can be expressed in vector form as $y = (x_1, x_2, \ldots, x_n)^T$, where $x_i$ represents the load value at the $i$-th time. The power-utilization-mode difference degree at which the first step change occurs, [formula image BDA0002276664590000072], is selected as the scan radius $\varepsilon$, and the load data set is divided into a normal power utilization mode and an abnormal power utilization mode using these parameters. For the load of the residential community, the load over the 24 hours of each day is regarded as one load data point, so $N_{\min} = 25$ and $\varepsilon = 0.0108$;
Step 1.3: prepare a historical load curve set $D = \{y_1, y_2, \ldots, y_M\}$, where $M$ is the total number of load curves;
Step 1.4: initialize the core object set $\Omega = \emptyset$, the number of classes $c = 0$, the unclassified sample set $\Lambda = D$, and the class partition set $S = \emptyset$;
Step 1.5: for each load curve $y_i$ ($i = 1, 2, \ldots, M$), find the core objects as follows:
① calculate the degree of difference in power utilization mode to find the $\varepsilon$-neighborhood sample set $Z_\varepsilon(y_i)$ of $y_i$;
② if $Z_\varepsilon(y_i)$ contains more than $N_{\min}$ samples, add sample $y_i$ to the core object set: $\Omega = \Omega \cup \{y_i\}$;
Step 1.6: if the core object set $\Omega = \emptyset$, stop the classification and go to step 1.10; otherwise go to step 1.7;
Step 1.7: update the class serial number $c = c + 1$, randomly select a core object $o$ from the set $\Omega$, initialize the current core object queue $\Omega_c = \{o\}$, initialize the current classification set $S_c = \{o\}$, and update the unclassified sample set $\Lambda = \Lambda - \{o\}$;
Step 1.8: if $\Omega_c = \emptyset$, the current class $S_c$ is complete; update $S = \{S_1, S_2, \ldots, S_c\}$ and $\Omega = \Omega - S_c$, and go to step 1.6; otherwise go to step 1.9;
Step 1.9: take a core object $o'$ out of $\Omega_c$, form its $\varepsilon$-neighborhood sample set $Z_\varepsilon(o')$, obtain the set of samples that are unclassified and belong to $Z_\varepsilon(o')$, $\Delta = Z_\varepsilon(o') \cap \Lambda$, update $\Omega_c = \Omega_c \cup (\Delta \cap \Omega) - \{o'\}$, $S_c = S_c \cup \Delta$ and $\Lambda = \Lambda - \Delta$, and go to step 1.8;
Step 1.10: output $S = \{S_1, S_2, \ldots, S_c\}$ and $\Lambda$.
The spatial density clustering method is suitable for clustering samples of any dimensionality, and it finds abnormal samples while completing the clustering. When the method is used to classify power utilization modes, the unclassified load curves in the set $\Lambda$ are regarded as abnormal power utilization mode curves, and the load curves in the set $S$ belong to the normal power utilization mode. The power utilization mode classification result is shown in FIG. 1.
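The choice of the scan radius in step 1.2 resembles the common "k-distance" heuristic: sort every curve's distance to its $N_{\min}$-th nearest curve and take the value just before the first abrupt jump. The sketch below illustrates that idea under assumptions. Euclidean distance again stands in for the patent's difference measure, and the threshold `jump` that defines an "abrupt" change is a hypothetical parameter that the text does not specify.

```python
import math

def euclidean(a, b):
    # Stand-in for the patent's difference measure (assumption).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kth_neighbor_distances(curves, n_min, dist=euclidean):
    """Sorted distances from each curve to its n_min-th nearest other curve."""
    out = []
    for i, a in enumerate(curves):
        d = sorted(dist(a, b) for j, b in enumerate(curves) if j != i)
        out.append(d[n_min - 1])
    return sorted(out)

def pick_eps(sorted_kdist, jump):
    """Return the last k-distance before the first increase larger than `jump`."""
    for prev, nxt in zip(sorted_kdist, sorted_kdist[1:]):
        if nxt - prev > jump:
            return prev
    return sorted_kdist[-1]  # no abrupt change found: fall back to the maximum

# Hypothetical one-dimensional "curves": four similar ones and one far away.
curves = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,)]
kdist = kth_neighbor_distances(curves, n_min=2)
eps = pick_eps(kdist, jump=1.0)
print(kdist, eps)
```

In practice the sorted distances are often plotted and the jump located by eye; the fixed threshold here only makes the sketch deterministic.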
Further, in step 2, the specific steps of classifying the load level of the normal power consumption mode are as follows:
Step 2.1: determine the optimal number of clusters $K$. Aiming for load levels that are highly similar within each sub-dataset and markedly different between sub-datasets, use the comprehensive evaluation index [formula image BDA0002276664590000081]: over the candidate cluster numbers $K \in [1, K_{\max}]$, select the $K$ for which [formula image BDA0002276664590000082] attains its minimum value as the desired number of clusters. Here [formula image BDA0002276664590000083] represents the distance between the load curves $y_j$ contained in the $k$-th sub-dataset and the cluster center [formula image BDA0002276664590000088], and $N_k$ is the number of load curves contained in that sub-dataset; [formula image BDA0002276664590000084] represents the degree of difference in load level between the $K$ sub-datasets, where [formula image BDA0002276664590000085] and [formula image BDA0002276664590000086] each represent the cluster center of a different sub-dataset. The detailed steps of K-center clustering are as follows:
Step 2.1.1: from the load curve set $S_i$ ($i = 1, 2, \ldots, c$) to be classified by load level, randomly select $K$ load curves as the initial center points, and set the maximum number of iterations;
Step 2.1.2: assign each load curve in the normal power utilization mode load curve set $S_i$ to be classified to its nearest center point;
Step 2.1.3: start iterating so as to minimize the sum $d_K$ of the Euclidean distances corresponding to the classification result, which is calculated by the formula
[formula image BDA0002276664590000087]
Step 2.1.3.1: calculate $d_K$ from the assignment result: if this is the first assignment, calculate $d_K$ and store it directly in the variable [formula image BDA0002276664590000091], which holds the minimum $d_K$; if it is not the first assignment, calculate $d_K$, store it in the variable [formula image BDA0002276664590000092], and continue with step 2.1.3.2;
Step 2.1.3.2: randomly select a non-center point;
Step 2.1.3.3: create a set $C$ to store the result of this iteration's assignment. If this is the first assignment, store $C$ directly in the set [formula image BDA0002276664590000093], which holds the optimal classification result. Otherwise, compare [formula image BDA0002276664590000094] and [formula image BDA0002276664590000095]; if [formula image BDA0002276664590000096], store $C$ in the set [formula image BDA0002276664590000097]. Then modify the result in $C$ according to the randomly selected non-center point by exchanging that non-center point with its corresponding center point, preparing the data for the next round of assignment;
Step 2.1.3.4: determine whether the specified maximum number of iterations has been reached; if so, terminate the calculation and output the optimal classification result [formula image BDA0002276664590000098]; otherwise, execute the next round of iteration and go to step 2.1.3.1;
Step 2.2: using the determined number of load level classes $K$, apply the K-center clustering method again to classify the load levels of the normal power utilization mode. The load level classification result of the normal power utilization mode for the load of the residential community is shown in FIG. 2.
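The comprehensive evaluation index of step 2.1 is available only as an image, so the sketch below substitutes a hypothetical index with the same intent: small distances to the own cluster center and large gaps between centers. Specifically, it divides the mean distance to the own center by the smallest distance between any two centers, and it is defined only for $K \ge 2$. The functions `evaluation_index` and `select_k` and the hand-made partitions are illustrative assumptions, not the patent's formula.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def evaluation_index(curves, labels, centers):
    """Hypothetical index: mean distance to own center / smallest center gap."""
    within = sum(euclidean(y, centers[k])
                 for y, k in zip(curves, labels)) / len(curves)
    between = min(euclidean(centers[a], centers[b])
                  for a in range(len(centers))
                  for b in range(a + 1, len(centers)))
    return within / between

def select_k(curves, partitions):
    """partitions maps K -> (labels, centers); return the K with the smallest index."""
    return min(partitions,
               key=lambda k: evaluation_index(curves, *partitions[k]))

# Hypothetical data and candidate partitions for K = 2 and K = 3.
curves = [(0.0,), (0.2,), (10.0,), (10.2,)]
partitions = {
    2: ([0, 0, 1, 1], [(0.0,), (10.0,)]),
    3: ([0, 1, 2, 2], [(0.0,), (0.2,), (10.0,)]),
}
best_k = select_k(curves, partitions)
print(best_k)
```

In a full pipeline the candidate partitions would come from running the K-center clustering once per candidate $K$, after which the selected $K$ is used for the final classification in step 2.2.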
Still further, in step 3, the abnormal load data domain is constructed as follows: for the $k$-th sub-dataset, at time $t$, based on the confidence interval of the expected load value at confidence level $1-\alpha$ and the interquartile range of the deviation of the load values from the cluster-center load [formula image BDA0002276664590000099], form the abnormal data domain
[formula image BDA00022766645900000910]
In the formula, $X_t$ is the load random variable, [formula image BDA00022766645900000911] is the sample mean of the load at time $t$, [formula image BDA00022766645900000912] is the corrected sample variance at time $t$, $N_k$ is the number of samples, and $t_{\alpha/2}(N_k-1)$ is the upper $\alpha/2$ quantile of the t-distribution with $N_k-1$ degrees of freedom, which can be obtained from a t-distribution quantile table. $\rho$ is a significance indicator: following the definition of the outlier cut-off points in the box plot, an abnormal load value identified with $\rho = 1.5$ is called a mild outlier, and one identified with $\rho = 3$ is called an extreme outlier. [formula image BDA00022766645900000913] is the interquartile range of the deviations of the load values of all load data points at time $t$ from the cluster-center load value, calculated as the difference between the third quartile [formula image BDA0002276664590000101] and the first quartile [formula image BDA0002276664590000102] of the load samples at time $t$. [formula image BDA0002276664590000103] is obtained by sampling with replacement: the load samples at time $t$ are first resampled and their mean is computed as one load-mean sample; this is repeated 100 times (a number determined through testing), and the load-mean samples are then averaged to obtain [formula image BDA0002276664590000104].
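The resampling described above can be sketched as a simple bootstrap of the mean. This is an illustration under assumptions: each resample is drawn with replacement at the same size as the original sample (the text does not state the resample size), and the function name, seed and data are hypothetical.

```python
import random
import statistics

def bootstrap_mean(loads_t, repeats=100, seed=0):
    """Average of `repeats` resample means, each drawn with replacement."""
    rng = random.Random(seed)
    n = len(loads_t)  # resample size equal to the sample size (assumption)
    means = [statistics.fmean(rng.choices(loads_t, k=n))
             for _ in range(repeats)]
    return statistics.fmean(means)

# Hypothetical load values of one sub-dataset at time t.
loads_t = [2.0, 2.1, 1.9, 2.2, 1.8, 2.0, 2.1, 1.9]
print(bootstrap_mean(loads_t))
```

The result always lies between the smallest and largest observed load, and it approaches the ordinary sample mean as the number of repetitions grows.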
Specifically, in step 4, the constructed abnormal load data domain is used to identify abnormal load values in the load data of the normal power utilization mode: a load value that falls within the abnormal load data domain is considered abnormal; otherwise it is considered normal. The identification result for abnormal values in the normal power utilization mode load curves of the residential community is shown in FIG. 3.
In addition, in step 5, the abnormal load data domain for the abnormal power utilization mode is constructed from the maximum upper bound and the minimum lower bound of the abnormal data domains of the normal power utilization mode. This domain is then used in the same way to identify abnormal load values in the load data of the abnormal power utilization mode: a load value that falls within it is considered abnormal; otherwise it is considered normal. The identification result for abnormal load values in the abnormal power utilization mode load curves of the residential community is shown in FIG. 4.
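Steps 4 and 5 can be sketched as a bounds check followed by an envelope over the load levels. The sketch assumes that each load level's normal band is stored as one `(lower, upper)` pair per time step and that the envelope is taken separately at each time step; the text could also be read as a single global pair of bounds. All names and numbers are hypothetical.

```python
def flag_abnormal(curve, bounds):
    """Return the time indices whose load lies outside bounds[t] = (lower, upper)."""
    return [t for t, (x, (lo, hi)) in enumerate(zip(curve, bounds))
            if x < lo or x > hi]

def envelope(bounds_per_level):
    """Per time step: minimum lower bound and maximum upper bound over all levels."""
    return [(min(lo for lo, _ in at_t), max(hi for _, hi in at_t))
            for at_t in zip(*bounds_per_level)]

# Hypothetical normal bands for two load levels over two time steps.
level_a = [(1.0, 2.0), (1.5, 2.5)]
level_b = [(3.0, 4.0), (0.5, 1.0)]

# Step 4: a normal-mode curve is checked against its own level's bands.
print(flag_abnormal([1.5, 2.0], level_a))

# Step 5: an abnormal-mode curve is checked against the envelope of all levels.
env = envelope([level_a, level_b])
print(env, flag_abnormal([5.0, 1.0], env))
```

Using the envelope makes the check for abnormal-mode curves deliberately permissive: a value is flagged only if it would be abnormal at every load level of the normal mode.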
The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate preferred embodiments and are not intended to limit the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. A load abnormal value identification method, comprising the following steps:
step 1: based on a spatial density clustering method, classifying the power-consumption modes of the load curves into a normal power-consumption mode and an abnormal power-consumption mode;
step 2: based on a K-center clustering method, carrying out load-level classification on the normal power-consumption mode;
step 3: under different load levels, in view of the random distribution of the load values at each time, constructing an abnormal load data domain based on the central limit theorem and the interquartile range of the deviation of the load values from the cluster-center load;
step 4: identifying abnormal load values that may exist in the normal power-consumption mode by using the abnormal load data domain constructed in step 3;
step 5: based on the abnormal load data domain formed in step 3, constructing, from its maximum upper bound and minimum lower bound, an abnormal load data domain for identifying abnormal load values in the abnormal power-consumption mode, and identifying the abnormal load values in the abnormal power-consumption mode.
2. The load abnormal value identification method according to claim 1, wherein: in the step 1, the specific steps of classifying the user electricity utilization modes based on the space density clustering method are as follows:
step 1.1: set the minimum number of data points $N_{min}$ contained within a neighborhood;
step 1.2: determine the scanning radius ε: for the selected $N_{min}$, calculate the degree of difference in power-consumption pattern between each load curve and the $N_{min}$-th nearest load curve in its neighborhood
Figure FDA0002276664580000011
wherein
Figure FDA0002276664580000012
is calculated by the formula
Figure FDA0002276664580000013
where $y_a$ and $y_b$ denote two different daily load curves, each of which can be expressed in vector form as $y = (x_1, x_2, \dots, x_n)^T$, with $x_i$ the load value at the i-th time;
step 1.3: prepare the historical load-curve set $D = \{y_1, y_2, \dots, y_M\}$, where M is the total number of load curves;
step 1.4: initialize the core-object set $\Omega = \varnothing$, the number of classes $c = 0$, the set of unclassified samples $\Lambda = D$, and the class partition $S = \varnothing$;
step 1.5: for each load curve $y_i$ ($i = 1, 2, \dots, M$), find the core objects;
step 1.6: if the core-object set $\Omega = \varnothing$, stop the classification and go to step 1.10; otherwise go to step 1.7;
step 1.7: update the class index $c = c + 1$, randomly select a core object $o$ from the set $\Omega$, initialize the current core-object queue $\Omega_c = \{o\}$ and the current class set $S_c = \{o\}$, and update the set of unclassified samples $\Lambda = \Lambda - \{o\}$;
step 1.8: if $\Omega_c = \varnothing$, the current class $S_c$ is complete: update $S = \{S_1, S_2, \dots, S_c\}$ and $\Omega = \Omega - S_c$, then go to step 1.6; otherwise go to step 1.9;
step 1.9: take a core object $o'$ from $\Omega_c$, form its ε-neighborhood sample set $Z_\varepsilon(o')$, obtain the set of samples that are unclassified and belong to $Z_\varepsilon(o')$, $\Delta = Z_\varepsilon(o') \cap \Lambda$, update $\Omega_c = \Omega_c \cup (\Delta \cap \Omega) - \{o'\}$, $S_c = S_c \cup \Delta$ and $\Lambda = \Lambda - \Delta$, then go to step 1.8;
step 1.10: output $S = \{S_1, S_2, \dots, S_c\}$ and $\Lambda$.
3. The load abnormal value identification method according to claim 2, wherein: in step 1.5, the step of searching for a core object is as follows:
① calculate the power-consumption pattern difference degree and find the ε-neighborhood sample set $Z_\varepsilon(y_i)$ of $y_i$;
② if $Z_\varepsilon(y_i)$ contains more than $N_{min}$ samples, add sample $y_i$ to the core-object set: $\Omega = \Omega \cup \{y_i\}$.
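The density-clustering procedure of claims 2 and 3 can be sketched as below. This is an illustrative sketch only: the patent's difference-degree formula is not reproduced here, so plain Euclidean distance is substituted as an assumption; a neighborhood is assumed to include the curve itself; and the function names and sample curves are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Stand-in distance; the source's own difference-degree formula is not reproduced."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_cluster(curves, eps, n_min, dist=euclidean, seed=0):
    """Return (classes, unclassified), both expressed as sets of curve indices."""
    rng = random.Random(seed)
    m = len(curves)
    # Epsilon-neighborhood of every curve (assumed to include the curve itself).
    neigh = [{j for j in range(m) if dist(curves[i], curves[j]) <= eps}
             for i in range(m)]
    # Core objects: neighborhoods holding more than n_min samples.
    core = {i for i in range(m) if len(neigh[i]) > n_min}
    unclassified = set(range(m))
    classes = []
    while core:                                 # stop once no core object is left
        o = rng.choice(sorted(core))            # random core object starts a new class
        queue = [o]
        current = {o}
        unclassified.discard(o)
        while queue:                            # class is complete when the queue empties
            o2 = queue.pop(0)
            delta = neigh[o2] & unclassified    # unclassified samples near o2
            queue.extend(sorted(delta & core))  # only core objects keep expanding
            current |= delta
            unclassified -= delta
        classes.append(current)
        core -= current
    return classes, unclassified

# Hypothetical two-point "curves": two dense groups and one isolated curve.
curves = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [10, 0]]
classes, leftover = density_cluster(curves, eps=0.5, n_min=2)
```

Curves left in `leftover` belong to no dense class; how the classes and leftover curves map onto the normal and abnormal power-consumption modes is not shown in this sketch.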
4. The load abnormal value identification method according to claim 1, wherein: in step 2, the specific steps of classifying the load level of the normal electricity utilization mode are as follows:
step 2.1: determine the optimal number of clusters K: considering both high similarity of load levels among the data points within a sub-dataset and large differences in load level between sub-datasets, use the comprehensive evaluation index
Figure FDA0002276664580000022
and, over the cluster numbers $K \in [1, K_{max}]$, select the K for which
Figure FDA0002276664580000023
attains its minimum as the expected number of clusters of the sub-dataset;
Figure FDA0002276664580000024
represents the distance between a load curve $y_j$ contained in the k-th sub-dataset and the cluster center
Figure FDA00022766645800000312
and $N_k$ is the number of load curves contained in the sub-dataset;
Figure FDA0002276664580000031
represents the degree of difference in load level between the K sub-datasets, wherein
Figure FDA0002276664580000032
and
Figure FDA0002276664580000033
each represent the cluster center of a different sub-dataset;
step 2.2: using the determined number of load-level classes K, classify the load levels of the normal power-consumption mode again with the K-center clustering method.
5. The load abnormal value identification method according to claim 4, wherein: in step 2.1, the detailed steps of K center clustering are as follows:
step 2.1.1: set of load curves S from which a load level classification is to be carried outi(i-1, 2, …, c) randomly selecting K load curves in a centralized manner to serve as initial center points, and setting the maximum iteration times;
step 2.1.2: collecting load curves S of normal power consumption modes to be classifiediThe load curve in (1), assigned to the nearest center point;
step 2.1.3: starting to execute iteration to enable the sum d of Euclidean distances corresponding to the classification resultsKMinimum, which is calculated by the formula
Figure FDA0002276664580000034
6. The load abnormal value identification method according to claim 5, wherein: in step 2.1.3, the specific steps of performing iteration are as follows:
step 2.1.3.1: calculate $d_K$ from the assignment result: if this is the first assignment, calculate $d_K$ and store it directly in the variable
Figure FDA0002276664580000035
which holds the minimum $d_K$; if this is not the first assignment, calculate $d_K$ and store it in the variable
Figure FDA0002276664580000036
then continue with step 2.1.3.2;
step 2.1.3.2: randomly select a non-center point;
step 2.1.3.3: create a set C to store the result of this iteration's assignment; if this is the first assignment, store set C directly in the set
Figure FDA0002276664580000037
which holds the optimal classification result; otherwise, compare
Figure FDA0002276664580000038
and
Figure FDA0002276664580000039
and if
Figure FDA00022766645800000310
store set C in the set
Figure FDA00022766645800000311
then modify the result in set C according to the randomly selected non-center point by swapping that point with its corresponding center point, preparing the data for the next round of assignment;
step 2.1.3.4: determine whether the specified maximum number of iterations has been reached; if so, terminate the calculation and output the optimal classification result
Figure FDA0002276664580000041
otherwise, perform the next round of iteration and return to step 2.1.3.1.
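The K-center iteration of claims 5 and 6 can be sketched as a randomized swap search. The sketch makes a simplifying assumption: each swap is proposed from the best configuration found so far, and the proposal is kept only if it lowers $d_K$. The function names, the default iteration limit, the seed, and the sample curves are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(curves, centers):
    """Assign each curve to its nearest center; return (labels, total distance d_K)."""
    labels, total = [], 0.0
    for curve in curves:
        dists = [euclidean(curve, curves[c]) for c in centers]
        slot = min(range(len(centers)), key=dists.__getitem__)
        labels.append(slot)
        total += dists[slot]
    return labels, total

def k_center(curves, k, max_iter=200, seed=0):
    """Randomized-swap K-center clustering; centers are indices into curves."""
    rng = random.Random(seed)
    best_centers = rng.sample(range(len(curves)), k)    # random initial centers
    best_labels, best_d = assign(curves, best_centers)  # first assignment
    for _ in range(max_iter):                           # stop at the iteration limit
        non_centers = [i for i in range(len(curves)) if i not in best_centers]
        if not non_centers:
            break
        p = rng.choice(non_centers)                     # random non-center point
        candidate = list(best_centers)
        candidate[best_labels[p]] = p                   # swap with its own center
        labels, d = assign(curves, candidate)
        if d < best_d:                                  # keep only improvements
            best_centers, best_labels, best_d = candidate, labels, d
    return best_centers, best_labels, best_d

# Hypothetical two-point "curves" forming a low-load and a high-load group.
curves = [[1, 1], [1.1, 1], [0.9, 1], [5, 5], [5.1, 5], [4.9, 5]]
centers, labels, d_k = k_center(curves, k=2)
```

Because centers are always actual load curves, each load level is represented by a real daily curve rather than by an averaged one.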
7. The load abnormal value identification method according to claim 1, wherein: in step 3, the abnormal load data domain is constructed as follows: for the k-th sub-dataset at time t, the confidence interval of the expected load value at confidence level $1-\alpha$ is combined with the interquartile range of the load values' deviation from the cluster-center load
Figure FDA0002276664580000042
to form the abnormal data domain
Figure FDA0002276664580000043
where $X_t$ is the load random variable,
Figure FDA0002276664580000044
is the sample mean of the load at time t,
Figure FDA0002276664580000045
is the corrected sample variance at time t, $N_k$ is the number of samples, and $t_{\alpha/2}(N_k-1)$ is the upper $\alpha/2$ quantile of the t-distribution with $N_k-1$ degrees of freedom.
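The confidence-interval ingredient of claim 7 can be sketched as the standard t-based interval for a mean, using the quantities the claim names (sample mean, corrected sample variance, $N_k$, and the upper $\alpha/2$ t-quantile). How this interval is combined with the interquartile range is given by the claim's formula, which is not reproduced in this sketch. The function name and sample loads are hypothetical, and the critical value is passed in from a t-table rather than computed.

```python
import math
import statistics

def mean_confidence_interval(samples, t_crit):
    """t-based confidence interval for the mean: mean +/- t_crit * s / sqrt(n)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)       # square root of the corrected (n - 1) variance
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical load values at one time t for one load level (N_k = 10).
loads_t = [3.1, 2.9, 3.4, 3.0, 3.2, 2.8, 3.3, 3.1, 3.0, 3.2]

# Upper 0.025 quantile of the t-distribution with 9 degrees of freedom (table value),
# corresponding to confidence level 1 - alpha = 0.95.
T_CRIT_95_DF9 = 2.262

ci_low, ci_high = mean_confidence_interval(loads_t, T_CRIT_95_DF9)
```

Supplying the critical value as an argument keeps the sketch free of third-party dependencies; a statistics library could compute it for any $\alpha$ and $N_k$.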
CN201911125386.8A 2019-11-18 2019-11-18 Load abnormal value identification method Active CN111046913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911125386.8A CN111046913B (en) 2019-11-18 2019-11-18 Load abnormal value identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911125386.8A CN111046913B (en) 2019-11-18 2019-11-18 Load abnormal value identification method

Publications (2)

Publication Number Publication Date
CN111046913A true CN111046913A (en) 2020-04-21
CN111046913B CN111046913B (en) 2023-02-14

Family

ID=70232982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911125386.8A Active CN111046913B (en) 2019-11-18 2019-11-18 Load abnormal value identification method

Country Status (1)

Country Link
CN (1) CN111046913B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874381A (en) * 2019-10-30 2020-03-10 西安交通大学 User side load data abnormal value identification method based on space density clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
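Zhao Tianhui et al.: "Method for identifying and correcting abnormal values of industrial load based on nonparametric regression analysis", Automation of Electric Power Systems *
Deng Mingbin et al.: "Power-consumption pattern analysis method based on user load", Computer & Digital Engineering *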

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529061A (en) * 2020-12-03 2021-03-19 新奥数能科技有限公司 Identification method and device for photovoltaic power abnormal data and terminal equipment
CN112529061B (en) * 2020-12-03 2024-04-16 新奥数能科技有限公司 Photovoltaic power abnormal data identification method and device and terminal equipment
CN112686288A (en) * 2020-12-22 2021-04-20 博锐尚格科技股份有限公司 Power consumption behavior anomaly detection method and device and computer readable storage medium
CN113554117A (en) * 2021-08-16 2021-10-26 中国南方电网有限责任公司 Abnormal load data identification method and electronic equipment
CN114169631A (en) * 2021-12-15 2022-03-11 中国石油大学胜利学院 Oil field power load management and control system based on data analysis
CN114169631B (en) * 2021-12-15 2022-10-25 山东石油化工学院 Oil field power load management and control system based on data analysis

Also Published As

Publication number Publication date
CN111046913B (en) 2023-02-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant