CN116433049A - Power consumption abnormality detection method based on fuzzy rough entropy - Google Patents
- Publication number
- CN116433049A CN202310414937.2A
- Authority
- CN
- China
- Prior art keywords
- attribute
- fuzzy
- user
- sequence
- entropy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a power consumption abnormality detection method based on fuzzy rough entropy, relating to the technical field of power consumption abnormality detection. The method comprises the following steps: standardizing the numerical data; calculating a distance matrix for each attribute; calculating a fuzzy similarity matrix; calculating the distance-based fuzzy rough entropy; sorting the attributes by fuzzy rough entropy to obtain an attribute sequence and an attribute set sequence; calculating the distance matrix and fuzzy similarity matrix of the attribute set sequence; calculating the fuzzy rough relative entropy matrices of the two sequences; calculating the weight matrices of the two sequences; calculating an anomaly score for each user; and judging, user by user, whether the score is larger than a threshold and, if so, outputting the user as abnormal, until all users have been judged. Fuzzy rough entropy effectively extracts the fuzzy and uncertain information in abnormal electricity consumption data, which improves the detection performance of the model and allows mixed heterogeneous data sets to be processed effectively.
Description
Technical Field
The invention relates to the technical field of power consumption abnormality detection, and in particular to a power consumption abnormality detection method based on fuzzy rough entropy.
Background
People's daily lives cannot do without electricity, and electricity safety has a critical influence on both everyday life and the national power industry. At present, as power demand keeps rising, equipment faults and electricity theft occur continually and pose serious hidden dangers to the safe operation of the power grid. With the nationwide rollout of smart meters in China, the volume of data in electricity consumption information acquisition systems keeps growing. However, for lack of means to analyze the data intelligently, useful information cannot be extracted quickly from this mass of data. Electricity consumption data anomaly detection technology can therefore better safeguard the safe operation of the power grid and help power companies avoid larger hidden dangers and recover substantial economic losses. Traditional power consumption abnormality detection methods are mainly data-driven and fall into classification, regression, and clustering algorithms. A classification algorithm divides the user set into normal and abnormal classes according to the users' feature quantities, and its model is generally built on a labeled data set. However, such data sets require manual label verification and therefore consume a great deal of manpower. A regression algorithm is mainly used to predict a user's short-term electricity consumption and compares the predicted value with the actual consumption to judge whether an abnormality has occurred. In practical applications, the prediction accuracy of regression models is often unsatisfactory.
In particular, since power consumption patterns vary from user to user, a separate regression model must be built for each user during abnormality detection, and building and storing these models usually costs considerable time and memory. A clustering method divides objects into groups so that objects in the same group are similar on certain attributes and differ on those attributes from objects in other groups; in anomaly detection, the few users whose electricity consumption behavior does not conform to that of the majority can be judged to be abnormal users. However, the performance of a clustering model depends largely on the choice of parameters, and because electricity consumption patterns differ between seasons, the parameters must be reselected when the season changes. The clustering model is therefore computationally complex, unsuited to detecting large volumes of high-dimensional electricity consumption data, and of limited applicability.
Such methods are mainly suited to nominal data; when they are applied to a data set with numerical attributes, the data must first be discretized, which markedly increases preprocessing time and easily causes information loss during processing, degrading the detection performance of the model. Moreover, because traditional power consumption abnormality detection methods are mainly data-driven, a large amount of useful information is lost when mixed heterogeneous data is detected, the detection performance of the model is poor, and the results in use are unsatisfactory.
Disclosure of Invention
The invention aims to overcome the defects of traditional power consumption abnormality detection methods, which are mainly data-driven, lose a large amount of useful information when detecting mixed heterogeneous data, and give poor detection performance and unsatisfactory results in use.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A power consumption abnormality detection method based on fuzzy rough entropy comprises the following steps:
S1: Standardize the numerical data.
Unify the scales of the numerical attribute data in the data set: normalize the original attribute values by min-max normalization so that the value range of each numerical attribute becomes [0, 1]. The calculation formula is as follows:
$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$
wherein $f(c(u))$ is the normalized value; $c(u)$ represents the value of user $u$ on attribute $c$; $\min(c)$ represents the minimum value of all users on attribute $c$; $\max(c)$ represents the maximum value of all users on attribute $c$.
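As an illustration only, the min-max normalization of step S1 can be sketched in Python as follows. The function name `min_max_normalize`, the sample consumption values, and the handling of a constant column (mapped to 0.0) are assumptions made for this sketch and are not part of the described method.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale one numerical attribute column to [0, 1].

    Implements f(c(u)) = (c(u) - min(c)) / (max(c) - min(c)).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption of this sketch: a constant column has no spread,
        # so every value is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly consumption (kWh) of four users.
consumption = [120.0, 300.0, 210.0, 480.0]
normalized = min_max_normalize(consumption)
```

Nominal attributes are not passed through this function; only numerical columns are rescaled.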
S2: Calculate a distance matrix for each attribute.
Calculate the distance matrix $Dis_c$ of the data set under each attribute $c$, where $Dis_c(u_i, u_j) = Dis_c(u_j, u_i)$ represents the distance between users $u_i$ and $u_j$ on attribute $c$. If $c$ is a nominal attribute, the Hamming distance is used to measure the difference between the two users' attribute values; if $c$ is a numerical attribute, the Euclidean distance is used.
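A minimal sketch of the per-attribute distance matrix of step S2 is given below. It assumes that, for a single attribute, the Hamming distance reduces to 0 for equal values and 1 otherwise, and the Euclidean distance reduces to the absolute difference of the normalized values. The function name and the sample columns are hypothetical.

```python
from typing import List, Sequence


def attribute_distance_matrix(column: Sequence, nominal: bool) -> List[List[float]]:
    """Symmetric distance matrix Dis_c for one attribute column."""
    n = len(column)
    dis = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if nominal:
                # Hamming distance on a single nominal attribute: 0 or 1.
                d = 0.0 if column[i] == column[j] else 1.0
            else:
                # Euclidean distance on a single numerical attribute.
                d = abs(column[i] - column[j])
            dis[i][j] = dis[j][i] = d  # Dis_c(u_i, u_j) = Dis_c(u_j, u_i)
    return dis


# Hypothetical columns for three users.
user_type = ["residential", "commercial", "residential"]
usage = [0.0, 0.5, 0.25]  # already normalized to [0, 1]

dis_type = attribute_distance_matrix(user_type, nominal=True)
dis_usage = attribute_distance_matrix(usage, nominal=False)
```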
S3: Calculate a fuzzy similarity matrix from the distance matrix.
After the distance matrix is calculated, the mixed fuzzy similarity matrix of the corresponding attribute is calculated according to the following calculation formula, wherein $B$ represents any single attribute of the attribute set $C$. When the distance is smaller than a given threshold, the two users have a certain similarity under the attribute; otherwise, they have no similarity;
wherein $R_B$ represents the fuzzy similarity matrix of attribute subset $B$, and $R_B(u_i, u_j)$ represents the fuzzy similarity of users $u_i$ and $u_j$ on attribute subset $B$; the parameter $\varepsilon$ is an adjustable threshold with value range $[0, 1]$; $Dis_B(u_i, u_j)$ represents the distance between users $u_i$ and $u_j$ on attribute subset $B$; $|B|$ represents the number of attributes in $B$, and here $|B| = 1$;
The fuzzy information granule of user $u_i$ induced by the fuzzy similarity relation $R_B$ is defined as:
$$SIM_{R_B}(u_i) = SIM_B(u_i) = \{R_B(u_i, u_1), R_B(u_i, u_2), \ldots, R_B(u_i, u_n)\}$$
For any integer $j$ with $0 < j \le n$, the $j$-th element $R_B(u_i, u_j)$ of the granule is the fuzzy similarity of $u_i$ and $u_j$ with respect to $B$. It follows that if $R_B(u_i, u_j) = 0$, then $u_j$ certainly does not belong to the fuzzy information granule $SIM_{R_B}(u_i)$ of $u_i$; conversely, if $R_B(u_i, u_j) = 1$, then $u_j$ certainly belongs to $SIM_{R_B}(u_i)$.
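The thresholding idea of step S3 can be sketched as follows. The exact similarity formula is not reproduced in this text, so the functional form used here, $1 - Dis_B/|B|$ when $Dis_B/|B| \le \varepsilon$ and 0 otherwise, is an assumption chosen only to be consistent with the surrounding description (zero similarity beyond the threshold, similarity 1 at distance 0). The function names and sample distances are likewise hypothetical.

```python
from typing import List


def fuzzy_similarity_matrix(dis: List[List[float]], n_attrs: int, eps: float) -> List[List[float]]:
    """Turn a distance matrix Dis_B into a fuzzy similarity matrix R_B.

    ASSUMPTION (not taken from the source): R_B = 1 - Dis_B/|B| when
    Dis_B/|B| <= eps, and 0 otherwise. The inclusive boundary is also assumed.
    """
    n = len(dis)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = dis[i][j] / n_attrs
            sim[i][j] = 1.0 - d if d <= eps else 0.0
    return sim


def fuzzy_granule(sim: List[List[float]], i: int) -> List[float]:
    """Fuzzy information granule SIM_B(u_i): row i of the similarity matrix."""
    return list(sim[i])


# Hypothetical distances between three users on one attribute (|B| = 1).
dis_c = [[0.0, 0.2, 0.9],
         [0.2, 0.0, 0.7],
         [0.9, 0.7, 0.0]]
r_c = fuzzy_similarity_matrix(dis_c, n_attrs=1, eps=0.5)
granule_u1 = fuzzy_granule(r_c, 0)
```

With this assumed form, a larger $\varepsilon$ admits more users into each granule with nonzero membership, while $\varepsilon = 0$ keeps only users at distance 0.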
S4: Calculate the distance-based fuzzy rough entropy.
Calculate the fuzzy rough entropy under each attribute from the calculated fuzzy similarity. The distance-based fuzzy rough entropy is defined as follows:
wherein $E(B) = E(R_B)$ represents the fuzzy rough entropy of the fuzzy similarity relation $R_B$; $SIM_{R_B}(u_i)$ represents the fuzzy information granule of user $u_i$ under the fuzzy similarity relation $R_B$; and $|U|$ represents the number of users in the user set $U$.
S5: Obtain an attribute sequence and an attribute set sequence according to the fuzzy rough entropy.
The attributes are arranged in ascending order of their fuzzy rough entropy to obtain the following attribute sequence:
$$S = \langle c'_1, c'_2, \ldots, c'_m \rangle$$
wherein $S$ represents the attribute sequence and $m$ represents the number of attributes in the attribute set $C$; in $S$, the fuzzy rough entropy of any attribute is less than or equal to that of the next attribute;
To construct the attribute set sequence, the attribute $c'_1$ is taken as the first element of the sequence; each new element is then obtained by adding the next attribute of $S$ in order, until all attributes have been added and the attribute set $C$ becomes the last element. The resulting attribute set sequence is:
$$AS = \langle C_1, C_2, \ldots, C_m \rangle$$
wherein $AS$ represents the attribute set sequence and $m$ represents the number of attributes in the attribute set $C$; every attribute subset in $AS$ is contained in $C$; the first attribute subset $C_1$ consists of the first attribute $c'_1$ of the attribute sequence, and the last attribute subset $C_m$ equals $C$ itself; for any integer $j$ with $0 < j < m$, the $(j+1)$-th attribute subset is the $j$-th attribute subset together with the $(j+1)$-th attribute of the attribute sequence, i.e., $C_{j+1} = C_j \cup \{c'_{j+1}\}$.
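The construction of the two sequences in step S5 can be sketched as follows, assuming the fuzzy rough entropy of each attribute has already been computed. The entropy values and attribute names are hypothetical placeholders, not results from the source.

```python
from typing import Dict, List, Tuple


def build_sequences(entropy: Dict[str, float]) -> Tuple[List[str], List[List[str]]]:
    """Return the attribute sequence S and the attribute set sequence AS.

    S lists the attributes in ascending order of fuzzy rough entropy.
    AS[j] is the set of the first j+1 attributes of S, so C_{j+1} = C_j ∪ {c'_{j+1}}.
    """
    s = sorted(entropy, key=lambda attr: entropy[attr])  # ascending, stable
    attribute_sets = [s[:k] for k in range(1, len(s) + 1)]
    return s, attribute_sets


# Hypothetical per-attribute fuzzy rough entropy values.
entropy = {"voltage": 0.9, "usage": 0.4, "user_type": 0.7}
S, AS = build_sequences(entropy)
```

Each subset is stored as an ordered list here for readability; the last element of `AS` always contains every attribute of $C$.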
S6: Calculate a distance matrix and a similarity matrix for each attribute subset of the attribute set sequence.
After the attribute set sequence is obtained, the distance matrix and similarity matrix under each attribute subset of the attribute set sequence are calculated in turn in the manner of steps S2 to S4. Because an attribute subset contains one or more attributes, the distance matrix is calculated by the following calculation formula;
wherein $Dis_B$ represents the distance matrix of attribute subset $B$, and $Dis_B(u_i, u_j)$ represents the difference between users $u_i$ and $u_j$ in their values on attribute subset $B$; $|B|$ represents the number of attributes in $B$, and $a_k(u_i)$ represents the value of user $u_i$ on the $k$-th attribute in $B$. The difference between nominal attribute values is measured by the Hamming distance, and the difference between numerical attribute values by the Euclidean distance. To calculate user differences on a mixed attribute set $B$, that is, one containing both nominal and numerical attributes, $B$ is divided into a nominal attribute subset $B_1$ and a numerical attribute subset $B_2$, which satisfy $B = B_1 \cup B_2$ and $B_1 \cap B_2 = \emptyset$. The difference between users $u_i$ and $u_j$ on the attribute set $B$ is then $Dis_B(u_i, u_j) = Dis_{B_1}(u_i, u_j) + Dis_{B_2}(u_i, u_j)$.
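The mixed-attribute distance $Dis_B = Dis_{B_1} + Dis_{B_2}$ of step S6 can be sketched as follows. The sketch assumes the standard definitions of the two distances (Hamming: the number of nominal attributes on which the users differ; Euclidean: the square root of the summed squared differences over the numerical attributes), since the text names the distance types without reproducing the formula. The function name, record layout, and index lists are hypothetical.

```python
import math
from typing import Sequence


def mixed_distance(u: Sequence, v: Sequence,
                   nominal_idx: Sequence[int],
                   numeric_idx: Sequence[int]) -> float:
    """Dis_B(u, v) = Dis_B1(u, v) + Dis_B2(u, v) for a mixed attribute subset B.

    nominal_idx lists the positions of the nominal attributes (B1),
    numeric_idx the positions of the numerical attributes (B2).
    """
    # Hamming distance over the nominal attributes B1.
    hamming = sum(1 for k in nominal_idx if u[k] != v[k])
    # Euclidean distance over the (normalized) numerical attributes B2.
    euclid = math.sqrt(sum((u[k] - v[k]) ** 2 for k in numeric_idx))
    return hamming + euclid


# Hypothetical users: (user type, normalized usage, normalized voltage).
u1 = ("residential", 0.0, 0.0)
u2 = ("commercial", 0.3, 0.4)
d12 = mixed_distance(u1, u2, nominal_idx=[0], numeric_idx=[1, 2])
```

When $B$ contains only one kind of attribute, the corresponding index list is empty and the other term alone gives the distance.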
S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences.
The fuzzy rough relative entropy is defined as follows:
wherein $RE_B(u)$ represents the fuzzy rough relative entropy of user $u$ with respect to attribute subset $B$; $E_u(R_B)$, computed over the users $u_i \neq u$, represents the fuzzy rough entropy of the user set $U$ under the fuzzy similarity relation $R_B$ after $u$ is removed; $|U|$ represents the number of users of the user set $U$; $SIM_B(u_i)$ represents the fuzzy information granule of user $u_i$ with respect to attribute subset $B$; and $E(R_B)$ represents the fuzzy rough entropy of the user set $U$ under the fuzzy similarity relation $R_B$. $RE_B(u)$ can be used to analyze the degree of abnormality of $u$: when $u$ is removed from the user set $U$, if $E_u(R_B)$ changes greatly compared with $E(R_B)$, the degree of abnormality of $u$ is high and $u$ is more likely to be an abnormal user; conversely, if $E_u(R_B)$ changes very little, the degree of abnormality of $u$ is low and $u$ is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy $RE_B(u)$, the higher the degree of abnormality of $u$;
Based on this idea, the fuzzy rough relative entropy obtained by removing one user from $U$ at a time is calculated in turn under each attribute or attribute subset of the two sequences, forming the following two fuzzy rough relative entropy matrices:
$$REM_S(i, j) = RE_{c'_j}(u_i), \qquad REM_{AS}(i, j) = RE_{C_j}(u_i)$$
wherein $REM_S$ and $REM_{AS}$ respectively represent the fuzzy rough relative entropy matrix of the attribute sequence and that of the attribute set sequence; $REM_S(i, j)$ represents the fuzzy rough relative entropy $RE_{c'_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute $c'_j$ in the attribute sequence $S$; $REM_{AS}(i, j)$ represents the fuzzy rough relative entropy $RE_{C_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute subset $C_j$ in the attribute set sequence $AS$.
S8: Calculate the weight matrices of the two sequences.
In addition to uncertainty measures such as fuzzy rough entropy, defining a weight function also helps to widen the gap between the anomaly scores of abnormal users and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weight of each element of the two sequences is calculated to form the weight matrix of the corresponding sequence as follows:
wherein $W_S$ and $W_{AS}$ respectively represent the weight matrix of the attribute sequence and that of the attribute set sequence; in $W_S$ and $W_{AS}$, $|U|$ represents the number of users of the user set $U$. $W_S(i, j)$ represents the weight of user $u_i$ with respect to the $j$-th attribute $c'_j$ in the attribute sequence $S$, and $SIM_{c'_j}(u_i)$ represents the fuzzy information granule of user $u_i$ with respect to $c'_j$; $W_{AS}(i, j)$ represents the weight of user $u_i$ with respect to the $j$-th attribute subset $C_j$ in the attribute set sequence $AS$, and $SIM_{C_j}(u_i)$ represents the fuzzy information granule of user $u_i$ with respect to $C_j$.
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.
wherein $score(u_i)$ represents the anomaly score of user $u_i$ on the attribute set $C$; $|C|$ represents the number of attributes in the attribute set $C$; $REM_S(i, j)$ represents the fuzzy rough relative entropy of user $u_i$ with respect to the $j$-th attribute $c'_j$ in the attribute sequence $S$, and $REM_{AS}(i, j)$ represents the fuzzy rough relative entropy of user $u_i$ with respect to the $j$-th attribute subset $C_j$ in the attribute set sequence $AS$; $W_S(i, j)$ represents the weight of user $u_i$ with respect to the $j$-th attribute $c'_j$ in the attribute sequence $S$, and $W_{AS}(i, j)$ represents the weight of user $u_i$ with respect to the $j$-th attribute subset $C_j$ in the attribute set sequence $AS$.
S10: Judge, user by user, whether the anomaly score is larger than a threshold.
Given a threshold $\mu$, for any user $u \in U$, if $score(u) > \mu$, then $u$ is determined to be an abnormal user in the user set $U$.
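The decision rule of step S10 can be sketched as follows, assuming the anomaly scores of step S9 are already available. The user identifiers, score values, and threshold are hypothetical.

```python
from typing import Dict, List


def flag_abnormal_users(scores: Dict[str, float], mu: float) -> List[str]:
    """Return every user u with score(u) > mu, judged one by one."""
    abnormal = []
    for user, score in scores.items():
        if score > mu:  # strictly larger than the threshold, as in S10
            abnormal.append(user)
    return abnormal


# Hypothetical anomaly scores for four users and a hypothetical threshold.
scores = {"u1": 0.12, "u2": 0.81, "u3": 0.50, "u4": 0.67}
abnormal_users = flag_abnormal_users(scores, mu=0.5)
```

A user whose score equals the threshold exactly is not flagged, because the comparison is strict.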
In the invention, fuzzy rough entropy is constructed from a fuzzy similarity relation based on distance measurement, the degree of abnormality of a user is analyzed by means of fuzzy rough relative entropy, and more useful information is obtained by constructing an attribute sequence and an attribute set sequence;
The power consumption abnormality detection method based on fuzzy rough entropy overcomes the problem that existing fuzzy rough entropy, because it uses the intersection operation, cannot effectively distinguish similarities in high-dimensional settings. No model needs to be trained in advance, which reduces the workload of operators, allows abnormalities in electricity consumption data to be found early, and improves detection efficiency;
By using the improved fuzzy rough entropy, the invention effectively extracts the fuzzy and uncertain information in abnormal electricity consumption data and thereby improves the detection performance of the model. It extends the traditional fuzzy rough set method, processes mixed heterogeneous data sets effectively, is not affected by the type of data set, and is applicable to various types of electricity consumption data.
Drawings
Fig. 1 is a schematic flow chart of the steps of a power consumption abnormality detection method based on fuzzy rough entropy.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.
Example 1
Referring to Fig. 1, this scheme provides an embodiment: a power consumption abnormality detection method based on fuzzy rough entropy, comprising the following steps:
S1: Acquire electricity consumption data and standardize the numerical data in it to obtain an information system;
S2: Calculate the distance matrix under each attribute according to the information system;
S3: Calculate the fuzzy similarity matrix under each attribute according to the distance matrix;
S4: Calculate the distance-based fuzzy rough entropy of each attribute;
S5: Sort the attributes by fuzzy rough entropy to obtain an attribute sequence and an attribute set sequence;
S6: Calculate the distance matrix and fuzzy similarity matrix of each attribute subset of the attribute set sequence;
S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences;
S8: Calculate the weight matrices of the two sequences;
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices;
S10: Judge, user by user, whether the anomaly score of each user in the information system is larger than a threshold; if so, output the user as abnormal; otherwise, judge the next user, until all users have been judged.
In fuzzy rough set practice, electricity consumption data is imported into an information system (or information table), in which each row represents a user and each column represents an attribute of the users. Attribute values may be numerical (such as electricity consumption or voltage), nominal (such as user type or connection mode), or mixed (both numerical and nominal). An information system is denoted $(U, C)$, where $U$ represents the set of users and $C$ represents the set of attributes;
In this embodiment, the numerical attributes are first preprocessed by min-max normalization to obtain the electricity consumption data set. The attribute sequence and attribute set sequence are then constructed, taking advantage of the improved fuzzy rough entropy's ability to reflect how well each attribute discriminates the electricity consumption data, and the degree of abnormality of each user is evaluated through an anomaly score built from the fuzzy rough relative entropy and the weight matrices of the two sequences. This addresses the inability of existing power consumption abnormality detection methods to handle complex electricity consumption data that is multi-source, heterogeneous, fuzzy, and uncertain. The method works without labeled data and effectively achieves unsupervised anomaly detection on mixed electricity consumption data.
S1 specifically comprises the following steps:
$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$
wherein $f(c(u))$ is the normalized value; $c(u)$ represents the value of user $u$ on attribute $c$; $\min(c)$ represents the minimum value of all users on attribute $c$; $\max(c)$ represents the maximum value of all users on attribute $c$;
In this embodiment, the min-max normalization adjusts the value range of the numerical data to the real interval [0, 1], and the nominal data is left unchanged.
S2 specifically comprises the following steps:
The distance between two users on a nominal attribute is measured by the Hamming distance, and the distance between two users on a numerical attribute by the Euclidean distance, giving the distance matrix $Dis_c$ under each attribute $c$; wherein $Dis_c(u_i, u_j) = Dis_c(u_j, u_i)$ represents the distance between users $u_i$ and $u_j$ on attribute $c$.
S3 specifically comprises the following steps:
After the distance matrix is calculated, the mixed fuzzy similarity matrix of the corresponding attribute is calculated by the following calculation formula, wherein $B$ represents any single attribute of the attribute set $C$. When the distance is smaller than a given threshold, the two users have a certain similarity under the attribute; otherwise, they have no similarity. It follows that if $R_B(u_i, u_j) = 0$, then $u_j$ certainly does not belong to the fuzzy information granule $SIM_{R_B}(u_i)$ of $u_i$; conversely, if $R_B(u_i, u_j) = 1$, then $u_j$ certainly belongs to $SIM_{R_B}(u_i)$;
wherein $R_B$ represents the fuzzy similarity matrix of attribute subset $B$, and $R_B(u_i, u_j)$ represents the fuzzy similarity of users $u_i$ and $u_j$ on attribute subset $B$; the parameter $\varepsilon$ is an adjustable threshold with value range $[0, 1]$; $Dis_B(u_i, u_j)$ represents the distance between users $u_i$ and $u_j$ on attribute subset $B$; $|B|$ represents the number of attributes in $B$, and here $|B| = 1$;
S4 specifically comprises the following steps:
Calculate the distance-based fuzzy rough entropy of each attribute;
wherein $SIM_{R_c}(u_i) = SIM_c(u_i) = \{R_c(u_i, u_1), R_c(u_i, u_2), \ldots, R_c(u_i, u_n)\}$ represents the fuzzy information granule of user $u_i$ with respect to $R_c$; for any integer $j$ with $0 < j \le n$, the $j$-th element $R_c(u_i, u_j)$ of the granule is the fuzzy similarity of $u_i$ and $u_j$ with respect to $c$; furthermore, $|U|$ represents the number of users of the user set $U$.
S5 specifically comprises the following steps:
The attributes are arranged in ascending order of fuzzy rough entropy, first giving the following attribute sequence:
$$S = \langle c'_1, c'_2, \ldots, c'_m \rangle$$
wherein $S$ represents the attribute sequence and $m$ represents the number of attributes in the attribute set $C$; in $S$, the fuzzy rough entropy of any attribute is less than or equal to that of the next attribute;
To construct the attribute set sequence, the attribute $c'_1$ is taken as the first element of the sequence; each new element is then obtained by adding the next attribute of $S$ in order, until all attributes have been added and the attribute set $C$ becomes the last element. The resulting attribute set sequence is:
$$AS = \langle C_1, C_2, \ldots, C_m \rangle$$
wherein $AS$ represents the attribute set sequence and $m$ represents the number of attributes in the attribute set $C$; every attribute subset in $AS$ is contained in $C$; the first attribute subset $C_1$ consists of the first attribute $c'_1$ of the attribute sequence, and the last attribute subset $C_m$ equals $C$ itself; for any integer $j$ with $0 < j < m$, the $(j+1)$-th attribute subset is the $j$-th attribute subset together with the $(j+1)$-th attribute of the attribute sequence, i.e., $C_{j+1} = C_j \cup \{c'_{j+1}\}$.
S6 is specifically as follows:
After the attribute set sequence is obtained, the distance matrix and similarity matrix under each attribute subset of the attribute set sequence are calculated in turn in the manner of steps S2 to S4. Because an attribute subset contains one or more attributes, the distance matrix is calculated by the following calculation formula;
wherein $Dis_B$ represents the distance matrix of attribute subset $B$, and $Dis_B(u_i, u_j)$ represents the difference between users $u_i$ and $u_j$ in their values on attribute subset $B$; $|B|$ represents the number of attributes in $B$, and $a_k(u_i)$ represents the value of user $u_i$ on the $k$-th attribute in $B$. The difference between nominal attribute values is measured by the Hamming distance, and the difference between numerical attribute values by the Euclidean distance. To calculate user differences on a mixed attribute set $B$, that is, one containing both nominal and numerical attributes, $B$ is divided into a nominal attribute subset $B_1$ and a numerical attribute subset $B_2$, which satisfy $B = B_1 \cup B_2$ and $B_1 \cap B_2 = \emptyset$. The difference between users $u_i$ and $u_j$ on the attribute set $B$ is then calculated as $Dis_B(u_i, u_j) = Dis_{B_1}(u_i, u_j) + Dis_{B_2}(u_i, u_j)$.
S7 specifically comprises the following steps:
The fuzzy rough relative entropy is defined as follows:
wherein $RE_B(u)$ represents the fuzzy rough relative entropy of user $u$; $E_u(R_B)$, computed over the users $u_i \neq u$, represents the fuzzy rough entropy of the user set $U$ under the fuzzy similarity relation $R_B$ after $u$ is removed; $|U|$ represents the number of users of the user set $U$; $SIM_B(u_i)$ represents the fuzzy information granule of user $u_i$ with respect to attribute subset $B$; and $E(R_B)$ represents the fuzzy rough entropy of the user set $U$ under the fuzzy similarity relation $R_B$. $RE_B(u)$ can be used to analyze the degree of abnormality of $u$: when $u$ is removed from the user set $U$, if $E_u(R_B)$ changes greatly compared with $E(R_B)$, the degree of abnormality of $u$ is high and $u$ is more likely to be an abnormal user; conversely, if $E_u(R_B)$ changes very little, the degree of abnormality of $u$ is low and $u$ is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy $RE_B(u)$, the higher the degree of abnormality of $u$;
Based on this idea, the fuzzy rough relative entropy obtained by removing one user from $U$ at a time is calculated in turn under each attribute or attribute subset of the two sequences, forming the following two fuzzy rough relative entropy matrices:
$$REM_S(i, j) = RE_{c'_j}(u_i), \qquad REM_{AS}(i, j) = RE_{C_j}(u_i);$$
wherein $REM_S$ and $REM_{AS}$ respectively represent the fuzzy rough relative entropy matrix of the attribute sequence and that of the attribute set sequence; $REM_S(i, j)$ represents the fuzzy rough relative entropy $RE_{c'_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute $c'_j$ in the attribute sequence $S$; $REM_{AS}(i, j)$ represents the fuzzy rough relative entropy $RE_{C_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute subset $C_j$ in the attribute set sequence $AS$.
S8 specifically comprises the following steps:
In addition to uncertainty measures such as fuzzy rough entropy, a weight function can effectively enlarge the difference in anomaly scores between abnormal users and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weights under each element of the two sequences are calculated, forming a weight matrix for each sequence as follows:
where W_S and W_AS denote the weight matrix of the attribute sequence and of the attribute set sequence, respectively; in both matrices, |U| denotes the number of users in the user set U. W_S(i, j) denotes the weight of user u_i with respect to the j-th attribute c'_j in the attribute sequence S, and the granule term in its formula denotes the fuzzy information granule of user u_i with respect to c'_j; W_AS(i, j) denotes the weight of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS, and the granule term in its formula denotes the fuzzy information granule of user u_i with respect to C_j.
S9 is specifically as follows:
The anomaly score of each user is evaluated using the distance-based fuzzy rough anomaly score:
where score(u_i) denotes the anomaly score of user u_i on the attribute set C; |C| denotes the number of attributes in C; REM_S(i, j) denotes the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c'_j in the attribute sequence S, and REM_AS(i, j) denotes the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS; W_S(i, j) denotes the weight of user u_i with respect to c'_j, and W_AS(i, j) denotes the weight of user u_i with respect to C_j.
S10 specifically comprises the following steps:
Determine, one user at a time, whether the anomaly score of each user in the information system is greater than the threshold; if so, output the user as abnormal; otherwise, move on to the next user, until all users have been judged.
Example two
Referring to Table 1, c_1 is a nominal attribute, c_2 and c_3 are numerical attributes, and x_1, ..., x_6 denote all users in the user set U. An embodiment provided by this scheme is as follows: a power consumption abnormality detection method based on fuzzy rough entropy, comprising the following steps.
Table 1. Electricity consumption data
S1: Normalize the numerical data. The attribute values of c_2 and c_3 are preprocessed using the min-max normalization formula; the results are shown in the right part of Table 1.
S2: Calculate the distance matrix under each attribute according to Table 1. The distance between two users on a nominal attribute is measured by Hamming distance, and the distance between two users on a numerical attribute is measured by Euclidean distance, giving the distance matrix Dis_c under each attribute c.
S3: Calculate the fuzzy similarity matrix under each attribute from the distance matrix, with the parameter ε set to 0.5.
S4: Calculate the distance-based fuzzy rough entropy of each attribute.
E(c_1) = 1.1258, E(c_2) = 1.7088, E(c_3) = 1.5457
S5: Arrange the attributes in ascending order of fuzzy rough entropy to obtain the attribute sequence and the attribute set sequence.
S = <c_1, c_3, c_2>
AS = <C_1, C_2, C_3> = <{c_1}, {c_1, c_3}, {c_1, c_2, c_3}>
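The ordering in S5 can be checked with a short sketch. It takes the three entropy values reported in S4 as input; the function name `build_sequences` and the string labels for the attributes are illustrative choices, not part of the patent.

```python
def build_sequences(entropy_by_attr):
    """Return (attribute sequence S, attribute set sequence AS).

    S lists attributes in ascending order of fuzzy rough entropy.
    AS[j] is the set of the first j+1 attributes of S, so that
    AS[j+1] = AS[j] united with the next attribute of S.
    """
    seq = sorted(entropy_by_attr, key=entropy_by_attr.get)
    attr_sets = [set(seq[: k + 1]) for k in range(len(seq))]
    return seq, attr_sets


# Entropy values reported in step S4 of this example.
entropy = {"c1": 1.1258, "c2": 1.7088, "c3": 1.5457}
S, AS = build_sequences(entropy)
print(S)   # ['c1', 'c3', 'c2']
```

Because `sorted` is stable, attributes with equal entropy keep their input order; how ties should be broken is not stated in the text, so that behavior is an assumption of this sketch.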
S6: Calculate the distance matrix for each attribute subset of the attribute set sequence.
Then obtain the fuzzy similarity matrix of each attribute subset in the attribute set sequence from the distance matrix.
S7: Calculate the fuzzy rough relative entropy matrix under each element of the two sequences.
S8: Calculate the weight matrices of the two sequences.
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.
The remaining scores are obtained in the same way:
score(x_2) ≈ 0.3722, score(x_3) ≈ 0.3825,
score(x_4) ≈ 0.3753, score(x_5) ≈ 0.3901, score(x_6) ≈ 0.3645.
S10: Let μ = 0.38. Determine, one user at a time, whether the anomaly score of each user in the information system is greater than the threshold; if so, output the user as abnormal; otherwise, move on to the next user, until all users have been judged. Finally, the set of users whose score exceeds the threshold μ is obtained, i.e. the calculated outlier set OS = {u_1, u_3}.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall be covered by the scope of protection of the present invention.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. They are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (1)
1. A power consumption abnormality detection method based on fuzzy rough entropy, characterized by comprising the following steps:
S1: Normalize the numerical data.
To unify the scales of the numerical attribute data in the data set, the original attribute values are normalized by the min-max normalization method, so that the value range of each attribute becomes [0, 1]. The calculation formula is as follows:
f = (c(u) - min(c)) / (max(c) - min(c))
where f denotes the normalized value; c(u) denotes the value of user u on attribute c; min(c) denotes the minimum value of all users on attribute c; max(c) denotes the maximum value of all users on attribute c.
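As an illustration of step S1, the following minimal sketch applies min-max normalization to one numerical attribute. The sample readings are hypothetical, and the handling of a constant attribute (max equal to min) is an assumption, since the text does not address that case.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings for one numerical attribute (not taken from the patent).
readings = [120.0, 80.0, 100.0, 160.0]
normalized = min_max_normalize(readings)
print(normalized)   # [0.5, 0.0, 0.25, 1.0]
```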
S2: Calculate the distance matrix of each single attribute.
Calculate the distance matrix Dis_c of the data set under each attribute c, where Dis_c(u_i, u_j) = Dis_c(u_j, u_i) denotes the distance between users u_i and u_j on attribute c. If c is a nominal attribute, Hamming distance is used to measure the difference between the attribute values of the two users; if c is a numerical attribute, Euclidean distance is used.
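Step S2 can be sketched for a single attribute as follows. On one attribute, Hamming distance reduces to 0 for equal values and 1 otherwise, and Euclidean distance reduces to the absolute difference. The function name and the sample values are illustrative only.

```python
def distance_matrix(values, nominal):
    """Symmetric distance matrix of one attribute over all users.

    nominal=True  -> Hamming distance on one attribute (0 if equal, else 1)
    nominal=False -> Euclidean distance on one attribute (absolute difference)
    """
    n = len(values)
    dis = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if nominal:
                d = 0.0 if values[i] == values[j] else 1.0
            else:
                d = abs(values[i] - values[j])
            dis[i][j] = dis[j][i] = d   # Dis(u_i, u_j) = Dis(u_j, u_i)
    return dis


# Hypothetical attribute columns; numerical values are assumed already normalized.
nominal_dis = distance_matrix(["resident", "industrial", "resident"], nominal=True)
numeric_dis = distance_matrix([0.0, 0.25, 1.0], nominal=False)
```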
S3: Calculate the fuzzy similarity matrix of each single attribute from the distance matrix.
After the distance matrix is calculated, the fuzzy similarity matrix of the corresponding attribute is calculated by the following formula, where B denotes any attribute of the attribute set C. When the distance is smaller than a certain threshold, the two users are similar to some degree under that attribute; otherwise, there is no similarity between them;
where R_B denotes the fuzzy similarity matrix of attribute subset B, and R_B(u_i, u_j) denotes the fuzzy similarity of users u_i and u_j on attribute subset B; the parameter ε is an adjustable threshold with value range [0, 1]; Dis_B(u_i, u_j) denotes the distance between users u_i and u_j on the attribute; |B| denotes the number of attributes in B, which is 1 here;
The fuzzy information granule of user u_i induced by the fuzzy similarity relation R_B is defined as:
For any integer j with 0 < j ≤ m, the j-th element R_B(u_i, u_j) of the granule denotes the fuzzy similarity of u_i and u_j with respect to B. It follows that if R_B(u_i, u_j) = 0, u_j certainly does not belong to the fuzzy information granule of u_i; conversely, if R_B(u_i, u_j) = 1, u_j certainly belongs to it.
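The thresholding behavior described in S3 can be illustrated with the sketch below. The piecewise form used here (similarity 1 - d when d ≤ ε, and 0 otherwise) is an assumption chosen only to match the verbal description; it is not the formula of the patent, and whether the comparison is strict is likewise assumed. Row i of the resulting matrix plays the role of the fuzzy information granule of u_i.

```python
def fuzzy_similarity(dis, eps):
    """Assumed thresholded similarity: 1 - d if d <= eps, otherwise 0.

    This piecewise form is a stand-in for illustration; the patent's
    own formula may differ.
    """
    return [[1.0 - d if d <= eps else 0.0 for d in row] for row in dis]


# Hypothetical single-attribute distance matrix on normalized values.
dis = [[0.0, 0.25, 1.0], [0.25, 0.0, 0.75], [1.0, 0.75, 0.0]]
R = fuzzy_similarity(dis, eps=0.5)
granule_u1 = R[0]   # similarities of every user to the first user
```

With ε = 0.5 the first two users remain similar (0.75), while the third user has zero similarity to both and is therefore excluded from their granules.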
S4: Calculate the distance-based fuzzy rough entropy.
Calculate the fuzzy rough entropy under each attribute from the calculated fuzzy similarity; the distance-based fuzzy rough entropy is defined as follows:
where E(B) = E(R_B) denotes the fuzzy rough entropy of the fuzzy similarity relation R_B; the granule term denotes the fuzzy information granule of user u_i under the fuzzy similarity relation R_B; and |U| denotes the number of users in the user set U.
S5: Obtain the attribute sequence and the attribute set sequence from the fuzzy rough entropy.
The attributes are arranged in ascending order of their fuzzy rough entropy to obtain the attribute sequence as follows:
S = <c'_1, c'_2, …, c'_m>
where S denotes the attribute sequence and m denotes the number of attributes in the attribute set C; in S, the fuzzy rough entropy of any attribute is less than or equal to that of the next attribute;
To construct the attribute set sequence, the attribute c'_1 is taken as the first element; each subsequent element is obtained by adding the next attribute of S, in order, to the previous element, until all attributes have been added and the attribute set C is the last element. The resulting attribute set sequence is as follows:
AS = <C_1, C_2, …, C_m>
where AS denotes the attribute set sequence and m denotes the number of attributes in the attribute set C; every attribute subset in AS is contained in C; the first attribute subset C_1 consists of the first attribute c'_1 of the attribute sequence, and the last attribute subset C_m is equal to C itself; for any integer j with 0 < j < m, the (j+1)-th attribute subset is the j-th attribute subset with the (j+1)-th attribute of the attribute sequence added, i.e. C_{j+1} = C_j ∪ {c'_{j+1}}.
S6: Calculate the distance matrix and the similarity matrix of each attribute subset in the attribute set sequence.
After the attribute set sequence is obtained, the distance matrix and the similarity matrix under each attribute subset of the sequence are calculated in turn, following the calculation of steps S2 to S4. Because an attribute subset contains one or more attributes, the distance matrix is calculated by the following formula;
where Dis_B denotes the distance matrix of attribute subset B, and Dis_B(u_i, u_j) denotes the difference between users u_i and u_j in their values on attribute subset B; |B| denotes the number of attributes in B, and a_k(u_i) denotes the value of user u_i on the k-th attribute in B. The difference between nominal attribute values is measured by Hamming distance (the corresponding operator in the formula denotes the Hamming distance calculation); the difference between numerical attribute values is measured by Euclidean distance. To calculate user differences over a mixed attribute set B, i.e. one that contains both nominal and numerical attributes, B must be divided into a nominal attribute subset B_1 and a numerical attribute subset B_2, which satisfy B = B_1 ∪ B_2 and B_1 ∩ B_2 = ∅. The difference between users u_i and u_j on the attribute set B is then calculated as follows:
S7: Calculate the fuzzy rough relative entropy matrix under each element of the two sequences.
The fuzzy rough relative entropy is defined as follows:
where RE_B(u) denotes the fuzzy rough relative entropy of user u with respect to attribute subset B; E_u(R_B), taken over all u_i ≠ u, denotes the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B after u is removed; |U| denotes the number of users in the user set U, and SIM_B(u_i) denotes the fuzzy information granule of user u_i with respect to attribute subset B; E(R_B) denotes the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B. RE_B(u) can be used to analyze the degree of abnormality of u: when u is removed from the user set U, if E_u(R_B) changes greatly compared with E(R_B), the degree of abnormality of u is high and u is more likely to be an abnormal user; conversely, if E_u(R_B) changes very little, the degree of abnormality of u is low and u is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy RE_B(u), the higher the degree of abnormality of u;
Based on this idea, for each attribute or attribute subset in the two sequences, the fuzzy rough relative entropy obtained by removing one user from U at a time is calculated in turn, forming two fuzzy rough relative entropy matrices as follows:
REM_S(i, j) = RE_{c'_j}(u_i);
REM_AS(i, j) = RE_{C_j}(u_i);
where REM_S and REM_AS denote the fuzzy rough relative entropy matrix of the attribute sequence and of the attribute set sequence, respectively; REM_S(i, j) denotes the fuzzy rough relative entropy RE_{c'_j}(u_i) of user u_i with respect to the j-th attribute c'_j in the attribute sequence S; REM_AS(i, j) denotes the fuzzy rough relative entropy RE_{C_j}(u_i) of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS.
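The leave-one-out idea behind S7 (compare the entropy of the full user set with the entropy after removing one user) can be sketched as follows. The entropy function is passed in as a parameter; `stand_in_entropy` below is a placeholder chosen only so the sketch runs and is not the patent's fuzzy rough entropy. How E(R_B) and E_u(R_B) are combined into RE_B(u) is not reproduced here either.

```python
import math


def leave_one_out(R, entropy):
    """Return (E, [E_u for each user u]) for a fuzzy similarity matrix R.

    E_u is the entropy of the matrix with row and column u removed.
    """
    n = len(R)
    full = entropy(R)
    reduced = []
    for u in range(n):
        keep = [i for i in range(n) if i != u]
        reduced.append(entropy([[R[i][j] for j in keep] for i in keep]))
    return full, reduced


def stand_in_entropy(R):
    # Placeholder, NOT the patent's formula: mean of -log2(granule size / n).
    n = len(R)
    return -sum(math.log2(sum(row) / n) for row in R) / n


# Hypothetical similarity matrix: the third user is similar to nobody else.
R = [[1.0, 0.75, 0.0], [0.75, 1.0, 0.0], [0.0, 0.0, 1.0]]
full, reduced = leave_one_out(R, stand_in_entropy)
```

Under this placeholder, removing the isolated third user changes the entropy far more than removing either of the other two, which matches the qualitative behavior described above.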
S8: Calculate the weight matrices of the two sequences.
In addition to uncertainty measures such as fuzzy rough entropy, a weight function can effectively enlarge the difference in anomaly scores between abnormal users and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weights under each element of the two sequences are calculated, forming a weight matrix for each sequence as follows:
where W_S and W_AS denote the weight matrix of the attribute sequence and of the attribute set sequence, respectively; in both matrices, |U| denotes the number of users in the user set U. W_S(i, j) denotes the weight of user u_i with respect to the j-th attribute c'_j in the attribute sequence S, and the granule term in its formula denotes the fuzzy information granule of user u_i with respect to c'_j; W_AS(i, j) denotes the weight of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS, and the granule term in its formula denotes the fuzzy information granule of user u_i with respect to C_j.
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.
where score(u_i) denotes the anomaly score of user u_i on the attribute set C; |C| denotes the number of attributes in C; REM_S(i, j) denotes the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c'_j in the attribute sequence S, and REM_AS(i, j) denotes the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS; W_S(i, j) denotes the weight of user u_i with respect to c'_j, and W_AS(i, j) denotes the weight of user u_i with respect to C_j;
According to the above idea, the distance-based fuzzy rough anomaly score can be used as an index for determining whether a user's electricity consumption is abnormal: the higher the anomaly score of a user u, the more likely u is abnormal.
S10: Determine, one user at a time, whether the anomaly score is greater than the threshold.
Given a threshold μ, for any user u ∈ U, if score(u) > μ, the user is determined to be an abnormal user in the user set U.
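The decision rule of S10 amounts to a strict comparison against μ. A minimal sketch, using hypothetical user labels and scores that are not taken from the patent:

```python
def abnormal_users(scores, mu):
    """Return the users whose anomaly score is strictly greater than mu."""
    return [user for user, s in scores.items() if s > mu]


# Hypothetical anomaly scores for four users.
scores = {"u1": 0.41, "u2": 0.36, "u3": 0.39, "u4": 0.38}
flagged = abnormal_users(scores, mu=0.38)
print(flagged)   # ['u1', 'u3']
```

A user whose score equals μ exactly (u4 here) is not flagged, since the rule requires score(u) > μ.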
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310414937.2A CN116433049A (en) | 2023-04-18 | 2023-04-18 | Power consumption abnormality detection method based on fuzzy rough entropy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310414937.2A CN116433049A (en) | 2023-04-18 | 2023-04-18 | Power consumption abnormality detection method based on fuzzy rough entropy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116433049A (en) | 2023-07-14 |
Family
ID=87088740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310414937.2A (Pending) CN116433049A (en) | Power consumption abnormality detection method based on fuzzy rough entropy | 2023-04-18 | 2023-04-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116433049A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117076991A (en) * | 2023-10-16 | 2023-11-17 | 云境商务智能研究院南京有限公司 | Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment |
CN117076991B (en) * | 2023-10-16 | 2024-01-02 | 云境商务智能研究院南京有限公司 | Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |