CN116433049A - Power consumption abnormality detection method based on fuzzy rough entropy - Google Patents

Publication number
CN116433049A
Authority
CN
China
Prior art keywords
attribute
fuzzy
user
sequence
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310414937.2A
Other languages
Chinese (zh)
Inventor
王思涵
袁钟
刘昶
羊思宇
吴衍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Application filed by Sichuan University
Priority to CN202310414937.2A
Publication of CN116433049A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a power consumption abnormality detection method based on fuzzy rough entropy, relating to the technical field of power consumption abnormality detection. The method comprises the following steps: standardizing the numerical data; calculating a distance matrix for each attribute; calculating a fuzzy similarity matrix; calculating a distance-based fuzzy rough entropy; sorting the attributes by fuzzy rough entropy to obtain an attribute sequence and an attribute set sequence; calculating the distance matrix and fuzzy similarity matrix of the attribute set sequence; calculating the fuzzy rough relative entropy matrices of the two sequences; calculating the weight matrices of the two sequences; calculating an anomaly score for each user; and checking, user by user, whether the score exceeds a threshold, outputting the user as abnormal if so, until all users have been checked. The fuzzy rough entropy effectively extracts the fuzzy and uncertain information in abnormal power consumption data, which improves the detection performance of the model and allows mixed heterogeneous data sets to be processed effectively.

Description

Power consumption abnormality detection method based on fuzzy rough entropy
Technical Field
The invention relates to the technical field of electricity utilization abnormality detection, in particular to an electricity utilization abnormality detection method based on fuzzy rough entropy.
Background
People's basic necessities of life (clothing, food, housing and travel) cannot do without electricity, and electricity safety has a critical influence on both daily life and the national power industry. As power demand keeps rising, equipment faults and electricity theft occur continually and pose serious hidden dangers to the safe operation of the power grid. With the nationwide rollout of smart meters in China, the volume of data in electricity consumption information acquisition systems keeps growing. However, for lack of means to analyze the data intelligently, useful information cannot be extracted quickly from this mass of data. Power consumption data anomaly detection can therefore both better safeguard the operation of the power grid and help power companies avoid larger hidden dangers and recover substantial economic losses. Traditional power consumption anomaly detection methods are mainly data-driven and fall into classification, regression and clustering algorithms. A classification algorithm divides the user set into normal and abnormal classes according to the users' feature quantities, and its model is usually built from a labeled data set; such data sets require manual label verification and therefore consume a great deal of manpower. A regression algorithm predicts a user's short-term electricity consumption and compares the predicted value with the actual consumption to judge whether an abnormality has occurred. In practice, the prediction accuracy of regression models is often unsatisfactory.
In particular, because consumption patterns vary from user to user, a separate regression model must be built for each user during anomaly detection, and building and storing these models costs considerable time and storage space. A clustering method divides objects into groups so that objects in the same group are similar on certain attributes and differ on those attributes from objects in other groups; in anomaly detection, the few users whose behavior does not match the electricity consumption behavior of the majority are judged to be abnormal. However, the performance of a clustering model depends heavily on parameter selection, and because consumption patterns differ between seasons, the parameters must be reselected when the season changes. Clustering models are therefore computationally complex, are unsuitable for detecting large volumes of high-dimensional electricity consumption data, and have clear limitations.
Such methods are mainly suited to nominal data; when they are applied to a numerical-attribute data set, the data must first be discretized, which markedly increases the time spent on preprocessing and easily loses information, degrading the detection performance of the model. In short, traditional power consumption anomaly detection methods are mainly data-driven, lose a large amount of useful information when handling mixed heterogeneous data, and deliver poor detection performance and unsatisfactory results.
Disclosure of Invention
The invention aims to overcome the shortcomings of traditional power consumption abnormality detection methods, which are mainly data-driven, lose a large amount of useful information when detecting mixed heterogeneous data, and suffer from poor detection performance and unsatisfactory results.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A power consumption abnormality detection method based on fuzzy rough entropy comprises the following steps:
S1: Standardize the numerical data.
The dimensions of the numerical attribute data in the data set are unified: the original attribute values are normalized by min-max normalization so that the value range of each numerical attribute becomes $[0,1]$. For any user $u \in U$ and any numerical attribute $c \in C$, the calculation formula is as follows:
$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$
where $f(c(u))$ is the normalized value; $c(u)$ is the value of user $u$ on attribute $c$; $\min(c)$ is the minimum value of all users on attribute $c$; and $\max(c)$ is the maximum value of all users on attribute $c$.
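The min-max normalization of step S1 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `min_max_normalize`, the sample consumption values, and the handling of a constant attribute (mapped to 0) are assumptions made here.

```python
import numpy as np

def min_max_normalize(values):
    """Scale the values of one numerical attribute into [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # ASSUMPTION: a constant attribute is mapped to 0 to avoid dividing by zero.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Hypothetical monthly consumption (kWh) of four users.
consumption = [120.0, 300.0, 210.0, 480.0]
normalized = min_max_normalize(consumption)
print(normalized)  # values 0.0, 0.5, 0.25, 1.0
```

Nominal attributes are not passed through this function; they are left unchanged.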
S2: Calculate a distance matrix for each attribute.
Calculate the distance matrix $Dis_c$ of the data set under each attribute $c$, where $Dis_c(u_i,u_j)=Dis_c(u_j,u_i)$ represents the distance between users $u_i$ and $u_j$ on attribute $c$. If $c$ is a nominal attribute, the Hamming distance is used to measure the difference between the two users' attribute values; if $c$ is a numerical attribute, the Euclidean distance is used.
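For a single attribute, the two distance measures of step S2 reduce to simple forms: the Hamming distance is 0 when two values match and 1 otherwise, and the Euclidean distance is the absolute difference. The sketch below illustrates this with NumPy; the function name `attribute_distance_matrix` and the sample data are hypothetical.

```python
import numpy as np

def attribute_distance_matrix(column, nominal):
    """Pairwise distance matrix Dis_c for one attribute c."""
    if nominal:
        col = np.asarray(column, dtype=object)
        # Hamming distance on one attribute: 0 if the values match, 1 otherwise.
        return (col[:, None] != col[None, :]).astype(float)
    col = np.asarray(column, dtype=float)
    # Euclidean distance on one attribute reduces to |c(u_i) - c(u_j)|.
    return np.abs(col[:, None] - col[None, :])

# Hypothetical data for three users.
dis_type = attribute_distance_matrix(
    ["residential", "commercial", "residential"], nominal=True)
dis_load = attribute_distance_matrix([0.0, 0.5, 1.0], nominal=False)
print(dis_type)
print(dis_load)
```

Both matrices are symmetric with a zero diagonal, matching $Dis_c(u_i,u_j)=Dis_c(u_j,u_i)$.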
S3: Calculate a fuzzy similarity matrix from the distance matrix.
After the distance matrix is calculated, the hybrid fuzzy similarity matrix of the corresponding attribute is calculated by the following formula, where $B$ denotes any single attribute of the attribute set $C$. When the distance is smaller than a certain threshold, the two users have some similarity under the attribute; otherwise they have no similarity.
Figure BDA0004184597980000033
where $R_B$ is the fuzzy similarity matrix of attribute subset $B$, and $R_B(u_i,u_j)$ is the fuzzy similarity of users $u_i$ and $u_j$ under $B$; the parameter $\varepsilon$ is an adjustable threshold with value range $[0,1]$; $Dis_B(u_i,u_j)$ is the distance between users $u_i$ and $u_j$ on $B$; and $|B|$ is the number of attributes in $B$, here $|B|=1$.
The fuzzy information granule of user $u_i$ induced by the fuzzy similarity relation $R_B$ is defined as:
$$SIM_{R_B}(u_i)=SIM_B(u_i)=\{R_B(u_i,u_1),R_B(u_i,u_2),\ldots,R_B(u_i,u_n)\}$$
For any integer $j$ with $0 < j \le n$, the $j$-th element $R_B(u_i,u_j)$ of the granule is the fuzzy similarity of $u_i$ and $u_j$ with respect to $B$. If $R_B(u_i,u_j)=0$, then $u_j$ certainly does not belong to the fuzzy information granule $SIM_{R_B}(u_i)$ of $u_i$; conversely, if $R_B(u_i,u_j)=1$, then $u_j$ certainly belongs to $SIM_{R_B}(u_i)$.
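The relationship between the similarity matrix and the fuzzy information granule can be illustrated as follows. The formula image for $R_B$ is not reproduced in this text, so `fuzzy_similarity` uses one plausible thresholded form (1 minus the distance when the distance is within $\varepsilon$, otherwise 0) purely as an assumption; only the thresholding behavior is stated in the text. The function names and the sample distances are hypothetical.

```python
import numpy as np

def fuzzy_similarity(dist, eps):
    """Thresholded fuzzy similarity from a distance matrix scaled to [0, 1].

    ASSUMPTION: similarity = 1 - distance when distance <= eps, else 0.
    The source only states that pairs beyond the threshold have no similarity.
    """
    dist = np.asarray(dist, dtype=float)
    return np.where(dist <= eps, 1.0 - dist, 0.0)

def granule(similarity, i):
    """Fuzzy information granule SIM_B(u_i): row i of the similarity matrix."""
    return similarity[i]

# Hypothetical distances between three users on one attribute.
dist = np.array([[0.0, 0.1, 0.9],
                 [0.1, 0.0, 0.8],
                 [0.9, 0.8, 0.0]])
R = fuzzy_similarity(dist, eps=0.3)
print(granule(R, 0))  # memberships of u_1, u_2, u_3 in the granule of u_1
```

Here the third user has membership 0 in the granule of the first user, so it certainly does not belong to that granule, while each user belongs to its own granule with membership 1.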
S4: Calculate the distance-based fuzzy rough entropy.
The fuzzy rough entropy under each attribute is calculated from the fuzzy similarity obtained above. The distance-based fuzzy rough entropy is defined as follows:
Figure BDA0004184597980000041
where $E(B)=E(R_B)$ is the fuzzy rough entropy of the fuzzy similarity relation $R_B$; $SIM_{R_B}(u_i)$ is the fuzzy information granule of user $u_i$ under the fuzzy similarity relation $R_B$,
Figure BDA0004184597980000042
and $|U|$ is the number of users in the user set $U$.
S5: Obtain an attribute sequence and an attribute set sequence from the fuzzy rough entropy.
The attributes are arranged in ascending order of their fuzzy rough entropy, giving the attribute sequence:
$$S=\langle c'_1,c'_2,\ldots,c'_m\rangle$$
where $S$ is the attribute sequence and $m$ is the number of attributes in the attribute set $C$; the fuzzy rough entropy of any attribute in $S$ is less than or equal to that of the next attribute.
To construct the attribute set sequence, the attribute $c'_1$ is taken as the first element; each new element is then obtained by adding the next attribute of $S$ to the previous element, until all attributes have been added and the attribute set $C$ itself is the last element. The resulting attribute set sequence is:
$$AS=\langle C_1,C_2,\ldots,C_m\rangle$$
where $AS$ is the attribute set sequence and $m$ is the number of attributes in the attribute set $C$. Every attribute subset in $AS$ is contained in $C$; the first attribute subset $C_1$ equals $\{c'_1\}$, the first attribute of the attribute sequence, and the last attribute subset $C_m$ equals $C$ itself. For any integer $j$ with $0 < j < m$, the $(j+1)$-th attribute subset is the $j$-th attribute subset with the $(j+1)$-th attribute of the attribute sequence added, i.e., $C_{j+1}=C_j\cup\{c'_{j+1}\}$.
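The construction of the two sequences in step S5 can be sketched as follows, assuming the per-attribute fuzzy rough entropies have already been computed. The function name `build_sequences`, the attribute names, and the entropy values are hypothetical.

```python
def build_sequences(entropy_by_attribute):
    """Return the attribute sequence S and the attribute set sequence AS."""
    # S: attributes in ascending order of fuzzy rough entropy.
    # sorted() is stable, so attributes with equal entropy keep their input order.
    S = sorted(entropy_by_attribute, key=entropy_by_attribute.get)
    # AS: C_1 = {c'_1} and C_{j+1} = C_j plus c'_{j+1}; stored as ordered tuples.
    AS = [tuple(S[:j]) for j in range(1, len(S) + 1)]
    return S, AS

# Hypothetical entropy values for three attributes.
entropies = {"voltage": 0.42, "user_type": 0.15, "consumption": 0.77}
S, AS = build_sequences(entropies)
print(S)   # ['user_type', 'voltage', 'consumption']
print(AS)  # prefixes of S; the last one contains every attribute
```

Each element of `AS` is a prefix of `S`, so the last element always equals the full attribute set.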
S6: Calculate a distance matrix and a similarity matrix for each attribute subset of the attribute set sequence.
After the attribute set sequence is obtained, the distance matrix and similarity matrix under each attribute subset of the sequence are calculated in turn, following the calculations of steps S2 to S4. Because an attribute subset contains one or more attributes, the distance matrix is calculated by the following formula:
Figure BDA0004184597980000051
where $Dis_B$ is the distance matrix of attribute subset $B$; $Dis_B(u_i,u_j)$ is the difference between users $u_i$ and $u_j$ in their values on attribute subset $B$; $|B|$ is the number of attributes in $B$; and $a_k(u_i)$ is the value of user $u_i$ on the $k$-th attribute in $B$. The difference between nominal attribute values is measured by the Hamming distance, and the symbol
Figure BDA0004184597980000061
denotes calculation by Hamming distance; the difference between numerical attribute values is measured by the Euclidean distance. To calculate user differences over a mixed attribute set $B$, that is, one containing both nominal and numerical attributes, $B$ must be divided into a nominal attribute subset $B_1$ and a numerical attribute subset $B_2$, which satisfy $B=B_1\cup B_2$ and $B_1\cap B_2=\emptyset$. The difference between users $u_i$ and $u_j$ on the attribute set $B$ is then $Dis_B(u_i,u_j)=Dis_{B_1}(u_i,u_j)+Dis_{B_2}(u_i,u_j)$.
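The split of a mixed attribute subset into a nominal part and a numerical part can be sketched as below. The sketch takes the Hamming distance as the count of differing nominal attributes and the Euclidean distance as the root of the summed squared differences; any additional scaling in the source's formula image (not reproduced here) is not modeled, which is an assumption of this example. The function name and data are hypothetical.

```python
import numpy as np

def mixed_distance_matrix(nominal_cols, numeric_cols):
    """Dis_B = Dis_B1 + Dis_B2 for a mixed attribute subset B.

    nominal_cols: shape (n_users, |B1|), category labels.
    numeric_cols: shape (n_users, |B2|), values already scaled to [0, 1].
    """
    nominal = np.asarray(nominal_cols, dtype=object)
    numeric = np.asarray(numeric_cols, dtype=float)
    # Hamming distance: number of nominal attributes on which two users differ.
    hamming = (nominal[:, None, :] != nominal[None, :, :]).sum(axis=2).astype(float)
    # Euclidean distance over the numerical attributes.
    diff = numeric[:, None, :] - numeric[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=2))
    return hamming + euclid

# Hypothetical data: two nominal and two numerical attributes for three users.
nominal = [["residential", "A"], ["residential", "B"], ["commercial", "B"]]
numeric = [[0.0, 0.0], [0.3, 0.4], [1.0, 1.0]]
D = mixed_distance_matrix(nominal, numeric)
print(D[0, 1])  # 1 differing nominal attribute + Euclidean distance 0.5
```

The first two users differ on one nominal attribute and are 0.5 apart on the numerical attributes, so their combined distance is 1.5.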
S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences.
The fuzzy rough relative entropy is defined as follows:
Figure BDA0004184597980000063
where $RE_B(u)$ is the fuzzy rough relative entropy of user $u$ with respect to attribute subset $B$;
Figure BDA0004184597980000064
with $u_i \ne u$, is the fuzzy rough entropy of the user set $U$ under the fuzzy similarity relation $R_B$ after $u$ is removed, where $|U|$ is the number of users in $U$ and $SIM_B(u_i)$ is the fuzzy information granule of user $u_i$ with respect to attribute subset $B$; $E(R_B)$ is the fuzzy rough entropy of the full user set $U$ under $R_B$. $RE_B(u)$ can be used to analyze the degree of abnormality of $u$: when $u$ is removed from $U$, if $E_u(R_B)$ changes greatly compared with $E(R_B)$, the degree of abnormality of $u$ is high and $u$ is more likely to be an abnormal user; conversely, if $E_u(R_B)$ changes very little, the degree of abnormality of $u$ is low and $u$ is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy $RE_B(u)$, the higher the degree of abnormality of $u$.
Based on this idea, the fuzzy rough relative entropy obtained by removing one user of $U$ at a time is calculated in turn under each attribute or attribute subset of the two sequences, forming two fuzzy rough relative entropy matrices:
$$REM_S(i,j)=RE_{c'_j}(u_i)$$
$$REM_{AS}(i,j)=RE_{C_j}(u_i)$$
where $REM_S$ and $REM_{AS}$ are the fuzzy rough relative entropy matrix of the attribute sequence and that of the attribute set sequence, respectively; $REM_S(i,j)$ is the fuzzy rough relative entropy $RE_{c'_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute $c'_j$ of the attribute sequence $S$, and $REM_{AS}(i,j)$ is the fuzzy rough relative entropy $RE_{C_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute subset $C_j$ of the attribute set sequence $AS$.
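The "remove one user at a time" idea behind step S7 can be sketched independently of the exact entropy formula. In the sketch below, `entropy_fn` is a hypothetical callable that maps a fuzzy similarity matrix to an entropy value, and `stand_in_entropy` is a generic log-cardinality entropy used only so that the example runs; it is an assumption and is not claimed to be the patent's fuzzy rough entropy, whose formula image is not reproduced in this text. How $RE_B(u)$ is then derived from $E(R_B)$ and $E_u(R_B)$ is likewise left out.

```python
import numpy as np

def leave_one_out_entropies(similarity, entropy_fn):
    """Return E(R_B) and the array of E_u(R_B), one entry per removed user u."""
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    full = entropy_fn(similarity)
    reduced = []
    for u in range(n):
        keep = np.arange(n) != u
        # Drop row and column u: the relation restricted to U without u.
        reduced.append(entropy_fn(similarity[np.ix_(keep, keep)]))
    return full, np.array(reduced)

def stand_in_entropy(R):
    # ASSUMPTION: placeholder entropy, -mean(log2(|granule| / n)); not the source's formula.
    n = R.shape[0]
    return float(-np.mean(np.log2(R.sum(axis=1) / n)))

# Hypothetical similarity matrix: u_3 is dissimilar to the other two users.
R = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
full, reduced = leave_one_out_entropies(R, stand_in_entropy)
print(full, reduced)
```

With this stand-in, removing the isolated third user changes the entropy far more than removing either of the two similar users, which is the kind of contrast the relative entropy is meant to capture.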
S8: Calculate the weight matrices of the two sequences.
In addition to uncertainty measures such as fuzzy rough entropy, a suitably defined weight function also helps to widen the gap between the anomaly scores of abnormal and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weight of each element of the two sequences is calculated, forming the weight matrix of each sequence:
Figure BDA0004184597980000075
Figure BDA0004184597980000076
where $W_S$ and $W_{AS}$ are the weight matrix of the attribute sequence and that of the attribute set sequence, respectively, and $|U|$ is the number of users in the user set $U$. $W_S(i,j)$ is the weight of user $u_i$ with respect to the $j$-th attribute $c'_j$ of the attribute sequence $S$, where $SIM_{c'_j}(u_i)$ is the fuzzy information granule of user $u_i$ with respect to $c'_j$; $W_{AS}(i,j)$ is the weight of user $u_i$ with respect to the $j$-th attribute subset $C_j$ of the attribute set sequence $AS$, where $SIM_{C_j}(u_i)$ is the fuzzy information granule of user $u_i$ with respect to $C_j$.
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.
Figure BDA0004184597980000083
where $score(u_i)$ is the anomaly score of user $u_i$ on the attribute set $C$; $|C|$ is the number of attributes in $C$; $REM_S(i,j)$ is the fuzzy rough relative entropy of user $u_i$ with respect to the $j$-th attribute $c'_j$ of the attribute sequence $S$, and $REM_{AS}(i,j)$ is the fuzzy rough relative entropy of user $u_i$ with respect to the $j$-th attribute subset $C_j$ of the attribute set sequence $AS$; $W_S(i,j)$ is the weight of user $u_i$ with respect to $c'_j$, and $W_{AS}(i,j)$ is the weight of user $u_i$ with respect to $C_j$.
S10: Check, user by user, whether the anomaly score exceeds a threshold.
Given a threshold $\mu$, for any user $u\in U$, if $score(u)>\mu$, then $u$ is judged to be an abnormal user in the user set $U$.
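The thresholding of step S10 is a simple filter over the scores. The sketch below assumes the anomaly scores have already been computed; the function name, the user identifiers, the score values, and the threshold value are hypothetical.

```python
def flag_anomalous_users(scores, mu):
    """Return the users whose anomaly score is strictly greater than mu."""
    return [user for user, score in scores.items() if score > mu]

# Hypothetical anomaly scores for three users.
scores = {"u1": 0.12, "u2": 0.81, "u3": 0.35}
anomalous = flag_anomalous_users(scores, mu=0.5)
print(anomalous)  # ['u2']
```

The comparison is strict, matching the condition $score(u)>\mu$, so a user whose score equals the threshold is not flagged.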
In the invention, the fuzzy rough entropy is constructed from a fuzzy similarity relation based on a distance measure, the degree of abnormality of each user is analyzed with the fuzzy rough relative entropy, and more useful information is obtained by constructing the attribute sequence and the attribute set sequence.
The method solves the problem that existing fuzzy rough entropy, because it uses the intersection operation, cannot effectively distinguish similarity in high dimensions. No model needs to be trained in advance, which reduces operators' workload, reveals abnormalities in electricity consumption data early, and improves detection efficiency.
By using the improved fuzzy rough entropy, the invention effectively extracts the fuzzy and uncertain information in abnormal electricity consumption data and thereby improves the detection performance of the model. It extends the traditional fuzzy rough set method, processes mixed heterogeneous data sets effectively, is not affected by the type of data set, and is applicable to all types of electricity consumption data.
Drawings
Fig. 1 is a schematic flow chart of steps of an electricity utilization abnormality detection method based on fuzzy rough entropy.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.
Example 1
Referring to Fig. 1, this embodiment provides a power consumption abnormality detection method based on fuzzy rough entropy, comprising the following steps:
S1: Acquire electricity consumption data and standardize the numerical attributes in the data to obtain an information system;
S2: Calculate the distance matrix under each attribute of the information system;
S3: Calculate the fuzzy similarity matrix under each attribute from the distance matrix;
S4: Calculate the distance-based fuzzy rough entropy of each attribute;
S5: Sort the attributes by fuzzy rough entropy to obtain an attribute sequence and an attribute set sequence;
S6: Calculate the distance matrix and fuzzy similarity matrix of each attribute subset of the attribute set sequence;
S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences;
S8: Calculate the weight matrices of the two sequences;
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices;
S10: Check, user by user, whether the anomaly score of each user in the information system exceeds a threshold; if so, output the user as abnormal; otherwise, check the next user, until all users have been checked.
In fuzzy rough set practice, the electricity consumption data are imported into an information system (or information table), in which each row represents a user and each column represents an attribute of the users. Attribute values may be numerical (such as electricity consumption or voltage), nominal (such as user type or connection mode), or mixed (both numerical and nominal). An information system is denoted $(U, C)$, where $U$ is the set of users and $C$ is the set of attributes.
In this embodiment, the numerical attributes are first preprocessed by min-max normalization to obtain the electricity consumption data set. Because the improved fuzzy rough entropy effectively reflects how well an attribute discriminates the electricity consumption data, it is used to construct the attribute sequence and the attribute set sequence. An anomaly score is then built from the fuzzy rough relative entropy and the weight matrices of the two sequences to evaluate each user's degree of abnormality. This addresses the inability of existing power consumption abnormality detection methods to handle complex electricity consumption data that are multi-source, heterogeneous, fuzzy and uncertain. The method needs no labeled data and thus achieves unsupervised anomaly detection on mixed electricity consumption data.
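The information system $(U, C)$ described above can be represented by a small data structure. The following is a minimal sketch; the class name `InformationSystem`, its field names, and the sample records are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass

@dataclass
class InformationSystem:
    """An information table (U, C): one row per user, one column per attribute."""
    users: list       # U: user identifiers
    attributes: dict  # C: attribute name -> "numerical" or "nominal"
    rows: list        # rows[i][name] is the value of users[i] on attribute name

    def column(self, name):
        """All users' values on one attribute, in user order."""
        return [row[name] for row in self.rows]

    def numerical_attributes(self):
        """Names of the attributes that step S1 would normalize."""
        return [a for a, kind in self.attributes.items() if kind == "numerical"]

# Hypothetical records for three users with mixed attribute types.
system = InformationSystem(
    users=["u1", "u2", "u3"],
    attributes={"consumption_kwh": "numerical",
                "voltage_v": "numerical",
                "user_type": "nominal"},
    rows=[
        {"consumption_kwh": 120.0, "voltage_v": 220.0, "user_type": "residential"},
        {"consumption_kwh": 300.0, "voltage_v": 221.5, "user_type": "commercial"},
        {"consumption_kwh": 210.0, "voltage_v": 219.0, "user_type": "residential"},
    ],
)
print(system.numerical_attributes())
print(system.column("user_type"))
```

Recording the type of each attribute alongside the data makes it straightforward to normalize only the numerical columns and to choose between Hamming and Euclidean distance per attribute.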
In S1, for any user $u \in U$ and any numerical attribute $c \in C$, the normalization is expressed as:
$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$
where $f(c(u))$ is the normalized value; $c(u)$ is the value of user $u$ on attribute $c$; $\min(c)$ is the minimum value of all users on attribute $c$; and $\max(c)$ is the maximum value of all users on attribute $c$.
In this embodiment, the min-max normalization maps the value range of the numerical data onto the real interval $[0,1]$, while the nominal data are left unchanged.
S2 is specifically as follows:
The distance between two users on a nominal attribute is measured by the Hamming distance, and the distance between two users on a numerical attribute by the Euclidean distance, giving the distance matrix $Dis_c$ under each attribute $c$, where $Dis_c(u_i,u_j)=Dis_c(u_j,u_i)$ is the distance between users $u_i$ and $u_j$ on attribute $c$.
S3 is specifically as follows:
After the distance matrix is calculated, the hybrid fuzzy similarity matrix of the corresponding attribute is calculated by the formula below, where $B$ denotes any single attribute of the attribute set $C$. When the distance is smaller than a certain threshold, the two users have some similarity under the attribute; otherwise they have no similarity. If $R_B(u_i,u_j)=0$, then $u_j$ certainly does not belong to the fuzzy information granule $SIM_{R_B}(u_i)$ of $u_i$; conversely, if $R_B(u_i,u_j)=1$, then $u_j$ certainly belongs to $SIM_{R_B}(u_i)$.
Figure BDA0004184597980000121
where $R_B$ is the fuzzy similarity matrix of attribute subset $B$, and $R_B(u_i,u_j)$ is the fuzzy similarity of users $u_i$ and $u_j$ under $B$; the parameter $\varepsilon$ is an adjustable threshold with value range $[0,1]$; $Dis_B(u_i,u_j)$ is the distance between users $u_i$ and $u_j$ on $B$; and $|B|$ is the number of attributes in $B$, here $|B|=1$.
S4 is specifically as follows:
Calculate the distance-based fuzzy rough entropy of each attribute:
Figure BDA0004184597980000131
where $SIM_{R_c}(u_i)=SIM_c(u_i)=\{R_c(u_i,u_1),R_c(u_i,u_2),\ldots,R_c(u_i,u_n)\}$ is the fuzzy information granule of user $u_i$ with respect to $R_c$. For any integer $j$ with $0 < j \le n$, the $j$-th element $R_c(u_i,u_j)$ of the granule is the fuzzy similarity of $u_i$ and $u_j$ with respect to $c$. Furthermore,
Figure BDA0004184597980000132
and $|U|$ is the number of users in the user set $U$.
S5 is specifically as follows:
The attributes are arranged in ascending order of their fuzzy rough entropy, first giving the attribute sequence:
$$S=\langle c'_1,c'_2,\ldots,c'_m\rangle$$
where $S$ is the attribute sequence and $m$ is the number of attributes in the attribute set $C$; the fuzzy rough entropy of any attribute in $S$ is less than or equal to that of the next attribute.
To construct the attribute set sequence, the attribute $c'_1$ is taken as the first element; each new element is then obtained by adding the next attribute of $S$ to the previous element, until all attributes have been added and the attribute set $C$ itself is the last element. The resulting attribute set sequence is:
$$AS=\langle C_1,C_2,\ldots,C_m\rangle$$
where $AS$ is the attribute set sequence and $m$ is the number of attributes in the attribute set $C$. Every attribute subset in $AS$ is contained in $C$; the first attribute subset $C_1$ equals $\{c'_1\}$, the first attribute of the attribute sequence, and the last attribute subset $C_m$ equals $C$ itself. For any integer $j$ with $0 < j < m$, the $(j+1)$-th attribute subset is the $j$-th attribute subset with the $(j+1)$-th attribute of the attribute sequence added, i.e., $C_{j+1}=C_j\cup\{c'_{j+1}\}$.
S6 is specifically as follows:
After the attribute set sequence is obtained, the distance matrix and similarity matrix under each attribute subset of the sequence are calculated in turn, following the calculations of steps S2 to S4. Because an attribute subset contains one or more attributes, the distance matrix is calculated by the following formula:
Figure BDA0004184597980000141
where $Dis_B$ is the distance matrix of attribute subset $B$; $Dis_B(u_i,u_j)$ is the difference between users $u_i$ and $u_j$ in their values on attribute subset $B$; $|B|$ is the number of attributes in $B$; and $a_k(u_i)$ is the value of user $u_i$ on the $k$-th attribute in $B$. The difference between nominal attribute values is measured by the Hamming distance, and the symbol
Figure BDA0004184597980000151
denotes calculation by Hamming distance; the difference between numerical attribute values is measured by the Euclidean distance. To calculate user differences over a mixed attribute set $B$, that is, one containing both nominal and numerical attributes, $B$ must be divided into a nominal attribute subset $B_1$ and a numerical attribute subset $B_2$, which satisfy $B=B_1\cup B_2$ and $B_1\cap B_2=\emptyset$. The difference between users $u_i$ and $u_j$ on the attribute set $B$ is then calculated as
$$Dis_B(u_i,u_j)=Dis_{B_1}(u_i,u_j)+Dis_{B_2}(u_i,u_j)$$
S7 is specifically as follows:
The fuzzy rough relative entropy is defined as follows:
Figure BDA0004184597980000153
where $RE_B(u)$ is the fuzzy rough relative entropy of user $u$;
Figure BDA0004184597980000154
with $u_i \ne u$, is the fuzzy rough entropy of the user set $U$ under the fuzzy similarity relation $R_B$ after $u$ is removed, where $|U|$ is the number of users in $U$ and $SIM_B(u_i)$ is the fuzzy information granule of user $u_i$ with respect to attribute subset $B$; $E(R_B)$ is the fuzzy rough entropy of the full user set $U$ under $R_B$. $RE_B(u)$ can be used to analyze the degree of abnormality of $u$: when $u$ is removed from $U$, if $E_u(R_B)$ changes greatly compared with $E(R_B)$, the degree of abnormality of $u$ is high and $u$ is more likely to be an abnormal user; conversely, if $E_u(R_B)$ changes very little, the degree of abnormality of $u$ is low and $u$ is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy $RE_B(u)$, the higher the degree of abnormality of $u$.
Based on this idea, the fuzzy rough relative entropy obtained by removing one user of $U$ at a time is calculated in turn under each attribute or attribute subset of the two sequences, forming two fuzzy rough relative entropy matrices:
$$REM_S(i,j)=RE_{c'_j}(u_i)$$
$$REM_{AS}(i,j)=RE_{C_j}(u_i)$$
where $REM_S$ and $REM_{AS}$ are the fuzzy rough relative entropy matrix of the attribute sequence and that of the attribute set sequence, respectively; $REM_S(i,j)$ is the fuzzy rough relative entropy $RE_{c'_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute $c'_j$ of the attribute sequence $S$, and $REM_{AS}(i,j)$ is the fuzzy rough relative entropy $RE_{C_j}(u_i)$ of user $u_i$ with respect to the $j$-th attribute subset $C_j$ of the attribute set sequence $AS$.
S8 specifically comprises the following steps:
In addition to uncertainty measures such as fuzzy rough entropy, a suitably defined weight function helps widen the gap in anomaly scores between abnormal users and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weight of each attribute or attribute subset in the two sequences is calculated, forming the weight matrix of the corresponding sequence as follows:
Figure BDA0004184597980000171
Figure BDA0004184597980000172
where W_S and W_AS denote the weight matrix of the attribute sequence and that of the attribute set sequence, respectively; in W_S and W_AS, |U| is the number of users in the user set U. W_S(i, j) is the weight of user u_i with respect to the j-th attribute c′_j in the attribute sequence S,
Figure BDA0004184597980000174
denotes the fuzzy information granule of user u_i with respect to the j-th attribute c′_j in the attribute sequence S; W_AS(i, j) is the weight of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS,
Figure BDA0004184597980000175
denotes the fuzzy information granule of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS.
S9 is specifically as follows:
The anomaly score of each user is evaluated using the distance-based anomaly score:
Figure BDA0004184597980000173
where score(u_i) is the anomaly score of user u_i on the attribute set C; |C| is the number of attributes in the attribute set C; REM_S(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c′_j in the attribute sequence S, and REM_AS(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS; W_S(i, j) is the weight of user u_i with respect to the j-th attribute c′_j in the attribute sequence S, and W_AS(i, j) is the weight of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS.
S10 specifically comprises the following steps:
Each user in the information system is checked in turn: if the user's anomaly score is greater than the threshold, the user is output as an abnormal user; otherwise the next user is checked, until all users have been processed.
Example two
Referring to Table 1, c_1 is a nominal attribute, c_2 and c_3 are numerical attributes, and x_1, ..., x_6 are all the users in the user set U. This embodiment provides a power consumption abnormality detection method based on fuzzy rough entropy, comprising the following steps.
Table 1. Electricity consumption table
Figure BDA0004184597980000181
S1: Normalize the numerical data. The attribute values of c_2 and c_3 are preprocessed using the min-max normalization formula; the results are shown in the right part of Table 1.
S2: Calculate the distance matrix under each attribute from Table 1. The distance between two users on a nominal attribute is the Hamming distance, and the distance between two users on a numerical attribute is the Euclidean distance, which gives the distance matrix Dis_c under each attribute c:
Figure BDA0004184597980000191
Figure BDA0004184597980000192
Figure BDA0004184597980000193
S3: Calculate the fuzzy similarity matrix under each attribute from the distance matrix, with the parameter ε set to 0.5.
Figure BDA0004184597980000201
Figure BDA0004184597980000202
Figure BDA0004184597980000203
S4: Calculate the distance-based fuzzy rough entropy of each attribute.
E(c_1) = 1.1258, E(c_2) = 1.7088, E(c_3) = 1.5457
S5: Sort the attributes in ascending order of fuzzy rough entropy to obtain the attribute sequence and the attribute set sequence.
S = <c_1, c_3, c_2>
AS = <C_1, C_2, C_3> = <{c_1}, {c_1, c_3}, {c_1, c_2, c_3}>
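The ordering in S5 can be reproduced with a short sketch. The function name `build_sequences` and the string labels for the attributes are illustrative assumptions, not part of the patent; the entropy values are those given in S4.

```python
def build_sequences(entropy):
    """Return (S, AS) from a dict mapping attribute name -> fuzzy rough entropy."""
    # Attribute sequence S: attributes in ascending order of entropy.
    seq = sorted(entropy, key=entropy.get)
    # Attribute set sequence AS: C_1 = {c'_1}, C_{j+1} = C_j ∪ {c'_{j+1}}.
    set_seq = [frozenset(seq[: j + 1]) for j in range(len(seq))]
    return seq, set_seq


entropy = {"c1": 1.1258, "c2": 1.7088, "c3": 1.5457}
S, AS = build_sequences(entropy)
print(S)                         # ['c1', 'c3', 'c2']
print([sorted(C) for C in AS])   # [['c1'], ['c1', 'c3'], ['c1', 'c2', 'c3']]
```

Each subset is built as a prefix of S, so the last element is always the full attribute set C.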
S6: Calculate the distance matrix for each attribute subset in the attribute set sequence.
Figure BDA0004184597980000211
Figure BDA0004184597980000212
Figure BDA0004184597980000213
The fuzzy similarity matrix of each attribute subset in the attribute set sequence is then obtained from the distance matrix.
Figure BDA0004184597980000214
Figure BDA0004184597980000215
Figure BDA0004184597980000221
S7: Calculate the fuzzy rough relative entropy matrix of each of the two sequences.
Figure BDA0004184597980000222
Figure BDA0004184597980000223
S8: Calculate the weight matrices of the two sequences.
Figure BDA0004184597980000224
Figure BDA0004184597980000231
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.
Figure BDA0004184597980000232
Similarly:
score(x_2) ≈ 0.3722, score(x_3) ≈ 0.3825,
score(x_4) ≈ 0.3753, score(x_5) ≈ 0.3901, score(x_6) ≈ 0.3645.
S10: Let μ = 0.38. Each user in the information system is checked in turn: if the user's anomaly score is greater than the threshold, the user is output as an abnormal user; otherwise the next user is checked, until all users have been processed. The set of users whose scores exceed the threshold μ is the resulting outlier set OS = {u_1, u_3}.
The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical scheme and inventive concept, shall fall within the scope of protection of the present invention.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. They are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and thereby to enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (1)

1. A power consumption abnormality detection method based on fuzzy rough entropy, characterized by comprising the following steps:
S1: Normalize the numerical data.
To unify the scales of the numerical attribute data in the data set, the original attribute values are normalized by min-max normalization so that each attribute takes values in [0, 1]. For any
Figure FDA0004184597970000012
The calculation formula is as follows:
Figure FDA0004184597970000011
where f denotes the normalized value; c(u) is the value of user u on attribute c; min(c) is the minimum value over all users on attribute c; max(c) is the maximum value over all users on attribute c.
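As a minimal sketch of S1, the standard min-max form f = (c(u) - min(c)) / (max(c) - min(c)) can be written as below. The function name, the sample readings, and the handling of a constant attribute (mapped to 0.0) are assumptions made for illustration; the patent does not state how a constant attribute is treated.

```python
def min_max_normalize(values):
    """Scale a list of numeric attribute values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no contrast, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical consumption readings for four users on one numerical attribute.
readings = [120.0, 80.0, 200.0, 140.0]
normalized = min_max_normalize(readings)
print(normalized)
```

The smallest reading maps to 0 and the largest to 1, so attributes measured in different units become comparable before distances are computed.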
S2: Calculate the distance matrix of each single attribute.
The distance matrix Dis_c of the data set is calculated under each attribute c, where Dis_c(u_i, u_j) = Dis_c(u_j, u_i) is the distance between users u_i and u_j on attribute c. If c is a nominal attribute, the Hamming distance is used to measure the difference between the two users' attribute values; if c is a numerical attribute, the Euclidean distance is used.
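A sketch of the single-attribute distance matrix follows. For one attribute, the Hamming distance reduces to 0 for equal values and 1 otherwise, and the Euclidean distance reduces to the absolute difference. The function name, the `nominal` flag, and the sample values are illustrative assumptions.

```python
def distance_matrix(values, nominal):
    """Symmetric distance matrix of one attribute over all users."""
    n = len(values)
    dis = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if nominal:
                # Hamming distance on a single nominal value: 0 if equal, else 1.
                d = 0.0 if values[i] == values[j] else 1.0
            else:
                # Euclidean distance in one dimension is the absolute difference.
                d = abs(values[i] - values[j])
            dis[i][j] = dis[j][i] = d
    return dis


nominal_dis = distance_matrix(["A", "B", "A"], nominal=True)
numeric_dis = distance_matrix([0.0, 0.25, 1.0], nominal=False)
print(nominal_dis)
print(numeric_dis)
```

Only the upper triangle is computed and mirrored, which reflects the symmetry Dis_c(u_i, u_j) = Dis_c(u_j, u_i).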
S3: Calculate the fuzzy similarity matrix of each single attribute from the distance matrix.
After the distance matrix is calculated, the fuzzy similarity matrix of the corresponding attribute is calculated by the following formula, where B denotes any single attribute of the attribute set C. When the distance is smaller than a given threshold, the two users have some degree of similarity under the attribute; otherwise they have no similarity;
Figure FDA0004184597970000021
where R_B is the fuzzy similarity matrix of attribute subset B, and R_B(u_i, u_j) is the fuzzy similarity between users u_i and u_j on attribute subset B; the parameter ε is an adjustable threshold taking values between 0 and 1; Dis_B(u_i, u_j) is the distance between users u_i and u_j on B; |B| is the number of attributes in B, here |B| = 1;
The fuzzy information granule of user u_i induced by the fuzzy similarity relation R_B is defined as:
Figure FDA0004184597970000025
For any integer j with 0 < j ≤ m, the j-th element R_B(u_i, u_j) of the granule is the fuzzy similarity between u_i and u_j with respect to B. It follows that if R_B(u_i, u_j) = 0, then u_j certainly does not belong to the fuzzy information granule of u_i,
Figure FDA0004184597970000023
Conversely, if R_B(u_i, u_j) = 1, then u_j certainly belongs to
Figure FDA0004184597970000024
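The thresholding behaviour described in S3 can be illustrated with the sketch below. The exact similarity formula is given in the figure and is not reproduced here; this example assumes one common form, R = 1 - Dis when Dis ≤ ε and 0 otherwise, for a single attribute (|B| = 1). The function names, the distance values, and that formula are assumptions, not the patent's definition.

```python
def fuzzy_similarity(dis, eps):
    """Assumed form: similarity 1 - d when d <= eps, otherwise 0."""
    return [[1.0 - d if d <= eps else 0.0 for d in row] for row in dis]


def information_granule(sim, i):
    """Fuzzy information granule of user i: the i-th row of the similarity matrix."""
    return sim[i]


# Hypothetical distance matrix for three users on one normalized attribute.
dis = [
    [0.0, 0.2, 0.8],
    [0.2, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
sim = fuzzy_similarity(dis, eps=0.5)
print(information_granule(sim, 0))
```

Under this assumed form, the third user is cut off from the first user's granule because their distance exceeds ε, while the second user keeps a partial membership.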
S4: Calculate the distance-based fuzzy rough entropy.
The fuzzy rough entropy under each attribute is calculated from the fuzzy similarity obtained above; the distance-based fuzzy rough entropy is defined as follows:
Figure FDA0004184597970000022
where E(B) = E(R_B) is the fuzzy rough entropy of the fuzzy similarity relation R_B;
Figure FDA0004184597970000032
denotes the fuzzy information granule of user u_i under the fuzzy similarity relation R_B,
Figure FDA0004184597970000031
|U| denotes the number of users in the user set U.
S5: Obtain the attribute sequence and the attribute set sequence from the fuzzy rough entropy.
The attributes are arranged in ascending order of their fuzzy rough entropy to obtain the attribute sequence:
S = <c′_1, c′_2, …, c′_m>
where S is the attribute sequence and m is the number of attributes in the attribute set C; in S, the fuzzy rough entropy of each attribute is less than or equal to that of the next attribute;
To construct the attribute set sequence, {c′_1} is taken as the first element of the sequence; each new element is then obtained by adding the next attribute from the attribute set C in forward order, until all attributes have been added and the attribute set C is the last element of the sequence. The resulting attribute set sequence is:
AS = <C_1, C_2, …, C_m>
where AS is the attribute set sequence and m is the number of attributes in the attribute set C; every attribute subset in AS is contained in the attribute set C, the first attribute subset C_1 consists of the first attribute c′_1 of the attribute sequence, and the last attribute subset C_m is equal to the attribute set C itself; for any integer j with 0 < j < m, the (j+1)-th attribute subset is the j-th attribute subset together with the (j+1)-th attribute of the attribute sequence, i.e., C_{j+1} = C_j ∪ {c′_{j+1}}.
S6: Calculate the distance matrix and the similarity matrix of each attribute subset in the attribute set sequence.
After the attribute set sequence is obtained, the distance matrix and the similarity matrix under each attribute subset of the attribute set sequence are calculated in turn, following the calculation of steps S2 to S4; because the number of attributes in an attribute subset is greater than or equal to 1, the distance matrix is calculated by the following formula;
Figure FDA0004184597970000041
where Dis_B is the distance matrix of attribute subset B, and Dis_B(u_i, u_j) is the difference between users u_i and u_j in their values on attribute subset B; |B| is the number of attributes in the attribute subset B, and a_k(u_i) is the value of user u_i on the k-th attribute in B. The difference between nominal attribute values is measured by the Hamming distance; the symbol
Figure FDA0004184597970000042
denotes calculation by the Hamming distance. The difference between numerical attribute values is measured by the Euclidean distance. For a mixed attribute set B, i.e., one that contains both nominal attributes and numerical attributes, B must be divided into a nominal attribute subset B_1 and a numerical attribute subset B_2, which satisfy B = B_1 ∪ B_2 and
Figure FDA0004184597970000043
The difference between users u_i and u_j on the attribute set B is calculated as follows:
Figure FDA0004184597970000044
S7: Calculate the fuzzy rough relative entropy matrix of each of the two sequences.
The fuzzy rough relative entropy is defined as follows:
Figure FDA0004184597970000045
where RE_B(u) denotes the fuzzy rough relative entropy of user u with respect to attribute subset B;
Figure FDA0004184597970000051
with u_i ≠ u, denotes the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B after u is removed, where |U| is the number of users in the user set U and SIM_B(u_i) is the fuzzy information granule of user u_i with respect to attribute subset B; E(R_B) denotes the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B. RE_B(u) measures the degree of abnormality of u: when u is removed from the user set U, if E_u(R_B) differs greatly from E(R_B), the degree of abnormality of u is high and u is more likely to be an abnormal user; conversely, if E_u(R_B) changes very little, the degree of abnormality of u is low and u is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy RE_B(u), the higher the degree of abnormality of u;
Based on this idea, for each attribute or attribute subset in the two sequences, the fuzzy rough relative entropy obtained by removing one user from U at a time is calculated in turn, forming two fuzzy rough relative entropy matrices as follows:
Figure FDA0004184597970000052
Figure FDA0004184597970000053
where REM_S and REM_AS denote the fuzzy rough relative entropy matrix of the attribute sequence and that of the attribute set sequence, respectively; REM_S(i, j) is the fuzzy rough relative entropy RE_{c′_j}(u_i) of user u_i with respect to the j-th attribute c′_j in the attribute sequence S; REM_AS(i, j) is the fuzzy rough relative entropy RE_{C_j}(u_i) of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS.
S8: Calculate the weight matrices of the two sequences.
In addition to uncertainty measures such as fuzzy rough entropy, a suitably defined weight function helps widen the gap in anomaly scores between abnormal users and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weight of each attribute or attribute subset in the two sequences is calculated, forming the weight matrix of the corresponding sequence as follows:
Figure FDA0004184597970000061
Figure FDA0004184597970000062
where W_S and W_AS denote the weight matrix of the attribute sequence and that of the attribute set sequence, respectively; in W_S and W_AS, |U| is the number of users in the user set U. W_S(i, j) is the weight of user u_i with respect to the j-th attribute c′_j in the attribute sequence S,
Figure FDA0004184597970000063
denotes the fuzzy information granule of user u_i with respect to the j-th attribute c′_j in the attribute sequence S; W_AS(i, j) is the weight of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS,
Figure FDA0004184597970000064
denotes the fuzzy information granule of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS.
S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.
Figure FDA0004184597970000065
where score(u_i) is the anomaly score of user u_i on the attribute set C; |C| is the number of attributes in the attribute set C; REM_S(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c′_j in the attribute sequence S, and REM_AS(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS; W_S(i, j) is the weight of user u_i with respect to the j-th attribute c′_j in the attribute sequence S, and W_AS(i, j) is the weight of user u_i with respect to the j-th attribute subset C_j in the attribute set sequence AS;
Accordingly, the distance-based fuzzy rough anomaly score can be used as an index for determining whether a user's electricity consumption is abnormal: the higher the anomaly score of user u, the more likely u is to be abnormal.
S10: Check one by one whether each anomaly score is greater than the threshold.
Given a threshold μ, for any user u ∈ U, if score(u) > μ, then u is determined to be an abnormal user in the user set U.
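The decision rule of S10 can be sketched as a simple filter. The function name, the user labels, and the scores below are hypothetical values chosen for illustration; they are not taken from the patent's example.

```python
def detect_abnormal_users(scores, mu):
    """Return the users whose anomaly score is strictly greater than mu."""
    return [user for user, score in scores.items() if score > mu]


# Hypothetical anomaly scores for four users.
scores = {"u1": 0.41, "u2": 0.35, "u3": 0.38, "u4": 0.52}
abnormal = detect_abnormal_users(scores, mu=0.38)
print(abnormal)  # ['u1', 'u4']
```

The comparison is strict, so a user whose score equals μ (here u3) is not reported as abnormal.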
CN202310414937.2A 2023-04-18 2023-04-18 Power consumption abnormality detection method based on fuzzy rough entropy Pending CN116433049A (en)


Publications (1)

Publication Number Publication Date
CN116433049A true CN116433049A (en) 2023-07-14

Family

ID=87088740


Country Status (1)

Country Link
CN (1) CN116433049A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076991A (en) * 2023-10-16 2023-11-17 云境商务智能研究院南京有限公司 Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment
CN117076991B (en) * 2023-10-16 2024-01-02 云境商务智能研究院南京有限公司 Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination