CN116433049A

CN116433049A - Power consumption abnormality detection method based on fuzzy rough entropy

Info

Publication number: CN116433049A
Application number: CN202310414937.2A
Authority: CN
Inventors: 王思涵; 袁钟; 刘昶; 羊思宇; 吴衍
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2023-07-14

Abstract

The invention discloses a power consumption abnormality detection method based on fuzzy rough entropy, in the technical field of power consumption abnormality detection. The method comprises the following steps: standardizing the numerical data; calculating a distance matrix for each attribute; calculating a fuzzy similarity matrix; calculating the distance-based fuzzy rough entropy; sorting the attributes by fuzzy rough entropy to obtain an attribute sequence and an attribute set sequence; calculating the distance matrix and fuzzy similarity matrix of the attribute set sequence; calculating the fuzzy rough relative entropy matrices of the two sequences; calculating the weight matrices of the two sequences; calculating an anomaly score for each user; and judging, user by user, whether the score is larger than a threshold and outputting the user as abnormal if so, until all users have been judged. Fuzzy rough entropy effectively extracts the fuzzy and uncertain information in abnormal electricity consumption data, which improves the detection performance of the model and allows mixed heterogeneous data sets to be processed effectively.

Description

Power consumption abnormality detection method based on fuzzy rough entropy

Technical Field

The invention relates to the technical field of electricity utilization abnormality detection, in particular to an electricity utilization abnormality detection method based on fuzzy rough entropy.

Background

Electricity is indispensable to people's daily lives, and electricity safety has a critical impact on both daily life and the national power industry. As power demand keeps rising, equipment faults and electricity theft occur continually and pose serious hidden dangers to the safe operation of the power grid. With the full-scale rollout of smart meters in China, the volume of data in electricity consumption information acquisition systems keeps growing, but because means of intelligently analyzing the data are lacking, useful information cannot be extracted quickly from this mass of data. Electricity consumption data anomaly detection technology can therefore both better safeguard the safe operation of the power grid and help power companies avoid larger hidden dangers and recover substantial economic losses.

Traditional power consumption abnormality detection methods are mainly data-driven and fall into classification, regression, and clustering algorithms. Classification algorithms divide the user set into normal and abnormal classes according to user features, and the model is generally built from a labeled data set; such data sets require manual label verification and therefore consume a great deal of manpower. Regression algorithms are mainly used to predict a user's short-term electricity consumption and compare the predicted value with the actual consumption to judge whether an abnormality has occurred. In practice, the prediction accuracy of regression models is often unsatisfactory. In particular, because consumption patterns vary from user to user, a separate regression model must be built for each user, and building and storing these models takes considerable time and storage space. Clustering methods divide objects into groups so that objects in the same group are similar on certain attributes and differ on those attributes from objects in other groups; in anomaly detection, the few users whose behavior does not conform to that of the majority are judged to be abnormal electricity consumers. However, the performance of a clustering model depends heavily on parameter selection, and because consumption patterns differ by season, the parameters must be reselected when the season changes. Clustering models are therefore computationally complex, poorly suited to large volumes of high-dimensional electricity consumption data, and limited in applicability.

Such methods are mainly suited to nominal data; when they are applied to a data set with numerical attributes, the data must first be discretized, which significantly increases preprocessing time and easily causes information loss that degrades the model's detection performance. Moreover, because traditional power consumption abnormality detection methods are mainly data-driven, a large amount of useful information is lost when mixed heterogeneous data is detected, so detection performance is poor and the results are unsatisfactory.

Disclosure of Invention

The invention aims to overcome the shortcomings of traditional power consumption abnormality detection methods, which are mainly data-driven, lose a large amount of useful information when detecting mixed heterogeneous data, and consequently have poor detection performance and unsatisfactory results.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A power consumption abnormality detection method based on fuzzy rough entropy comprises the following steps:

S1: Standardize the numerical data.

Unify the dimensions of the numerical attribute data in the data set: normalize the original attribute values with min-max normalization so that the value range of each attribute becomes [0, 1]. For any user u and any numerical attribute c, the calculation formula is as follows:

$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$

where f(c(u)) is the normalized value; c(u) is the value of user u on attribute c; min(c) is the minimum over all users on attribute c; and max(c) is the maximum over all users on attribute c.
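The min-max normalization of S1 can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name, the sample values, and the handling of a constant attribute (which the text does not cover) are assumptions.

```python
def min_max_normalize(values):
    """Scale the values of one numerical attribute into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # ASSUMPTION: a constant attribute maps to 0.0; the source text
        # does not say how this degenerate case is handled.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly consumption values (kWh) for five users.
consumption = [120.0, 80.0, 200.0, 160.0, 80.0]
normalized = min_max_normalize(consumption)
```

The smallest value maps to 0 and the largest to 1; nominal attributes are not passed through this function.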

S2: Calculate a distance matrix for each attribute.

Calculate the distance matrix Dis_c of the data set under each attribute c, where Dis_c(u_i, u_j) = Dis_c(u_j, u_i) is the distance between users u_i and u_j on attribute c. If c is a nominal attribute, the Hamming distance measures the difference between the two users' attribute values; if c is a numerical attribute, the Euclidean distance is used.
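A per-attribute distance matrix as described in S2 might look like the following sketch. For a single attribute, the Hamming distance reduces to a 0/1 mismatch indicator and the Euclidean distance to an absolute difference; the function name and sample columns are hypothetical.

```python
def attribute_distance_matrix(column, is_nominal):
    """Symmetric distance matrix of all users on one attribute."""
    n = len(column)
    dis = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if is_nominal:
                # Hamming distance on one nominal value: 0 if equal, else 1.
                d = 0.0 if column[i] == column[j] else 1.0
            else:
                # Euclidean distance on one numerical value: |a - b|.
                d = abs(column[i] - column[j])
            dis[i][j] = dis[j][i] = d
    return dis


# Hypothetical columns: a nominal user type and a normalized consumption.
user_type = ["residential", "commercial", "residential"]
usage = [0.0, 0.5, 1.0]
dis_type = attribute_distance_matrix(user_type, is_nominal=True)
dis_usage = attribute_distance_matrix(usage, is_nominal=False)
```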

S3: Calculate a fuzzy similarity matrix from the distance matrix.

After the distance matrix is calculated, the mixed fuzzy similarity matrix of the corresponding attribute is calculated by the following formula, where B denotes any single attribute of the attribute set C. When the distance between two users is smaller than a threshold, the two users have some similarity under that attribute; otherwise they have none;

where R_B is the fuzzy similarity matrix of attribute subset B; R_B(u_i, u_j) is the fuzzy similarity of users u_i and u_j on B; the parameter ε is an adjustable threshold with value range [0, 1]; Dis_B(u_i, u_j) is the distance between users u_i and u_j on B; and |B| is the number of attributes in B, here |B| = 1;

The fuzzy information granule of user u_i induced by the fuzzy similarity relation R_B is defined as:

SIM_{R_B}(u_i) = SIM_B(u_i) = {R_B(u_i, u_1), R_B(u_i, u_2), …, R_B(u_i, u_n)}

For any integer j with 0 < j ≤ n, the j-th element R_B(u_i, u_j) of the granule is the fuzzy similarity of u_i and u_j with respect to B. If R_B(u_i, u_j) = 0, then u_j certainly does not belong to the fuzzy information granule SIM_{R_B}(u_i) of u_i; conversely, if R_B(u_i, u_j) = 1, then u_j certainly belongs to SIM_{R_B}(u_i).
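The similarity formula itself did not survive in this text, so the sketch below is only one plausible reading of the description: similarity is taken as 1 minus the distance when the distance does not exceed ε, and 0 otherwise. That form, the non-strict comparison, and the function names are all assumptions, not the patent's definition.

```python
def fuzzy_similarity_matrix(dis, epsilon):
    """Thresholded fuzzy similarity from a single-attribute distance matrix.

    ASSUMPTION: R(u_i, u_j) = 1 - Dis(u_i, u_j) if Dis <= epsilon, else 0.
    The exact formula is not recoverable from the source text.
    """
    return [[1.0 - d if d <= epsilon else 0.0 for d in row] for row in dis]


def fuzzy_granule(sim, i):
    """Fuzzy information granule of user i: row i of the similarity matrix."""
    return list(sim[i])


# Hypothetical distances among three users on one normalized attribute.
dis = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.7],
    [0.9, 0.7, 0.0],
]
sim = fuzzy_similarity_matrix(dis, epsilon=0.5)
granule_0 = fuzzy_granule(sim, 0)
```

Under this reading, a user always has similarity 1 with itself, and any pair farther apart than ε contributes 0 to each other's granule.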

S4: Calculate the distance-based fuzzy rough entropy.

Calculate the fuzzy rough entropy under each attribute from the calculated fuzzy similarity. The distance-based fuzzy rough entropy is defined as follows:

where E(B) = E(R_B) is the fuzzy rough entropy of the fuzzy similarity relation R_B; SIM_{R_B}(u_i) is the fuzzy information granule of user u_i under R_B; and |U| is the number of users in the user set U.
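Because the entropy formula is missing from this text, the following sketch substitutes a commonly used fuzzy information entropy, E = -(1/|U|) Σ log2(|SIM(u_i)| / |U|), with the granule cardinality taken as the sum of its memberships. This is a stand-in chosen for illustration; the patent's distance-based definition may differ, and the function name is hypothetical.

```python
import math


def fuzzy_entropy(sim):
    """Entropy of a fuzzy similarity matrix.

    ASSUMPTION: a standard fuzzy information entropy is used as a stand-in
    for the patent's distance-based fuzzy rough entropy, whose formula is
    not recoverable here. Requires sim[i][i] == 1 so cardinalities are >= 1.
    """
    n = len(sim)
    total = 0.0
    for row in sim:
        cardinality = sum(row)  # fuzzy cardinality of the granule
        total += math.log2(cardinality / n)
    return -total / n


# Four users that are pairwise completely dissimilar: maximal entropy log2(4).
all_distinct = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Four users that are pairwise identical: entropy 0.
all_same = [[1.0] * 4 for _ in range(4)]
e_distinct = fuzzy_entropy(all_distinct)
e_same = fuzzy_entropy(all_same)
```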

S5: Obtain an attribute sequence and an attribute set sequence from the fuzzy rough entropy.

The attributes are arranged in ascending order of their fuzzy rough entropy, giving the following attribute sequence:

S = <c'_1, c'_2, …, c'_m>

where S is the attribute sequence and m is the number of attributes in the attribute set C; the fuzzy rough entropy of any attribute in S is smaller than or equal to that of the next attribute.

To construct the attribute set sequence, the attribute c'_1 is taken as the first element; each subsequent element is obtained by adding the next attribute of the attribute sequence to the previous element, until all attributes have been added and the attribute set C itself is the last element. The resulting attribute set sequence is as follows:

AS = <C_1, C_2, …, C_m>

where AS is the attribute set sequence and m is the number of attributes in the attribute set C. Every attribute subset in AS is contained in C; the first attribute subset C_1 consists of the first attribute c'_1 of the attribute sequence, and the last attribute subset C_m equals C itself. For any integer j with 0 < j < m, the (j+1)-th attribute subset equals the j-th attribute subset with the (j+1)-th attribute of the attribute sequence added, i.e., C_{j+1} = C_j ∪ {c'_{j+1}}.
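The construction of S and AS is a sort followed by taking growing prefixes, as the sketch below shows. The entropy values are the ones reported for the three attributes in Example two; the function name and the dictionary representation are assumptions made for illustration.

```python
def build_sequences(entropy_by_attribute):
    """Return the attribute sequence S and the attribute set sequence AS."""
    # S: attributes in ascending order of fuzzy rough entropy.
    sequence = sorted(entropy_by_attribute, key=entropy_by_attribute.get)
    # AS: C_1 = {c'_1}, and C_{j+1} = C_j plus the next attribute of S.
    set_sequence = [sequence[: j + 1] for j in range(len(sequence))]
    return sequence, set_sequence


# Entropy values as listed in Example two of the text.
entropies = {"c1": 1.1258, "c2": 1.7088, "c3": 1.5457}
S, AS = build_sequences(entropies)
```

With these values, S is c1, c3, c2, and the subsets in AS grow from {c1} to {c1, c3} to the full attribute set, matching the sequences listed in Example two.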

S6: Calculate the distance matrix and similarity matrix of each attribute subset of the attribute set sequence.

After the attribute set sequence is obtained, the distance matrix and similarity matrix under each attribute subset of the attribute set sequence are calculated in turn, following the calculations of steps S2 to S4. Because an attribute subset contains one or more attributes, the distance matrix is calculated by the following formula;

where Dis_B is the distance matrix of attribute subset B; Dis_B(u_i, u_j) is the difference between users u_i and u_j on attribute subset B; |B| is the number of attributes in B; and a_k(u_i) is the value of user u_i on the k-th attribute of B. Differences between nominal attribute values are measured with the Hamming distance, and differences between numerical attribute values with the Euclidean distance. To calculate user differences over a mixed attribute set B, i.e., one that contains both nominal and numerical attributes, B is divided into a nominal attribute subset B_1 and a numerical attribute subset B_2, which satisfy B = B_1 ∪ B_2 and B_1 ∩ B_2 = ∅.

The difference between users u_i and u_j on the attribute set B is then calculated as Dis_B(u_i, u_j) = Dis_{B_1}(u_i, u_j) + Dis_{B_2}(u_i, u_j).
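The split into nominal and numerical parts can be sketched as follows. The sum Dis_B = Dis_B1 + Dis_B2 comes from the text; whether either part is additionally normalized by |B| is not recoverable, so the sketch uses the plain Hamming count and the plain Euclidean norm as an assumption. Function and parameter names are hypothetical.

```python
import math


def mixed_distance(u, v, nominal_idx, numeric_idx):
    """Distance between two users over a mixed attribute subset B.

    ASSUMPTION: no normalization by |B| is applied; the source formula
    for the multi-attribute distance is missing.
    """
    # Dis_B1: Hamming distance over the nominal attributes (mismatch count).
    hamming = sum(1 for k in nominal_idx if u[k] != v[k])
    # Dis_B2: Euclidean distance over the numerical attributes.
    euclid = math.sqrt(sum((u[k] - v[k]) ** 2 for k in numeric_idx))
    return hamming + euclid


# Hypothetical users: (user type, normalized consumption, normalized voltage).
user_a = ("residential", 0.0, 0.0)
user_b = ("commercial", 0.3, 0.4)
d_ab = mixed_distance(user_a, user_b, nominal_idx=[0], numeric_idx=[1, 2])
```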

S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences.

The fuzzy rough relative entropy is defined as follows:

where RE_B(u) is the fuzzy rough relative entropy of user u with respect to attribute subset B;

E_u(R_B), computed over the users u_i ≠ u, is the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B after u is removed; |U| is the number of users in U; SIM_B(u_i) is the fuzzy information granule of user u_i with respect to attribute subset B; and E(R_B) is the fuzzy rough entropy of the full user set U under R_B. RE_B(u) measures the degree of abnormality of u. When u is removed from U, if E_u(R_B) changes greatly compared with E(R_B), the degree of abnormality of u is high and u is more likely to be an abnormal user; conversely, if the change is very small, the degree of abnormality of u is low and u is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy RE_B(u), the higher the degree of abnormality of u;

Based on this idea, the fuzzy rough relative entropy obtained by removing one user of U at a time is calculated in turn under each attribute or attribute subset of the two sequences, forming the following two fuzzy rough relative entropy matrices:

where REM_S and REM_AS are the fuzzy rough relative entropy matrices of the attribute sequence and of the attribute set sequence, respectively; REM_S(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c'_j of the attribute sequence S, i.e., REM_S(i, j) = RE_{c'_j}(u_i); and REM_AS(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j of the attribute set sequence AS, i.e., REM_AS(i, j) = RE_{C_j}(u_i).
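The leave-one-out quantity E_u(R_B) that underlies the relative entropy can be illustrated as below. The sketch computes only E_u for each user; how E_u and E are combined into RE_B(u) is not recoverable from this text and is deliberately left out. The entropy function is the same stand-in assumption as a standard fuzzy information entropy, and all names are hypothetical.

```python
import math


def fuzzy_entropy(sim):
    # ASSUMPTION: standard fuzzy information entropy used as a stand-in
    # for the patent's distance-based fuzzy rough entropy.
    n = len(sim)
    return -sum(math.log2(sum(row) / n) for row in sim) / n


def leave_one_out_entropies(sim):
    """E_u for every user u: entropy after deleting u's row and column.

    Requires at least two users, since one user is always removed.
    """
    n = len(sim)
    result = []
    for u in range(n):
        keep = [k for k in range(n) if k != u]
        reduced = [[sim[a][b] for b in keep] for a in keep]
        result.append(fuzzy_entropy(reduced))
    return result


# Hypothetical similarity matrix: users 0 and 1 are identical, user 2 is isolated.
sim = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
full = fuzzy_entropy(sim)
loo = leave_one_out_entropies(sim)
```

In this toy case, removing the isolated user drops the entropy from about 0.918 to 0, while removing either of the two similar users moves it only to 1.0, which matches the intuition that a large change points to an abnormal user.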

S8: Calculate the weight matrices of the two sequences.

Besides uncertainty measures such as fuzzy rough entropy, a suitably defined weight function also helps widen the gap between the anomaly scores of abnormal and normal users. Therefore, after the fuzzy rough relative entropy is calculated, the weight of each attribute or attribute subset of the two sequences is calculated, forming the weight matrix of each sequence as follows:

where W_S and W_AS are the weight matrices of the attribute sequence and of the attribute set sequence, respectively, and |U| is the number of users in the user set U. W_S(i, j) is the weight of user u_i with respect to the j-th attribute c'_j of the attribute sequence S, where SIM_{c'_j}(u_i) is the fuzzy information granule of user u_i with respect to c'_j; W_AS(i, j) is the weight of user u_i with respect to the j-th attribute subset C_j of the attribute set sequence AS, where SIM_{C_j}(u_i) is the fuzzy information granule of user u_i with respect to C_j.

S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.

where score(u_i) is the anomaly score of user u_i on the attribute set C; |C| is the number of attributes in C; REM_S(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c'_j of the attribute sequence S, and REM_AS(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j of the attribute set sequence AS; W_S(i, j) is the weight of user u_i with respect to c'_j, and W_AS(i, j) is the weight of user u_i with respect to C_j.

S10: Judge one by one whether each anomaly score is larger than a threshold.

Given a threshold μ, for any user u ∈ U, if score(u) > μ, the user is determined to be an abnormal user in the user set U.
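The thresholding in S10 amounts to a strict comparison per user. The sketch below uses hypothetical scores and user identifiers, not values from the worked example.

```python
def flag_abnormal_users(scores, mu):
    """Return the users whose anomaly score is strictly greater than mu."""
    return [user for user, score in scores.items() if score > mu]


# Hypothetical scores, not taken from the patent's worked example.
scores = {"u1": 0.42, "u2": 0.31, "u3": 0.38, "u4": 0.55}
abnormal = flag_abnormal_users(scores, mu=0.38)
```

A score exactly equal to μ is not flagged, since the condition in the text is score(u) > μ.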

In the invention, the fuzzy rough entropy is constructed from a fuzzy similarity relation based on a distance measure, the degree of abnormality of a user is analyzed with the fuzzy rough relative entropy, and more useful information is obtained by constructing the attribute sequence and the attribute set sequence.

The power consumption abnormality detection method based on fuzzy rough entropy solves the problem that existing fuzzy rough entropy, because it relies on the intersection operation, cannot effectively distinguish similarity in high dimensions. No model needs to be trained in advance, which reduces operators' workload, allows abnormalities in electricity consumption data to be found early, and improves detection efficiency.

By using the improved fuzzy rough entropy, the invention effectively extracts the fuzzy and uncertain information in abnormal electricity consumption data and thereby improves the detection performance of the model. It also extends the traditional fuzzy rough set method so that mixed heterogeneous data sets can be processed effectively, regardless of data set type, making it applicable to many kinds of electricity consumption data.

Drawings

Fig. 1 is a schematic flow chart of steps of an electricity utilization abnormality detection method based on fuzzy rough entropy.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.

Example 1

Referring to Fig. 1, this embodiment provides a power consumption abnormality detection method based on fuzzy rough entropy, comprising the following steps:

S1: Acquire electricity consumption data and standardize the numerical data in it to obtain an information system;

S2: Calculate the distance matrix under each attribute from the information system;

S3: Calculate the fuzzy similarity matrix under each attribute from the distance matrix;

S4: Calculate the distance-based fuzzy rough entropy of each attribute;

S5: Sort the attributes by fuzzy rough entropy to obtain an attribute sequence and an attribute set sequence;

S6: Calculate the distance matrix and fuzzy similarity matrix of each attribute subset of the attribute set sequence;

S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences;

S8: Calculate the weight matrices of the two sequences;

S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices;

S10: Judge one by one whether the outlier degree of each user in the information system is larger than a threshold; if so, output the user as abnormal, otherwise judge the next user, until all users have been judged.

In fuzzy rough set practice, electricity consumption data is imported into an information system (or information table), in which each row represents a user and each column represents an attribute of the user. Attribute values may be numerical (such as electricity consumption or voltage), nominal (such as user type or connection mode), or mixed (both numerical and nominal). An information system is denoted (U, C), where U is the set of users and C is the set of attributes;

In this embodiment, the numerical attributes are first preprocessed by min-max normalization to obtain the electricity consumption data set. The improved fuzzy rough entropy, which effectively reflects how well attributes discriminate the electricity consumption data, is used to construct the attribute sequence and the attribute set sequence, and the degree of abnormality of each user is evaluated through an anomaly score built from the fuzzy rough relative entropy and the weight matrices of the two sequences. This addresses the inability of existing power consumption abnormality detection methods to handle complex electricity consumption data that is multi-source, heterogeneous, fuzzy, and uncertain. The method requires no labeled data and effectively realizes unsupervised anomaly detection on mixed electricity consumption data.

In step S1, the normalization is expressed as follows:

$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$

where f(c(u)) is the normalized value; c(u) is the value of user u on attribute c; min(c) is the minimum over all users on attribute c; and max(c) is the maximum over all users on attribute c;

In this embodiment, min-max normalization rescales the numerical data to the real interval [0, 1], and the nominal data is left unchanged.

S2 specifically comprises the following steps:

The distance between two users on a nominal attribute is measured with the Hamming distance, and the distance on a numerical attribute with the Euclidean distance, giving the distance matrix Dis_c under each attribute c, where Dis_c(u_i, u_j) = Dis_c(u_j, u_i) is the distance between users u_i and u_j on attribute c.

S3 specifically comprises the following steps:

After the distance matrix is calculated, the mixed fuzzy similarity matrix of the corresponding attribute is calculated by the following formula, where B denotes any single attribute of the attribute set C. When the distance between two users is smaller than a threshold, the two users have some similarity under that attribute; otherwise they have none. If R_B(u_i, u_j) = 0, then u_j certainly does not belong to the fuzzy information granule SIM_{R_B}(u_i) of u_i; conversely, if R_B(u_i, u_j) = 1, then u_j certainly belongs to SIM_{R_B}(u_i).

S4 specifically comprises the following steps:

Calculate the distance-based fuzzy rough entropy of each attribute;

where SIM_{R_c}(u_i) = SIM_c(u_i) = {R_c(u_i, u_1), R_c(u_i, u_2), …, R_c(u_i, u_n)} is the fuzzy information granule of user u_i with respect to R_c; for any integer j with 0 < j ≤ n, the j-th element R_c(u_i, u_j) of the granule is the fuzzy similarity of u_i and u_j with respect to c; and |U| is the number of users in the user set U.

S5 specifically comprises the following steps:

The attributes are arranged in ascending order of fuzzy rough entropy, which first gives the following attribute sequence:

S = <c'_1, c'_2, …, c'_m>

To construct the attribute set sequence, the attribute c'_1 is taken as the first element; each subsequent element is obtained by adding the next attribute of the attribute sequence to the previous element, until all attributes have been added and the attribute set C itself is the last element. The resulting attribute set sequence is as follows:

AS = <C_1, C_2, …, C_m>

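S6 is specifically as follows:

The difference between users u_i and u_j on the attribute set B is calculated as Dis_B(u_i, u_j) = Dis_{B_1}(u_i, u_j) + Dis_{B_2}(u_i, u_j), where B_1 and B_2 are the nominal and numerical attribute subsets of B.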

S7 specifically comprises the following steps:

The fuzzy rough relative entropy is defined as follows:

where RE_B(u) is the fuzzy rough relative entropy of user u;

E_u(R_B), computed over the users u_i ≠ u, is the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B after u is removed; |U| is the number of users in U; SIM_B(u_i) is the fuzzy information granule of user u_i with respect to attribute subset B; and E(R_B) is the fuzzy rough entropy of the full user set U under R_B. RE_B(u) measures the degree of abnormality of u. When u is removed from U, if E_u(R_B) changes greatly compared with E(R_B), the degree of abnormality of u is high and u is more likely to be an abnormal user; conversely, if the change is very small, the degree of abnormality of u is low and u is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy RE_B(u), the higher the degree of abnormality of u;

REM_S(i, j) = RE_{c'_j}(u_i); REM_AS(i, j) = RE_{C_j}(u_i);

where REM_S and REM_AS are the fuzzy rough relative entropy matrices of the attribute sequence and of the attribute set sequence, respectively; REM_S(i, j) is the fuzzy rough relative entropy RE_{c'_j}(u_i) of user u_i with respect to the j-th attribute c'_j of the attribute sequence S; and REM_AS(i, j) is the fuzzy rough relative entropy RE_{C_j}(u_i) of user u_i with respect to the j-th attribute subset C_j of the attribute set sequence AS.

S8 specifically comprises the following steps:

where W_S and W_AS are the weight matrices of the attribute sequence and of the attribute set sequence, respectively; |U| is the number of users in the user set U; and W_S(i, j) is the weight of user u_i with respect to the j-th attribute c'_j of the attribute sequence S.

S9 is specifically as follows:

The anomaly score of each user is evaluated with the distance-based fuzzy rough anomaly score;

S10 specifically comprises the following steps:

judging whether the outlier degree of the users in the information system is larger than a threshold value one by one, if so, outputting abnormal users, otherwise, judging the next user until the judgment of all the users is completed.

Example two

Referring to Table 1, c_1 is a nominal attribute, c_2 and c_3 are numerical attributes, and x_1, …, x_6 are all the users in the user set U. This embodiment provides a power consumption abnormality detection method based on fuzzy rough entropy, comprising the following steps.

Table 1: Electricity consumption table

S1: Standardize the numerical data. The attribute values of c_2 and c_3 are preprocessed with the min-max normalization formula; the results are shown in the right part of Table 1.

S2: Calculate the distance matrix under each attribute from Table 1. The distance between two users on a nominal attribute is measured with the Hamming distance, and the distance on a numerical attribute with the Euclidean distance, giving the distance matrix Dis_c under each attribute c.

S3: Calculate the fuzzy similarity matrix under each attribute from the distance matrix, with the parameter ε set to 0.5.

S4: Calculate the distance-based fuzzy rough entropy of each attribute.

E(c_1) = 1.1258, E(c_2) = 1.7088, E(c_3) = 1.5457

S5: Arrange the attributes in ascending order of fuzzy rough entropy to obtain the attribute sequence and the attribute set sequence.

S = <c_1, c_3, c_2>

AS = <C_1, C_2, C_3> = <{c_1}, {c_1, c_3}, {c_1, c_2, c_3}>

S6: Calculate the distance matrix of each attribute subset of the attribute set sequence.

Obtain the fuzzy similarity matrix of each attribute subset of the attribute set sequence from the distance matrix.

S8: Calculate the weight matrices of the two sequences.

S9: Calculate the anomaly score of each user from the relative entropy matrices and the weight matrices.

In the same way, the following are obtained:

score(x_2) ≈ 0.3722, score(x_3) ≈ 0.3825,

score(x_4) ≈ 0.3753, score(x_5) ≈ 0.3901, score(x_6) ≈ 0.3645.

S10: Let μ = 0.38. Judge one by one whether the outlier degree of each user in the information system is larger than the threshold; if so, output the user as abnormal, otherwise judge the next user, until all users have been judged. Finally, the set of users whose scores exceed the threshold μ, i.e., the calculated outlier set, is OS = {u_1, u_3}.

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall fall within the scope of protection of the present invention.

The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims

1. The power consumption abnormality detection method based on fuzzy rough entropy is characterized by comprising the following steps of:

S1: Standardize the numerical data.

The calculation formula is as follows:

$$f(c(u)) = \frac{c(u) - \min(c)}{\max(c) - \min(c)}$$

S2: Calculate the distance matrix of each single attribute.

S3: Calculate the fuzzy similarity matrix of each single attribute from the distance matrix.

After the distance matrix is calculated, the fuzzy similarity matrix of the corresponding attribute is calculated by the following formula, where B denotes any single attribute of the attribute set C. When the distance between two users is smaller than a threshold, the two users have some similarity under that attribute; otherwise they have none;

The fuzzy information granule of user u_i induced by the fuzzy similarity relation R_B is defined as:

SIM_{R_B}(u_i) = SIM_B(u_i) = {R_B(u_i, u_1), R_B(u_i, u_2), …, R_B(u_i, u_n)}

For any integer j with 0 < j ≤ n, the j-th element R_B(u_i, u_j) of the granule is the fuzzy similarity of u_i and u_j with respect to B. If R_B(u_i, u_j) = 0, then u_j certainly does not belong to the fuzzy information granule SIM_{R_B}(u_i) of u_i; conversely, if R_B(u_i, u_j) = 1, then u_j certainly belongs to SIM_{R_B}(u_i).

S4: Calculate the distance-based fuzzy rough entropy.

where E(B) = E(R_B) is the fuzzy rough entropy of the fuzzy similarity relation R_B; SIM_{R_B}(u_i) is the fuzzy information granule of user u_i under R_B; and |U| is the number of users in the user set U.

S = <c'_1, c'_2, …, c'_m>

AS = <C_1, C_2, …, C_m>

S6: Calculate the distance matrix and similarity matrix of each attribute subset of the attribute set sequence.

where Dis_B is the distance matrix of attribute subset B; Dis_B(u_i, u_j) is the difference between users u_i and u_j on attribute subset B; |B| is the number of attributes in B; a_k(u_i) is the value of user u_i on the k-th attribute of B; and differences between nominal attribute values are measured with the Hamming distance.

S7: Calculate the fuzzy rough relative entropy matrix under each attribute or attribute subset of the two sequences.

The fuzzy rough relative entropy is defined as follows:

E_u(R_B), computed over the users u_i ≠ u, is the fuzzy rough entropy of the user set U under the fuzzy similarity relation R_B after u is removed; |U| is the number of users in U; SIM_B(u_i) is the fuzzy information granule of user u_i with respect to attribute subset B; and E(R_B) is the fuzzy rough entropy of the full user set U under R_B. RE_B(u) measures the degree of abnormality of u. When u is removed from U, if E_u(R_B) changes greatly compared with E(R_B), the degree of abnormality of u is high and u is more likely to be an abnormal user; conversely, if the change is very small, the degree of abnormality of u is low and u is less likely to be an abnormal user. Thus, the lower the fuzzy rough relative entropy RE_B(u), the higher the degree of abnormality of u;

Based on this idea, the fuzzy rough relative entropy obtained by removing one user of U at a time is calculated in turn under each attribute or attribute subset of the two sequences, forming the following two fuzzy rough relative entropy matrices:

S8: Calculate the weight matrices of the two sequences.

where score(u_i) is the anomaly score of user u_i on the attribute set C; |C| is the number of attributes in C; REM_S(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute c'_j of the attribute sequence S, and REM_AS(i, j) is the fuzzy rough relative entropy of user u_i with respect to the j-th attribute subset C_j of the attribute set sequence AS; W_S(i, j) is the weight of user u_i with respect to c'_j, and W_AS(i, j) is the weight of user u_i with respect to C_j;

According to the above idea, the distance-based fuzzy rough anomaly score can be used as an index for determining whether a user's electricity consumption is abnormal: the higher the anomaly score of user u, the more likely u is abnormal.