CN116796271A - Resident energy abnormality identification method - Google Patents


Info

Publication number
CN116796271A
Authority
CN
China
Prior art keywords
data
energy
abnormal
user
resident
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310727271.6A
Other languages
Chinese (zh)
Inventor
梁志远
张璐明
贺小刚
常迪
王堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Sanyuan Electric Information Technology Co ltd
Original Assignee
Tianjin Sanyuan Electric Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Sanyuan Electric Information Technology Co ltd
Priority to CN202310727271.6A
Publication of CN116796271A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention provides a resident energy anomaly identification method comprising the following steps: S1, analyzing abnormal resident energy consumption according to historical abnormal energy consumption data; S2, identifying abnormal energy data using a resident energy anomaly algorithm; S3, detecting abnormal resident energy consumption. By analyzing the resident energy anomaly analysis process and applying the k-Means clustering algorithm to resident energy anomalies, the invention accurately identifies those anomalies and thereby helps eliminate management loopholes such as resident energy safety hazards and energy waste.

Description

Resident energy abnormality identification method
Technical Field
The invention belongs to the field of energy data, and particularly relates to a resident energy anomaly identification method.
Background
As living standards continue to improve, total residential energy consumption shows a clear upward trend. Over the last ten years in particular, growth has accelerated and total energy demand has increased markedly. However, the residential energy structure still depends heavily on coal and clean energy is underused, so there is considerable room for improvement. In addition, the monitoring of abnormal resident energy use lacks effective supervision tools and means, so abnormal energy use cannot be discovered in time, which leads to energy waste and even energy safety accidents. A method for identifying resident energy anomalies is therefore urgently needed.
Disclosure of Invention
In view of the above, the present invention aims to propose a resident energy anomaly identification method to solve at least one problem in the background art.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
A resident energy abnormality identification method comprises the following steps:
S1, analyzing abnormal resident energy consumption according to historical abnormal energy consumption data;
S2, identifying abnormal energy data using a resident energy anomaly algorithm;
S3, detecting abnormal resident energy consumption.
Further, in step S1, the specific method is as follows:
A1, extracting features from the abnormal energy data to form an abnormal energy feature library;
A2, establishing and optimizing a standard feature library;
A3, identifying abnormal energy use curves;
A4, statistically analyzing abnormal energy use events.
Further, in step A1, the specific method is as follows:
Features in the abnormal energy data are extracted using an artificial intelligence algorithm, a corresponding model is established, test sample data are repaired for the abnormal feature model, and the results are accumulated to form a resident abnormal energy feature library.
Further, in step A2, the specific method is as follows: standard features of data curve changes are analyzed according to historical normal data, and a standard feature library is established and optimized.
Further, in step A3, the specific method is as follows: according to the rules of the standard feature library, non-standard curves in the historical data are identified at regular intervals and judged to be abnormal, and the abnormal curves are extracted as resident abnormal energy data.
Further, in step A4, the specific method is as follows: the causes of anomalies are identified from the abnormal energy data, abnormal energy event information is integrated, and the abnormal energy events are statistically analyzed to produce, for each classification, an abnormal energy situation display, an abnormal energy customer ranking, an abnormal energy industry classification ranking, an anomaly similarity analysis, and a self-learning trend analysis.
Further, in step S2, the specific method is as follows:
B1, clustering the historical daily load data of all users using the K-Means algorithm, determining the users' electricity consumption behaviors, and assigning a cluster label to each electricity consumption behavior;
B2, building a daily load classification model with the user's historical daily load data as input and the electricity consumption behavior label as output.
Further, in step S2, the method specifically includes the following steps:
in the daily operation stage, the daily loads of all users are classified, and a transverse score is given by comparing the energy use behaviors of users of the same type according to the transverse scoring standard for resident users;
the number of clusters K in K-Means is set manually, and its value is judged using the Elbow Method, the Silhouette Coefficient and the Calinski-Harabasz Index;
the Elbow Method measures clustering performance through the degree of distortion;
the Silhouette Coefficient is an evaluation index of how dense and how well separated the classes are; it takes values in [-1, 1], and the closer it is to 1, the more reasonable the K value; the Calinski-Harabasz Index is defined as the ratio of between-group dispersion to within-group dispersion, and a larger score indicates a better clustering effect.
Further, in step S2, the method includes:
definition 1: for a data sequence $A_1 = \langle a_{11}, a_{12}, \dots, a_{1L} \rangle$, if $a_{1i}$ ($i = 1, 2, \dots, L$) is the electric energy consumed by the user in the period from the $(i-1)$-th moment to the $i$-th moment, as observed at the $i$-th moment within a time window, then $A_1 = \langle a_{11}, a_{12}, \dots, a_{1L} \rangle$ is a user electricity consumption data sequence, and $L$ is the length of the time window of the sequence;
definition 2: let $A_1^* = \langle a_{11}^*, a_{12}^*, \dots, a_{1L}^* \rangle$ be a given user electricity consumption data sequence and let $r_1 > 0$; if there exists a user electricity consumption data sequence $A_1 = \langle a_{11}, a_{12}, \dots, a_{1L} \rangle$ satisfying $\| A_1^* - A_1 \| < r_1$, then $A_1^*$ is called a user electricity consumption data sequence pattern, and $r_1$ is the pattern radius of that pattern.
Further, in step S3, the specific method is as follows:
C1, establishing an electricity consumption behavior anomaly detection model;
C2, refining and supplementing the electricity consumption behavior anomaly detection model;
C3, automatically detecting abnormal electricity consumption behaviors;
C4, introducing the data set and preprocessing it for class balance;
C5, processing the unbalanced data based on the Border-SMOTE algorithm;
C6, testing and analyzing the detection of abnormal resident electricity consumption behaviors.
Compared with the prior art, the resident energy anomaly identification method has the following beneficial effects:
According to the resident energy anomaly identification method, the resident energy anomaly analysis process is analyzed and the k-Means clustering algorithm is studied for resident energy anomalies, so that resident energy anomalies are accurately identified, which helps eliminate management loopholes such as resident energy safety hazards and energy waste.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of a resident energy anomaly algorithm identification process according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a resident energy anomaly analysis process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the relationship between user electricity consumption data and user electricity consumption data classes according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the effective form of microwave oven electric energy data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the silhouette coefficients (average value, L=3) of the clustering analysis results for microwave oven electric energy data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the SSE values (average value, L=3) of the clustering analysis results for microwave oven electric energy data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the distribution of silhouette coefficients (average value, L=3, k=8) of the clustering analysis results for microwave oven electric energy data according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the distribution of SSE values (average value, L=3, k=8) of the clustering analysis results for microwave oven electric energy data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the distribution of usage times (average value, L=3, k=8) in the clustering analysis of microwave oven electric energy data according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the recognition results (L=3, k=8) for abnormal microwave oven usage behavior according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the silhouette coefficients (L=3, k=8) of the sequence pattern set during recognition of abnormal microwave oven usage behavior according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a normal customer electricity consumption curve according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of abnormal customer electricity behavior according to an embodiment of the present invention;
FIG. 14 is a schematic diagram illustrating an exemplary SMOTE algorithm according to an embodiment of the present invention;
FIG. 15 is a schematic diagram illustrating an example of a Border-SMOTE algorithm according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of the predicted classification of raw data according to an embodiment of the present invention;
FIG. 17 is a schematic diagram of a SMOTE oversampling prediction class in accordance with an embodiment of the present invention;
FIG. 18 is a schematic diagram of a class of prediction for Border-SMOTE oversampling in accordance with an embodiment of the present invention;
FIG. 19 is a graph showing comparative analysis of classification indicators of different models according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
This scheme focuses on the resident energy anomaly analysis process and on resident energy anomaly algorithm identification; the overall technical route is shown in FIG. 1.
(1) Resident energy anomaly analysis process
1) Forming an abnormal energy feature library: features in the abnormal energy data are extracted using an artificial intelligence algorithm, a corresponding model is established, test sample data are repaired for the abnormal feature model, and the results are accumulated to form a resident abnormal energy feature library.
2) Establishing and optimizing a standard feature library: standard features of data curve changes are analyzed according to historical normal data, and a standard feature library is established and optimized.
3) Identifying abnormal energy use curves: according to the rules of the standard feature library, non-standard curves in the historical data are identified at regular intervals and judged to be abnormal, and the abnormal curves are extracted as resident abnormal energy data.
4) Statistically analyzing abnormal energy events: the causes of anomalies are identified from the abnormal energy data, abnormal energy event information is integrated, and the abnormal energy events are statistically analyzed to produce, for each classification, an abnormal energy situation display, an abnormal energy customer ranking, an abnormal energy industry classification ranking, an anomaly similarity analysis, a self-learning trend analysis, and the like.
(2) Resident energy anomaly algorithm identification
The K-means clustering algorithm is a dynamic, partition-based clustering algorithm. It was proposed independently by Lloyd, MacQueen, Steinhaus, and Ball & Hall in different research fields between the 1950s and the 1970s. K-means uses the sum of squared distances from every sample to the center of the cluster it is assigned to as the evaluation index of the clustering effect, and seeks to minimize it. The algorithm works as follows. First, k sample points are selected as the initial cluster centers. Then the distance (for example, the Euclidean distance) between each sample point and each of the k cluster centers is calculated, and each sample is assigned to its nearest cluster center. Each cluster center is then updated to the mean of all samples in its class, that is, the class centroid. The distances from each sample to the k new centers are recalculated and the samples are reassigned. These steps are repeated until the positions of the k centers no longer change, at which point the clustering process ends. In the k-Means clustering algorithm, a normal resident energy data set containing N N-dimensional data points is given in advance, and whether resident energy use is abnormal is finally judged from the direct difference between the expected target and the set value.
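The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`k_means`, `sse`), the random choice of initial centers, the rule that an empty cluster keeps its previous center, and the toy load values are all hypothetical.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Step 3: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else centers[j])
        # Step 4: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each sample to its own center."""
    return sum(euclidean(p, centers[lab]) ** 2
               for p, lab in zip(points, labels))


# Toy daily-load vectors (made-up numbers): two low-usage, two high-usage days.
loads = [[1.0, 1.2], [0.9, 1.1], [5.0, 5.2], [5.1, 4.9]]
centers, labels = k_means(loads, k=2)
```

The `sse` helper computes the quantity that the paragraph names as the evaluation index: the sum of squared distances from each sample to its assigned center.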
(3) Resident energy anomaly detection
A dual-channel 1DCNN-AttBILSTM network is used to uncover illegal behaviors in customer electricity data. Because abnormal electricity consumption samples form unbalanced data, the Border-SMOTE oversampling technique is considered for processing them. Experiments are performed on a real customer electricity consumption data set published by the State Grid Corporation of China, and the results are comprehensively evaluated against other machine learning and deep learning methods, verifying the feasibility and effectiveness of the dual-channel 1DCNN-AttBILSTM network for electricity consumption behavior anomaly detection.
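The text does not spell out the oversampling steps. The sketch below assumes that "Border-SMOTE" refers to the procedure commonly known as Borderline-SMOTE: minority samples whose nearest neighbors are mostly, but not entirely, majority samples are treated as borderline, and synthetic samples are interpolated between them and nearby minority samples. The function name, the parameters `m` and `k`, and the toy data are hypothetical.

```python
import math
import random


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(p, candidates, count):
    """The `count` candidates closest to p (p itself excluded by identity)."""
    others = [c for c in candidates if c is not p]
    return sorted(others, key=lambda c: dist(p, c))[:count]


def borderline_smote(minority, majority, n_new, m=3, k=2, seed=0):
    """Return (borderline minority samples, synthetic minority samples)."""
    rng = random.Random(seed)
    everyone = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    danger = []
    for p in minority:
        neigh = sorted((e for e in everyone if e[0] is not p),
                       key=lambda e: dist(p, e[0]))[:m]
        n_major = sum(1 for _, y in neigh if y == 0)
        # Borderline ("danger") samples: at least half, but not all, of the
        # m nearest neighbours belong to the majority class.
        if m / 2 <= n_major < m:
            danger.append(p)
    synthetic = []
    while danger and len(synthetic) < n_new:
        p = rng.choice(danger)
        q = rng.choice(nearest(p, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return danger, synthetic


# Made-up 2-D features: many normal users, few abnormal ones.
normal = [[0.0, 0.0], [1.0, 1.0], [1.5, 1.5], [1.0, 2.0], [2.0, 1.0]]
abnormal = [[2.0, 2.0], [2.2, 2.2], [4.0, 4.0], [4.5, 4.0], [4.0, 4.5]]
danger, synthetic = borderline_smote(abnormal, normal, n_new=4)
```

In this toy data only the two abnormal samples closest to the normal users are borderline, so all synthetic samples are generated around the class boundary rather than deep inside the abnormal region.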
The specific method comprises the following steps:
1. Resident energy anomaly analysis process
The resident energy anomaly analysis specifically comprises: extracting features from abnormal energy data using an artificial intelligence algorithm; analyzing standard features of data curve changes according to historical normal data, and establishing and optimizing a standard feature library; identifying non-standard curves in the historical data at regular intervals according to the rules of the standard feature library, and extracting the abnormal curves as resident abnormal energy data; and identifying the causes of anomalies from the abnormal energy data, integrating abnormal energy event information, and statistically analyzing the abnormal energy events to produce the abnormal energy classification under each category.
Taking a resident of a certain community as an example, the platform infers from the following features that the cause of the user's abnormal energy use is an "industrial change".
Feature 1: the system detects that, compared with history, the user's active power curve has changed in recent days from a peak-valley curve to a flat curve, so the user's electricity consumption characteristics have changed.
Feature 2: the system detects a marked change in water consumption, and cross-validating the deviations of electricity, water, gas and other indicators with an artificial intelligence algorithm indicates that the number of people living in the household has changed.
Feature 3: the customer label shows that the user's electricity bill has recently risen by more than 50%, and a support vector machine infers that the user's electrical load has changed.
Field verification showed that the household's population had recently increased sharply and several high-load electrical appliances had been added, so the actual situation matched the predicted cause. In addition, intelligent abnormal energy analysis of a single user can be extended from individual points to the whole industry and the whole region, making the government's collection of industry change information more direct, timely and complete, and giving the government's urban planning data, arguments and evidence to rely on.
2. Resident energy anomaly algorithm
This scheme is based on resident users' electricity data. Because a user's electricity consumption data are produced by the user's electricity consumption behavior, that behavior can be identified and screened by analyzing the data. Such data are generally collected for billing, so the collection period is usually long; in that case, electricity-related events can only be identified and discriminated on a coarse time scale. If the data can be collected at a higher frequency, that is, over a shorter period, fine-grained electricity-related events can be identified and discriminated from the high-frequency data.
Whether meters are read manually or automatically, a user's electricity consumption data can be obtained at designated moments. Over a period of time, one user's meter readings at each designated moment form a user electricity consumption data sequence, and the time span of the sequence forms its time window. If the time window is too large, the information contained in the sequence overlaps, the user's specific electricity consumption behaviors are hard to identify accurately, and the clustering effect is poor. If the time window is too small, the user's electricity consumption appears scattered and trivial. A suitable time window length must therefore be determined according to the actual situation.
The k-means clustering algorithm extracts user electricity consumption data sequence patterns from a set of such sequences. Based on the patterns obtained, it can be judged whether a newly generated sequence is similar to a known pattern. If the new sequence is not similar to any known pattern, a new kind of sequence has occurred, which is generally caused by new electricity consumption behavior. According to whether the user's electricity consumption behavior is abnormal, the sequences can be divided simply into normal and abnormal, and the patterns can correspondingly be divided into normal patterns and abnormal patterns. Once the patterns have been fully acquired, a newly collected sequence can be compared with them to identify whether it is normal, and even which specific normal or abnormal pattern it belongs to.
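The comparison step can be illustrated with a short sketch. It assumes that each pattern is stored as a center sequence, a radius and a normal/abnormal label, that similarity means Euclidean distance below the radius, and that a sequence matching no pattern is reported as `"new"`. The function name `match_pattern`, the label strings and the numbers are hypothetical.

```python
import math


def distance(seq_a, seq_b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(seq_a, seq_b)))


def match_pattern(sequence, patterns):
    """Return the label of the closest pattern whose radius covers
    `sequence`, or "new" when no known pattern is similar enough."""
    best_label, best_dist = "new", float("inf")
    for center, radius, label in patterns:
        d = distance(sequence, center)
        if d < radius and d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical pattern set: (center sequence, pattern radius, label).
patterns = [
    ([0.1, 0.1, 0.1], 0.2, "normal"),    # appliance idle
    ([1.0, 1.1, 1.0], 0.3, "normal"),    # typical usage cycle
    ([2.5, 2.6, 2.4], 0.4, "abnormal"),  # unusually high draw
]

print(match_pattern([1.05, 1.0, 0.95], patterns))
print(match_pattern([5.0, 5.0, 5.0], patterns))
```

A sequence reported as `"new"` would, following the text, point to electricity consumption behavior not seen in the historical data and could be queued for manual labeling.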
First, the historical daily load data of all users are clustered using the K-Means algorithm, the users' electricity consumption behaviors are determined, and a cluster label is assigned to each behavior. Second, a daily load classification model is built with the user's historical daily load data as input and the electricity consumption behavior label as output.
In the daily operation stage, the daily loads of all users are classified, and a transverse score is given by comparing the energy use behaviors of users of the same type according to the transverse scoring standard for resident users. The number of clusters K in K-Means is set manually, and its value is judged using the Elbow Method, the Silhouette Coefficient and the Calinski-Harabasz Index. The Elbow Method measures clustering performance by the degree of distortion. For data with some degree of separation, the distortion falls sharply until K reaches a critical point and declines slowly thereafter; that critical point is the optimal number of clusters. The Silhouette Coefficient is an evaluation index of how dense and how well separated the classes are; it takes values in [-1, 1], and the closer it is to 1, the more reasonable the K value. The Calinski-Harabasz Index is defined as the ratio of between-group dispersion to within-group dispersion, with a larger score indicating better clustering.
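The three criteria can be compared on a small example. The following is a sketch under stated assumptions: the k-means helper uses farthest-first seeding so that the run is deterministic (the text does not say how initial centers are chosen), the Calinski-Harabasz Index uses the usual normalization by (K-1) and (n-K), and the nine load vectors are invented.

```python
import math


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def k_means(points, k, max_iter=100):
    """Lloyd iteration with farthest-first seeding (an assumption made here
    so the example is deterministic); returns (centers, labels)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return centers, labels


def distortion(points, centers, labels):
    """Elbow Method quantity: within-cluster sum of squared errors (SSE)."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette taken as 0
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def calinski_harabasz(points, centers, labels):
    """Between-group dispersion over within-group dispersion."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    between = sum(labels.count(j) * dist(centers[j], overall) ** 2
                  for j in range(k))
    return (between / (k - 1)) / (distortion(points, centers, labels) / (n - k))


# Invented daily-load vectors forming three visible groups.
loads = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
         [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
         [9.0, 1.0], [9.2, 1.1], [8.8, 0.9]]

scores = {}
for k in (2, 3, 4):
    centers, labels = k_means(loads, k)
    scores[k] = (distortion(loads, centers, labels),
                 silhouette(loads, labels),
                 calinski_harabasz(loads, centers, labels))

# Pick the K with the highest silhouette coefficient.
best_k = max(scores, key=lambda k: scores[k][1])
```

On this data the distortion falls sharply from K=2 to K=3 and only slightly afterwards, which is the elbow described above, and the silhouette coefficient peaks at the same K.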
Definition 1: for a data sequence $A_1 = \langle a_{11}, a_{12}, \dots, a_{1L} \rangle$, if $a_{1i}$ ($i = 1, 2, \dots, L$) is the electric energy consumed by the user in the period from the $(i-1)$-th moment to the $i$-th moment, as observed at the $i$-th moment within a time window, then $A_1 = \langle a_{11}, a_{12}, \dots, a_{1L} \rangle$ is a user electricity consumption data sequence, and $L$ is the length of the time window of the sequence.
Generally, for a resident user whose life is stable and regular, electricity consumption behavior is fixed over a period of time. This means that, for a time window of a given length and allowing for factors such as data acquisition errors and the operating conditions of electrical equipment, the user's electricity consumption is regular. Over a longer time span, this regularity makes the user's electricity consumption data sequences form clusters in the user's power consumption space. Once the structure of that space is defined, the clusters can be found by applying cluster analysis, a data mining technique, to all of the user's electricity consumption data sequences over the longer time span.
Definition 2: let $A_1^* = \langle a_{11}^*, a_{12}^*, \dots, a_{1L}^* \rangle$ be a given user electricity consumption data sequence and let $r_1 > 0$; if there exists a user electricity consumption data sequence $A_1 = \langle a_{11}, a_{12}, \dots, a_{1L} \rangle$ satisfying $\| A_1^* - A_1 \| < r_1$, then $A_1^*$ is called a user electricity consumption data sequence pattern and $r_1$ is the pattern radius of that pattern. A user electricity consumption data sequence pattern is therefore itself a user electricity consumption data sequence. Intuitively, a pattern defines a hypersphere in the user's power consumption space, centered on the pattern and with the pattern radius as its radius; this hypersphere is the geometric form of a cluster of user electricity consumption data sequences. A user electricity consumption data sequence directly describes an electricity-related event from the perspective of power consumption. If an electricity-related event is regarded as a data class of user electricity consumption data, the data class may be described jointly by several sequence patterns. Therefore, when screening a user's electricity consumption behavior based on the user's electricity consumption data, the sequence patterns in those data must first be acquired effectively, together with the pattern radius of each pattern. Algorithm 1 gives the procedure for acquiring the sequence patterns and their pattern radii using k-means clustering. In Algorithm 1, each pattern is characterized jointly by its center sequence and its pattern radius.
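The geometric reading of Definition 2 can be written out explicitly. The notation below is introduced here for illustration only: $B(\cdot,\cdot)$ for the hypersphere, $\mathcal{C}$ for a data class and $p$ for the number of patterns describing it do not appear in the original, and the norm is assumed to be Euclidean.

```latex
% Hypersphere defined by a pattern A_1^* with pattern radius r_1
B(A_1^*, r_1) = \{\, A \in \mathbb{R}^{L} : \| A_1^* - A \| < r_1 \,\}

% A data class described jointly by p patterns is the union of their hyperspheres
\mathcal{C} = \bigcup_{j=1}^{p} B(A_j^*, r_j)
```

Under this reading, a sequence $A$ belongs to the data class $\mathcal{C}$ when it falls inside at least one of the hyperspheres.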
Algorithm 1: User electricity consumption data sequence pattern acquisition based on k-means clustering
Input: all user electricity consumption data A; the time window length L of a user electricity consumption data sequence; the number of patterns k
Output: the set PS of all user electricity consumption data sequence patterns
1) Calculate the number of data elements in A and store it in the variable m;
2) n = m - L + 1;
3) pData = zeros(L, n);
4) for i = 1 to n step 1
5)     pData(:, i) = A(i : i+L-1);
6) Cluster the data in pData using the k-means clustering algorithm;
7) Set PS to the empty set;
8) for each cluster
9)     Calculate the cluster center, which is a user electricity consumption data sequence pattern A*;
10)    Calculate the distance from each data item in the cluster to the cluster center, and select the largest distance as the pattern radius of the corresponding pattern;
11)    Add the obtained pattern and its pattern radius to PS as one element.
Steps 1-5 of Algorithm 1 prepare the data. All user electricity consumption data are stored in an array A in the order in which they were collected. Since a user electricity consumption data sequence consists of the data collected within one time window, arranged in time order, all sequences used for cluster analysis must be constructed from the time window length L and the array A, where m is the number of elements in A. All constructed sequences are stored in the variable pData, a two-dimensional array in which each column holds one user electricity consumption data sequence.
Steps 7-11 of Algorithm 1 define the procedure for generating the complete pattern set PS from the result of the cluster analysis of pData in step 6. In step 9, if the cluster being processed has only one element, the cluster center is that element; otherwise, the cluster center is the mean of all elements in the cluster. In step 10, if the cluster has only one element, the pattern radius of the pattern it represents is defined as 0.
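Algorithm 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the patent's code: windows are stored as a list of rows instead of the columns of pData, the k-means helper uses farthest-first seeding for determinism, distances are Euclidean, and the meter readings are invented.

```python
import math


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sliding_windows(a, length):
    """Steps 1-5: build the n = m - L + 1 overlapping sequences of length L."""
    n = len(a) - length + 1
    return [a[i:i + length] for i in range(n)]


def k_means(points, k, max_iter=100):
    """Step 6: Lloyd iteration with farthest-first seeding (an assumption);
    returns one cluster label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def acquire_patterns(a, length, k):
    """Steps 6-11: cluster the windows and return PS as (center, radius) pairs."""
    windows = sliding_windows(a, length)
    labels = k_means(windows, k)
    ps = []                                                      # step 7
    for j in sorted(set(labels)):                                # step 8
        members = [w for w, lab in zip(windows, labels) if lab == j]
        center = [sum(c) / len(members) for c in zip(*members)]  # step 9
        radius = max(dist(w, center) for w in members)           # step 10
        ps.append((center, radius))                              # step 11
    return ps


# Invented meter readings: an appliance that is off, on, then off again.
readings = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0]
ps = acquire_patterns(readings, length=3, k=2)
```

For a cluster with a single element, the mean equals that element and the largest distance is 0, which matches the special cases described for steps 9 and 10.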
In this study, all user electricity consumption data sequences caused by a particular electricity consumption behavior of the user constitute the data of one user electricity consumption data class. Consequently, one user electricity consumption data class may include multiple user electricity consumption data sequence patterns.
In this application, a user electricity consumption data class is triggered by a specific electricity consumption behavior of the user, which causes a specific electricity use event; similar behaviors trigger similar electricity use events. One user electricity consumption data class may include multiple user electricity consumption data sequence patterns. When a user's electricity consumption sequence matches a given pattern, the electricity use event corresponding to the data class to which the matched pattern belongs is considered to have occurred.
To determine from a user electricity consumption sequence whether an expected electricity use event has occurred, all possible data sequence patterns of the data class corresponding to that event must be provided. All data sequence patterns contained in the user electricity consumption data can be obtained with Algorithm 1. The data class to which each pattern belongs can be labeled manually; alternatively, Algorithm 1 can be applied to reference data for a known electricity use event so that the resulting patterns are labeled automatically with the corresponding data class.
A user electricity consumption data class is thus constructed by manually or automatically labeling user electricity consumption data sequence patterns, and the data class to which a user electricity consumption sequence belongs is determined by the data sequence pattern to which the sequence belongs.
Algorithm 2: Identification of the user electricity consumption data class to which a user electricity consumption sequence belongs
Input: a user electricity consumption sequence A whose data class is to be identified; the set P of the centers of all user electricity consumption data sequence patterns; the set R of the pattern radii of all user electricity consumption data sequence patterns; the set C of identifiers of the data class to which each user electricity consumption data sequence pattern belongs;
Output: the set CS of user electricity consumption data classes to which the sequence A belongs
1) CS = Φ;
2) for each user electricity consumption data sequence pattern center p in P
3) Calculate the distance dp from p to A;
4) if dp ≤ rp
5) Obtain from C the identifier cp of the data class to which the pattern centered at p belongs;
6) CS = CS ∪ {cp};
Algorithm 2 first determines, from the known user electricity consumption data sequence patterns and their pattern radii, the pattern or patterns to which the sequence A to be identified belongs, and then determines the data class of A from the data class of each matched pattern. In Algorithm 2, each known pattern is represented by its center: the centers of all known patterns form the input set P, and the pattern radii of all known patterns form the input set R. Because Algorithm 2 must identify the data class to which the sequence A belongs, the class identifier of each known pattern is stored in the input set C.
It should be noted that, in step 4 of Algorithm 2, the variable rp is the pattern radius of the user electricity consumption data sequence pattern whose center is the p used in step 3. Since the pattern radii of all known patterns are stored in the input R, the value of rp can be retrieved from R according to p.
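The matching rule of Algorithm 2 can be illustrated with a short sketch. The parallel-list layout for P, R, and C, the Euclidean distance, and the function and class names are assumptions made for the example; the second helper assumes that every known pattern was learned from normal behavior, so a sequence that matches none of them is treated as abnormal.

```python
import math


def identify_classes(a, centers, radii, class_ids):
    """Return the set CS of class identifiers whose pattern contains a.

    centers[i], radii[i] and class_ids[i] describe the i-th known pattern
    (the inputs P, R and C, stored here as parallel lists).
    """
    cs = set()
    for p, rp, cp in zip(centers, radii, class_ids):
        dp = math.dist(p, a)  # Euclidean distance is an assumption
        if dp <= rp:
            cs.add(cp)
    return cs


def is_abnormal(a, centers, radii, class_ids):
    """Flag a sequence that falls inside no known (normal) pattern."""
    return not identify_classes(a, centers, radii, class_ids)


# Hypothetical patterns and class names, for illustration only.
centers = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
radii = [1.0, 0.5]
class_ids = ["standby", "heating"]
print(identify_classes([0.5, 0.0, 0.0], centers, radii, class_ids))
print(is_abnormal([2.0, 2.0, 2.0], centers, radii, class_ids))
```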
The electricity usage data collected while the microwave oven processes microwave popcorn is the cumulative energy metered by the smart socket since it was powered on. Therefore, to obtain all normal working patterns of the microwave oven during powered operation using Algorithm 1, each collected reading must be converted into valid data, namely the energy used since the previous reading. A total of 142 groups of microwave oven electricity consumption data were collected while processing microwave popcorn; each valid data item is the difference between one reading and the preceding reading, giving 141 valid data items in total. To obtain a user (microwave oven) electricity consumption data sequence pattern set that covers powering on the microwave oven, starting the popcorn processing operation, and stopping it, the study selected three groups of valid data for constructing the pattern set: 1) 10 valid data items (1st to 10th) from powering on the microwave oven; 2) 50 valid data items (15th to 64th) from the microwave oven starting the popcorn processing operation; 3) 20 valid data items (96th to 115th) after the popcorn processing was completed.
Considering that processing microwave popcorn in a microwave oven takes about 3 minutes, the length L of the sequence time window is set to 3 when Algorithm 1 is used to obtain the user (microwave oven) electricity consumption data sequence pattern set; that is, the pattern set is constructed from the electricity use of the microwave oven over only 6 seconds of powered operation. Algorithm 1 therefore processes (10-L+1) + (50-L+1) + (20-L+1) = 8 + 48 + 18 = 74 data sequences in this example. From a clustering perspective, as the class parameter k of the k-means clustering algorithm varies between 1 and 74, the pattern set obtained by Algorithm 1 as the clustering result shows good silhouette coefficient characteristics. Taking into account the number of data items in the data set and the silhouette coefficient, the number of patterns k (the class parameter k of the k-means clustering algorithm) is chosen to be no more than 10 when Algorithm 1 is used to obtain the pattern set for analyzing the validity of Algorithms 1 and 2.
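The two preparation steps described above, converting cumulative meter readings into per-interval usage and counting the sliding windows, can be checked with a small sketch. The helper names and the sample readings are hypothetical; only the segment sizes (10, 50, 20), the reading count (142), and L = 3 come from the text.

```python
def to_interval_usage(cumulative):
    """Convert cumulative meter readings into per-interval usage."""
    return [later - earlier
            for earlier, later in zip(cumulative, cumulative[1:])]


def window_count(n_values, window):
    """Number of stride-1 windows of the given length over n_values items."""
    return max(n_values - window + 1, 0)


# Hypothetical cumulative readings (not the experimental data).
readings = [0.0, 0.5, 1.0, 2.5, 2.5]
usage = to_interval_usage(readings)

# 142 cumulative readings yield 141 differences.
n_valid = len(to_interval_usage(list(range(142))))

# Segment sizes 10, 50 and 20 with L = 3 give 8 + 48 + 18 windows.
total_windows = sum(window_count(n, 3) for n in (10, 50, 20))
print(usage, n_valid, total_windows)
```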
Likewise, from a clustering perspective, as the class parameter k of the k-means clustering algorithm varies between 1 and 74, the user (microwave oven) electricity consumption data sequence pattern set obtained by Algorithm 1 as the clustering result shows good sum of squared errors (SSE) characteristics. Taking into account the number of data items in the data set and the SSE values, the number of patterns k (the class parameter k of the k-means clustering algorithm) is again chosen to be no more than 10 when Algorithm 1 is used to obtain the pattern set for analyzing the validity of Algorithms 1 and 2.
Combining the silhouette coefficients and the SSE values of the clustering results with the number of data items in the data set, the number of patterns k is set to 8 when Algorithm 1 is used to obtain the user (microwave oven) electricity consumption data sequence pattern set. At k = 8, the silhouette coefficient of the pattern set obtained by Algorithm 1 has a mean of 0.8861 and a variance of 0.0499, and the SSE has a mean of 4.9250e-009 and a variance of 1.7621e-009. Algorithm 1 therefore performs well and is also quite stable when obtaining the pattern set. The distributions of the SSE values and of the silhouette coefficients of the cluster analysis results for the microwave oven electricity consumption data at L = 3 and k = 8 were also obtained. As for running time at L = 3 and k = 8, tests show that Algorithm 1 takes 0.002 seconds on average to obtain the pattern set, with a variance of 5.0988e-004. Algorithm 1 can therefore obtain the user (microwave oven) electricity consumption data sequence pattern set quickly.
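For reference, the two clustering quality measures used here can be computed as in the sketch below. It assumes the usual definitions (SSE as the sum of squared Euclidean distances to the cluster mean, and the mean silhouette coefficient with a score of 0 for single-element clusters); the text does not state which variants were used, and the toy clusters are hypothetical.

```python
import math


def sse(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        center = [sum(col) / len(cluster) for col in zip(*cluster)]
        total += sum(math.dist(p, center) ** 2 for p in cluster)
    return total


def mean_silhouette(clusters):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    scores = []
    for i, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # usual convention for singletons
                continue
            # a: mean distance to the other points of the same cluster
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: smallest mean distance to the points of another cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            largest = max(a, b)
            scores.append((b - a) / largest if largest > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, well-separated clusters of one-dimensional points.
toy = [[[0.0], [1.0]], [[10.0], [11.0]]]
print(sse(toy), mean_silhouette(toy))
```

A small SSE together with a mean silhouette coefficient close to 1 indicates compact, well-separated patterns.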
When testing the influence of the number of patterns k and the sequence time window L on the performance of Algorithm 1, Algorithm 1 was run 400 times for each pair of values of k and L, and the SSE value, the silhouette coefficient, and the time taken to obtain the microwave oven electricity consumption data sequence pattern set were recorded.
In this study, each user electricity consumption data sequence pattern is treated as one user electricity consumption data class. For a user electricity consumption sequence whose pattern membership is unknown, Algorithm 2 can be used to determine whether it belongs to any known pattern and thereby identify its data class. If every known pattern was triggered by normal electricity consumption behavior of the user, a sequence that cannot be assigned to any known pattern can be judged to have been triggered by abnormal electricity consumption behavior.
Using the monitoring data of the electricity used during microwave oven operation, Algorithm 1 is first applied with sequence time window L = 3 to obtain the electricity consumption data sequence pattern set with at most k = 8 patterns, together with the pattern radius of each pattern. Algorithm 2 can then identify the user (microwave oven) electricity consumption data class to which each electricity consumption data sequence belongs, and the identification result indicates whether abnormal use of the microwave oven has occurred. The identification result for abnormal use of the microwave oven is shown in Fig. 10. In Fig. 10, the abscissa is the sequence number of the user (microwave oven) electricity consumption data sequence, and the ordinate is the discrimination result: a value of 1 indicates that the sequence at that abscissa belongs to a known pattern; a value of 0 indicates that it belongs to no known pattern, i.e., abnormal microwave oven electricity use has occurred. In Figs. 5 to 9, sequences judged to be abnormal microwave oven electricity use are marked with red dots for emphasis. For the identification shown in Fig. 10, the k-means clustering used by Algorithm 1 gave an SSE of 2.8837e-008 and a silhouette coefficient of 0.8833.
According to the record of microwave oven operations kept during the experiment, the first group of red-dot-marked data corresponds to the moment the microwave oven was plugged into the smart socket, and the red-dot marks at abscissa values 68 to 96 correspond to two operations of opening the microwave oven door. Algorithm 2 thus effectively discovers microwave oven electricity use behaviors that Algorithm 1 did not learn.
To illustrate that Algorithm 1 and Algorithm 2 can be applied effectively in practice to identify abnormal electricity use by resident users, this scheme analyzes the online monitoring data of a microwave oven belonging to a resident of a certain community and, following the process of identifying abnormal electricity use of that microwave oven, discusses how and when the algorithms are used in terms of missing data processing, sequence data preparation, acquisition of the electricity consumption data sequence pattern set, and identification of abnormal electricity use based on the pattern set. Algorithm 1 is used to obtain the user electricity consumption data sequence pattern set (each element of the set is one pattern together with its pattern radius), and Algorithm 2 is used to identify abnormal electricity use based on that pattern set. The effectiveness and accuracy of Algorithm 2 depend on the performance of the cluster analysis method used by Algorithm 1 to construct the pattern set. Further experiments show that, when the SSE of the cluster analysis is kept at its best (smallest) values, the silhouette coefficient of the resulting pattern set has a clear influence on the accuracy with which Algorithm 2 identifies abnormal electricity use: the larger the silhouette coefficient (the closer to 1), the more accurate the identification. This deserves particular attention when implementing and optimizing Algorithm 1 in practice.
In practical applications, ensuring that the mean silhouette coefficient of the user electricity consumption data sequence pattern set obtained as the clustering result is not less than 0.8 is a useful measure for keeping the identification of abnormal electricity use accurate.
3. Household energy anomaly detection
With the continued development and spread of the smart grid, China's electricity market keeps expanding, which raises new questions about how power companies can improve economic returns and reduce economic losses. Utility losses are mainly divided into technical losses and non-technical losses, and non-technical losses make up the major part of a power company's total economic loss. According to surveys, losses due to electricity theft account for 0.5% to 3.5% of annual revenue in the United States, and the revenue reduction caused by abnormal electricity use is more serious in developing countries. Abnormal electricity use not only reduces the stability, safety, and reliability of the power grid but also increases unnecessary resource consumption. Traditional manual screening can no longer meet the needs of abnormal electricity use detection, so it is necessary to discover abnormal electricity use by residents quickly and effectively from massive electric power data and thereby provide a scientific basis for decisions by government departments. This scheme takes residential electricity use as its main research object for abnormal energy use detection.
Both normal and abnormal electricity consumption data are typical time series, so the temporal characteristics of the data must be fully considered when detecting abnormal customer electricity use. At the same time, CNNs are good at mining local key features in the data space. To address the problems in current user behavior anomaly detection, this section uses a dual-channel 1DCNN-AttBILSTM network architecture to detect abnormal customer electricity use. First, the electricity consumption data of power customers is obtained. Because abnormal electricity consumption data is typically imbalanced, the experimental data is balanced after preprocessing. A comparison shows that Border-SMOTE oversampling improves accuracy more noticeably than SMOTE oversampling, so Border-SMOTE oversampling is adopted to balance the data. The network can then better learn the deep features of electricity consumption behavior, improving the accuracy and stability of abnormal electricity use identification. Finally, the classification results are compared with those of other machine learning and deep learning methods to verify the practicability and generality of the network architecture on the abnormal electricity use data set. The analysis of the results also shows that the network handles long time series somewhat better than the other networks.
(1) Establishing the electricity consumption behavior anomaly detection model.
First, customer load curve data is obtained and preprocessed; the main step is Border-SMOTE oversampling to balance the data distribution. The data set is divided into training, validation, and test data in a 6:2:2 ratio. The constructed dual-channel 1DCNN-AttBILSTM electricity consumption behavior anomaly detection model performs feature learning on the training set, and the best anomaly detection model is then selected according to the results on the validation set.
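A 6:2:2 division of this kind might look like the following sketch. The per-class (stratified) shuffling, the fixed seed, and the function name are assumptions for illustration; the text only specifies the ratio.

```python
import random


def split_622(samples, labels, seed=0):
    """Shuffle and split into train/validation/test at 6:2:2, per class."""
    rng = random.Random(seed)
    parts = {"train": [], "val": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Integer arithmetic avoids floating-point rounding of the ratios.
        n_train = len(idx) * 6 // 10
        n_val = len(idx) * 2 // 10
        parts["train"] += idx[:n_train]
        parts["val"] += idx[n_train:n_train + n_val]
        parts["test"] += idx[n_train + n_val:]
    return {name: [(samples[i], labels[i]) for i in ids]
            for name, ids in parts.items()}


# Hypothetical data: 100 samples, half normal (0) and half abnormal (1).
parts = split_622(list(range(100)), [0] * 50 + [1] * 50)
print({name: len(rows) for name, rows in parts.items()})
```

Splitting each class separately keeps the class proportions the same in all three subsets, and the subsets share no samples.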
(2) Refining and supplementing the electricity consumption behavior anomaly detection model.
Because smart grid data is collected continuously, new load curve data keeps arriving, and the model must be refined and supplemented accordingly. For newly added load curve data, the training data is updated, the electricity consumption characteristics of the new data are analyzed, and the anomaly detection model is adjusted so as to supplement and improve the model library.
(3) Automatically detecting abnormal electricity consumption behavior.
The trained electricity consumption behavior anomaly detection model is applied to the collected load data to analyze customer behavior. The load data comes mainly from metering devices and from the historical data of the power marketing system. Finally, the detection results for abnormal electricity consumption behavior are output.
(4) Data set profiling and balanced preprocessing.
The experiment uses an actual electricity consumption data set released by the State Grid Corporation of China (SGCC). The data set contains the electricity consumption data of 42372 resident users over 1035 days. According to State Grid statistics, it includes 3615 resident users with abnormal electricity use, accounting for 8.5% of the whole data set. The load curves of normal and abnormal resident users are shown in the figure. As the figure shows, the electricity consumption of normal customers is regular, whereas that of abnormal resident users fluctuates greatly, is low overall throughout the year, and has no obvious periodicity. The data set also spans a long time scale, so it can be used to verify the effectiveness of the dual-channel 1DCNN-AttBILSTM network for classifying long time series.
Preliminary analysis shows that the numbers of normal and abnormal electricity customers in the SGCC data set are imbalanced: abnormal customers are far fewer than normal customers. It is therefore unsuitable to build and train a network directly on such data, particularly in tasks such as abnormal electricity use detection where the minority class is the main concern. Data imbalance makes the model's judgments insufficiently accurate and biases predictions toward the class with more data, which greatly reduces the model's generalization. This section addresses the imbalance by generating synthetic data to increase the number of abnormal customers; the classification model is then trained on the balanced data set.
(5) Unbalanced data processing based on Border-SMOTE algorithm
The SMOTE method is an oversampling algorithm proposed by Chawla et al. It connects each minority-class sample with nearby samples and uses random interpolation to generate new minority-class samples.
1) For each electricity anomaly sample xi, its K surrounding samples are obtained using K-Nearest Neighbor (KNN) [102].
2) A sampling ratio is set according to the imbalance ratio of the data to determine the sampling rate N; for each minority-class sample xi, points are selected at random from its K neighbors, and a selected point is denoted xn.
3) For each randomly selected point xn, new data is constructed from xn and the original sample xi.
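The interpolation formula referred to in step 3 is not reproduced in the text. In the standard formulation of SMOTE, which is assumed here, a new sample is placed at a random position on the line segment between xi and the selected neighbor xn (the symbols x_new and δ are introduced only for this illustration):

```latex
x_{\mathrm{new}} = x_i + \delta \,(x_n - x_i), \qquad \delta \sim U(0, 1)
```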
However, SMOTE has a drawback: it treats all minority-class samples equally and does not take the class information of neighboring samples into account, yet it is precisely the samples near the class boundary that are most prone to misclassification. This causes sample confusion and degrades the classification results. This section therefore uses the Border-SMOTE [103] method for imbalance processing. It is an optimized algorithm based on SMOTE; unlike SMOTE, it oversamples only the minority-class samples at the boundary and thereby adjusts the data distribution. Border-SMOTE proceeds as follows:
1) The K nearest neighbors of each minority-class sample are obtained, and according to the classes of those neighbors the minority-class samples are divided into three types: Danger, Safe, and Noise. If more than half of the neighbors of a minority-class sample belong to the majority class, the sample is boundary data, i.e., a Danger sample; if more than half of its neighbors belong to the minority class, it is a Safe sample; if all of its neighbors belong to the majority class, it is a Noise sample. Border-SMOTE mainly expands the Danger samples.
2) After all Danger samples are found, Border-SMOTE finds the K nearest neighbors of each boundary sample within the minority class; the remaining procedure is the same as in SMOTE.
An example of the samples generated by the Border-SMOTE algorithm is shown in Fig. 15.
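The two Border-SMOTE steps can be sketched as follows. This is an illustrative reading rather than the implementation used in the experiments: the function names, the Euclidean distance, the fixed seed, and the handling of a tie at exactly half the neighbors (treated as Danger) are assumptions, and at least two minority samples are required.

```python
import math
import random


def k_nearest(x, pool, k):
    """Indices of the k points in pool closest to x (x itself excluded)."""
    order = sorted((i for i, p in enumerate(pool) if p is not x),
                   key=lambda i: math.dist(x, pool[i]))
    return order[:k]


def categorize(minority, majority, k):
    """Label each minority sample as 'noise', 'danger' or 'safe'."""
    pool = minority + majority
    n_min = len(minority)
    kinds = []
    for x in minority:
        n_major = sum(1 for i in k_nearest(x, pool, k) if i >= n_min)
        if n_major == k:
            kinds.append("noise")
        elif 2 * n_major >= k:  # tie at exactly half -> danger (assumption)
            kinds.append("danger")
        else:
            kinds.append("safe")
    return kinds


def border_smote(minority, majority, k, n_new, seed=0):
    """Create n_new synthetic samples by interpolating from Danger samples."""
    rng = random.Random(seed)
    kinds = categorize(minority, majority, k)
    danger = [x for x, kind in zip(minority, kinds) if kind == "danger"]
    synthetic = []
    while danger and len(synthetic) < n_new:
        x = rng.choice(danger)
        # Neighbors are searched within the minority class only.
        neighbor = minority[rng.choice(k_nearest(x, minority, k))]
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Hypothetical one-dimensional data, for illustration only.
minority = [[0.0], [0.1], [0.2], [1.0], [1.1]]
majority = [[1.2], [1.3], [1.4], [5.0]]
print(categorize(minority, majority, 3))
print(len(border_smote(minority, majority, 3, 10)))
```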
(6) Residential electricity behavior anomaly detection experiment and analysis
First, missing values are deleted, abnormal data is detected and filled in, and the load data is normalized. The data imbalance is then handled with the Border-SMOTE algorithm. The resulting data set contains 47708 records in total. After data synthesis with Border-SMOTE, the ratio of normal to abnormal electricity use is 1:1. The synthesized data is divided into training, validation, and test sets in a 6:2:2 ratio, ensuring that each set contains a certain proportion of each category of electricity customer and that the test set and training set do not intersect.
Most of the evaluation indices are the same as those used for curve classification in Chapter 4. Note, however, that in anomaly detection P and N refer to predictions: P denotes a sample predicted as positive, i.e., predicted as abnormal, with label 1; N denotes a sample predicted as negative, i.e., predicted as normal, with label 0. In addition, normal and abnormal electricity customers in the data set are extremely imbalanced before balancing, so the area under the ROC curve (Area Under Curve, AUC) is used to analyze the effect of balancing on the model and to evaluate the model more effectively. The horizontal axis of the ROC curve is the false positive rate (False Positive Rate, FPR), the proportion of all normal data that is misclassified as abnormal; the vertical axis is the true positive rate (True Positive Rate, TPR), the proportion of all abnormal data that is correctly identified.
The FPR and TPR are expressed as follows:
$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN}$$
where TP is the number of positive samples judged correctly, TN is the number of negative samples judged correctly, FP is the number of negative samples judged incorrectly, and FN is the number of positive samples judged incorrectly.
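As a small worked example of these quantities, the sketch below counts TP, TN, FP, and FN from true and predicted labels and derives FPR and TPR using the standard definitions (false positives over all negatives, true positives over all positives). Label 1 denotes abnormal; the function names and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with label 1 = abnormal (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def rates(tp, tn, fp, fn):
    """Return (FPR, TPR); a rate with an empty denominator is reported as 0."""
    fpr = fp / (fp + tn) if fp + tn else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return fpr, tpr


# Hypothetical labels: three abnormal and five normal samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(rates(*confusion_counts(y_true, y_pred)))
```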
The AUC index evaluates the classifier by weighing TPR against FPR. AUC takes a value between 0.5 and 1, and the closer it is to 1, the better the classification. It can serve as an evaluation index for classification models even under extreme class imbalance.
where rank_i is the rank of the i-th sample, M is the number of normal electricity consumption curves, and N is the number of abnormal electricity consumption curves.
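The AUC expression itself is not reproduced in the text. One common rank-based way of computing AUC, equivalent to the Mann-Whitney U statistic, is sketched below; whether it matches the exact expression intended here is an assumption. The names n_pos and n_neg are used instead of M and N to avoid implying a particular mapping, and label 1 marks the positive (abnormal) class.

```python
def auc_by_rank(scores, labels):
    """AUC from the rank sum of the positive (label 1) samples.

    Samples with equal scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and scores[order[end + 1]] == scores[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical scores: higher means more likely abnormal.
print(auc_by_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```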
To test the effectiveness of Border-SMOTE oversampling as a method for handling the imbalanced data set, SMOTE oversampling and the unprocessed raw data were chosen for comparison. Apart from the different balancing methods, the data preprocessing and model parameters are identical, and the dual-channel 1DCNN-AttBILSTM network is used in each case to learn from the data and judge the electricity consumption behavior of resident users.
Analysis of the classification results of the model on the original, unprocessed data set shows that the test set contains 4773 normal customer load curves and 349 abnormal customer load curves, so normal customers are approximately 13.68 times as numerous as abnormal customers. The model's Acc is very high, but its R and Pre are very low, indicating that the classification network is insensitive to abnormal electricity use data when the data is imbalanced. Acc rises as long as the classifier labels all samples as the majority (normal) class, but this makes the classification results meaningless and weakens model training.
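The effect can be illustrated with the test-set counts quoted above. The sketch considers a hypothetical degenerate classifier that labels every curve as normal, with abnormal taken as the positive class; it is not one of the models evaluated in the experiments.

```python
normal, abnormal = 4773, 349  # test-set counts quoted in the text

# A degenerate classifier that labels every curve as normal (label 0).
tp, fn = 0, abnormal
tn, fp = normal, 0

accuracy = (tp + tn) / (normal + abnormal)
recall = tp / (tp + fn)
ratio = normal / abnormal

print(round(accuracy, 4), recall, round(ratio, 2))
```

Accuracy stays above 0.93 even though not a single abnormal customer is detected, which is why recall, precision, F1, and AUC are reported alongside Acc.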
After the data is oversampled with SMOTE, the total number of records reaches 47708; the test set contains 4793 normal customer load curves and 4749 abnormal customer load curves. The sample is enlarged and the proportions of positive and negative samples are evened out. Although Acc is lower than on the original data set, the F1 value, R, and Pre are greatly improved, showing that the model classifies abnormal customers better after the samples are balanced. However, the AUC is about 0.025 lower than with Border-SMOTE, indicating that classification suffers when the class information of neighboring samples is not considered.
After the data is balanced with Border-SMOTE, the AUC improves by 0.2548 and 0.0254, and Acc improves by 2.61 and 7.3 percentage points, compared with the original data set and the SMOTE-sampled data set, respectively. The F1 value, Pre, and R improve by about 0.62, 0.40, and 0.73, respectively, compared with SMOTE oversampling, and the other classification indices also improve to varying degrees. The classification performance of the model is therefore greatly improved after the data is balanced with Border-SMOTE, and the classification results are more accurate.
To verify the effectiveness of the dual-channel 1DCNN-AttBILSTM network for detecting abnormal customer electricity use, it was compared with the LSTM, 1DCNN, SVM, BiLSTM, AttBiLSTM, and t-LeNet models in terms of classification performance on the test set.
The data shows that, compared with the traditional SVM, the dual-channel 1DCNN-AttBILSTM model improves AUC by about 9.12 percentage points and Acc by about 14%, and improves F1, R, and Pre by about 0.14, 0.18, and 0.11, respectively. The overall improvement is large, which shows that traditional machine learning has shortcomings in customer anomaly detection, whereas deep learning detects anomalies better and can better identify the hidden features in the data.
Compared with the LSTM model, the BiLSTM model improves AUC by 0.03 and Acc by 5.52%, and F1, R, and Pre all improve by more than 5%. Bidirectional feature extraction therefore has a clear advantage on load data as long as 1035 days and can greatly improve the classification performance of abnormal electricity use detection.
When the AttBiLSTM model and the 1DCNN model are used on their own, Acc and AUC reach a reasonable level and exceed those of the traditional machine learning model and the LSTM model. Compared with the proposed model, however, their AUC is lower by 0.0108 and 0.0076 respectively, and comprehensive evaluation indexes such as F1, R and Pre also fall short to some extent. The proposed model combines the local spatial feature extraction of 1DCNN with the bidirectional time-series feature extraction of BiLSTM, drawing on the strengths of both to achieve better anomaly detection.
This section has examined the effectiveness of the dual-channel 1DCNN-AttBiLSTM network in detecting abnormal customer power consumption behavior. The overall procedure comprises load data processing, network training and result assessment. In model training, because positive and negative samples are severely imbalanced in customer behavior detection, Border-SMOTE is used to oversample the data, and network training, validation and testing are carried out on the balanced, reconstructed data set to complete the detection of abnormal customer power consumption behavior. The method is verified on a data set of actual electricity consumption from China's State Grid. First, data oversampled with Border-SMOTE are compared with the unprocessed original data and with data oversampled with SMOTE: after Border-SMOTE balancing, AUC improves by 0.2548 and the F1 value by 62.45% compared with the original data set. Then the model adopted in this section is compared with other, newer methods on classification results. The results show that the model outperforms the traditional SVM machine learning model: AUC improves by about 0.0912, Acc by about 14%, and F1, R and Pre by about 0.14, 0.18 and 0.11 respectively, a large overall gain. Compared with the single 1DCNN and AttBiLSTM deep learning models, AUC improves by 1.08% and 0.76% and F1 by 1.43% and 1.61% respectively. This further demonstrates that the approach is effective for detecting abnormal customer power usage.
The model provided by the scheme is also of practical value for investigating electricity theft carried out by high-technology means, and helps reduce and recover the economic losses caused by abnormal electricity use.
Customer classification and anomaly detection are important in the electric power field and provide good guidance for electricity pricing, detection of abnormal customer behavior, power forecasting and the like. Current power customer load curves are large in volume and highly variable, and traditional machine learning methods adapt poorly to large-scale data. In view of these problems, deep-learning-based power load classification is studied here: unlabeled and labeled load data are identified by clustering and by classification respectively, and the classification method is then applied to anomaly detection with good results.
The scheme comprises the following:
1) A daily load curve cluster analysis method based on deep convolutional embedded clustering is provided. Traditional clustering methods struggle with high-dimensional multivariate data, have difficulty extracting deep features, and separate feature extraction from clustering. To address this, the DECE-1D method is proposed for clustering daily load curves: its feature extraction part introduces 1D-CAE to extract deep time-series features and reduce the dimensionality of high-dimensional data, and an added clustering layer then combines feature extraction with feature processing so that the final result is obtained by joint optimization. Experiments on Portuguese load data show that, compared with the traditional dimension-reduction-then-clustering approach, the DBI index of the method falls by 1.42, a marked reduction. Compared with the newer DEC-1D-CAE and IDEC, CHI improves by 12488.47 and 19384.92 and the SC index by about 0.05 and 0.10 respectively, verifying the effectiveness and practicality of the method.
2) A daily load curve classification method based on the dual-channel 1DCNN-AttBiLSTM network is provided. The dual-channel load classification model combines the ability of 1DCNN to extract local features of the load with the ability of BiLSTM to extract time-series features of the load curve. An attention layer is added after the BiLSTM so that the network assigns larger weights to information that is more useful for the classification result and smaller weights to information that contributes less. Simulation and comparative analysis on the Irish data set show that, compared with the single 1DCNN and AttBiLSTM networks, test-set Acc improves by 2.04% and 2.15% and the F1 value by 2.04% and 2.16% respectively, so the classification of daily load curves improves considerably. After the attention layer is added, test-set Acc is 0.89% higher and F1 0.91% higher than for the network without it, and Pre and R also improve to varying degrees. This verifies the validity of the network for daily load curve classification.
3) Anomaly detection of customer electricity-use behavior is performed with the dual-channel 1DCNN-AttBiLSTM network. First, to address the data imbalance in anomaly detection, Border-SMOTE oversampling is applied to the sample data, increasing the number of samples and balancing the proportion of positive and negative samples. The dual-channel 1DCNN-AttBiLSTM network is then trained on the processed real abnormal-electricity-use data from China's State Grid and predicts whether a customer uses electricity abnormally. Experimental results show that, compared with classification on the original data set before balancing, AUC improves by 25.48% and the F1 value by 62.45%, so balancing the data greatly improves classification accuracy. Other machine learning and deep learning methods are also used to classify the balanced data; the model of this section detects anomalies markedly better than the traditional SVM machine learning model, with AUC improved by about 0.0912, Acc by about 14%, and F1, R and Pre by about 0.14, 0.18 and 0.11 respectively. Compared with the single 1DCNN and AttBiLSTM network models, AUC improves by 1.08% and 0.76% and F1 by 1.43% and 1.61% respectively. This verifies that the network can effectively complete the anomaly detection task.
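The weighting behavior of the attention layer described in item 2) can be sketched as follows. The text does not specify the scoring function, so this sketch assumes a simple dot-product score against a learned query vector followed by a softmax; `attention_pool`, `hidden_states` and `query` are hypothetical names used only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight the per-time-step outputs of a recurrent layer and sum them.

    hidden_states: list of T vectors (e.g. BiLSTM outputs, one per time step).
    query: vector of the same width; stands in for a learned parameter here.
    Returns (weights, context), where context is the weighted sum.
    """
    # Assumed scoring rule: dot product between each hidden state and the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    width = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(width)
    ]
    return weights, context
```

Time steps whose hidden state scores higher receive a larger share of the weight, so they dominate the context vector passed to the classifier, which matches the "larger weight for more useful information" behavior described above.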
Those of ordinary skill in the art will appreciate that the elements and method steps of each example described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the elements and steps of each example have been described generally in terms of functionality in the foregoing description to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and systems may be implemented in other ways. For example, the above-described division of units is merely a logical function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. The units may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A resident energy abnormality identification method, characterized by comprising the following steps:
S1, analyzing abnormal energy consumption of residents according to historical abnormal energy consumption data;
S2, identifying abnormal energy data using a resident energy abnormality algorithm;
S3, detecting abnormal energy consumption of residents.
2. The resident energy abnormality identification method according to claim 1, characterized in that in step S1, the specific method is as follows:
A1, extracting features from abnormal energy data to form an abnormal energy feature library;
A2, establishing and optimizing a standard feature library;
A3, identifying abnormal energy-use curves;
A4, statistically analyzing abnormal energy-use events.
3. The resident energy abnormality identification method according to claim 2, characterized in that in step A1, the specific method is as follows:
extracting features from the abnormal energy data with an artificial intelligence algorithm, establishing a corresponding model, refining the abnormal feature model with test sample data, and accumulating the results to form a resident abnormal energy feature library.
4. The resident energy abnormality identification method according to claim 2, characterized in that in step A2, the specific method is as follows: analyzing the standard features of data curve changes according to historical normal data, and establishing and optimizing a standard feature library.
5. The resident energy abnormality identification method according to claim 2, characterized in that in step A3, the specific method is as follows: according to the rules of the standard feature library, periodically identifying non-standard curves in the historical data, judging them to be abnormal, and extracting the abnormal curves as resident abnormal energy data.
6. The resident energy abnormality identification method according to claim 1, characterized in that in step A4, the specific method is as follows: identifying the causes of abnormality according to the abnormal energy data, integrating abnormal energy event information, and statistically analyzing the abnormal energy events to produce, under each classification, a display of abnormal energy conditions, a ranking of abnormal energy customers, a ranking by abnormal energy industry classification, an abnormality similarity analysis and a self-learning trend analysis.
7. The resident energy abnormality identification method according to claim 1, characterized in that in step S2, the specific method is as follows:
B1, clustering the historical daily load data of all users with the K-Means algorithm, determining the users' electricity consumption behaviors, and assigning a cluster label to each electricity consumption behavior;
B2, taking the users' historical daily load data as input and the electricity consumption behavior label as output, and establishing a daily load classification model.
8. The resident energy abnormality identification method according to claim 7, characterized in that step S2 specifically comprises:
in the daily operation stage, classifying the daily loads of all users, and comparing the energy-use behaviors of users of the same type according to the horizontal scoring standard for resident users to give horizontal scores;
the number of clusters K in K-Means is set manually, and its value is assessed using the Elbow Method, the Silhouette Coefficient and the Calinski-Harabasz Index;
the Elbow Method measures clustering performance by the degree of distortion;
the Silhouette Coefficient serves as an evaluation index of how compact and how well separated the clusters are; its value lies in [-1, 1], and the closer it is to 1, the more reasonable the K value; the Calinski-Harabasz Index is defined as the ratio of between-group dispersion to within-group dispersion, and a larger score indicates a better clustering effect.
9. The resident energy abnormality identification method according to claim 8, characterized in that: in step S2, it includes:
Definition 1: given a data sequence $A_1 = \langle a_{11}, a_{12}, \ldots, a_{1L} \rangle$, if $a_{1i}$ ($i = 1, 2, \ldots, L$) is the electric energy consumed by the user in the period from the $(i-1)$-th moment to the $i$-th moment, as observed at the $i$-th moment within a time window, then $A_1 = \langle a_{11}, a_{12}, \ldots, a_{1L} \rangle$ is a user electricity consumption data sequence, and $L$ is the length of the time window of the sequence;
Definition 2: let $A_1^{*} = \langle a_{11}^{*}, a_{12}^{*}, \ldots, a_{1L}^{*} \rangle$ be a given user electricity consumption data sequence and let $r_1 > 0$; if there exists a user electricity consumption data sequence $A_1 = \langle a_{11}, a_{12}, \ldots, a_{1L} \rangle$ satisfying $\| A_1^{*} - A_1 \| < r_1$, then $A_1^{*}$ is called a user electricity consumption data sequence pattern, and $r_1$ is the pattern radius of that pattern.
10. The resident energy abnormality identification method according to claim 1, characterized in that in step S3, the specific method is as follows:
C1, establishing an electricity consumption behavior abnormality detection model;
C2, refining and supplementing the electricity consumption behavior abnormality detection model;
C3, automatically detecting abnormal electricity consumption behaviors;
C4, giving an overview of the data set and performing balancing preprocessing on it;
C5, processing the unbalanced data based on the Border-SMOTE algorithm;
C6, detecting, testing and analyzing abnormal electricity consumption behaviors of residents.
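As an illustration of the cluster-number assessment recited in claim 8, the following sketch computes the Silhouette Coefficient and the Calinski-Harabasz Index for candidate label assignments. It assumes the standard textbook definitions and takes the labels as given (in practice they would come from K-Means runs with different K); the function names and the `labelings` mapping are hypothetical.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette_coefficient(points, labels):
    """Mean silhouette score in [-1, 1]; needs at least two clusters."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute a score of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def calinski_harabasz_index(points, labels):
    """Ratio of between-group to within-group dispersion (larger is better)."""
    clusters = group_by_label(points, labels)
    n, k, width = len(points), len(clusters), len(points[0])
    centre = [sum(p[d] for p in points) / n for d in range(width)]
    between = within = 0.0
    for members in clusters.values():
        c = [sum(p[d] for p in members) / len(members) for d in range(width)]
        between += len(members) * math.dist(c, centre) ** 2
        within += sum(math.dist(p, c) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


def best_k(points, labelings):
    """labelings maps each candidate K to the labels obtained for that K."""
    return max(labelings, key=lambda k: silhouette_coefficient(points, labelings[k]))
```

The two indexes usually agree on well-separated data; when they disagree, the Elbow Method mentioned in the same claim gives a third view based on distortion.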
CN202310727271.6A 2023-06-19 2023-06-19 Resident energy abnormality identification method Pending CN116796271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310727271.6A CN116796271A (en) 2023-06-19 2023-06-19 Resident energy abnormality identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310727271.6A CN116796271A (en) 2023-06-19 2023-06-19 Resident energy abnormality identification method

Publications (1)

Publication Number Publication Date
CN116796271A true CN116796271A (en) 2023-09-22

Family

ID=88033937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310727271.6A Pending CN116796271A (en) 2023-06-19 2023-06-19 Resident energy abnormality identification method

Country Status (1)

Country Link
CN (1) CN116796271A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033705A (en) * 2023-10-10 2023-11-10 北京鼎诚鸿安科技发展有限公司 Data value-added service method for client side energy interconnection
CN117033705B (en) * 2023-10-10 2024-01-19 北京鼎诚鸿安科技发展有限公司 Data value-added service method for client side energy interconnection
CN117114252A (en) * 2023-10-24 2023-11-24 陕西禄远电子科技有限公司 Comprehensive energy intelligent management method based on Internet of things

Similar Documents

Publication Publication Date Title
CN110213222B (en) Network intrusion detection method based on machine learning
CN109146705B (en) Method for detecting electricity stealing by using electricity characteristic index dimension reduction and extreme learning machine algorithm
CN106780121B (en) Power consumption abnormity identification method based on power consumption load mode analysis
CN116796271A (en) Resident energy abnormality identification method
CN110930198A (en) Electric energy substitution potential prediction method and system based on random forest, storage medium and computer equipment
CN112732748B (en) Non-invasive household appliance load identification method based on self-adaptive feature selection
CN111401785A (en) Power system equipment fault early warning method based on fuzzy association rule
CN111553444A (en) Load identification method based on non-invasive load terminal data
CN114676742A (en) Power grid abnormal electricity utilization detection method based on attention mechanism and residual error network
CN110569876A (en) Non-invasive load identification method and device and computing equipment
CN112633337A (en) Unbalanced data processing method based on clustering and boundary points
CN110738232A (en) grid voltage out-of-limit cause diagnosis method based on data mining technology
CN110059845A (en) Metering device clocking error trend forecasting method based on timing evolved genes model
CN116595426B (en) Industrial Internet of things data intelligent acquisition management system
CN113125903A (en) Line loss anomaly detection method, device, equipment and computer-readable storage medium
CN112305441A (en) Power battery health state assessment method under integrated clustering
Qin et al. Bearing fault diagnosis method based on ensemble composite multi-scale dispersion entropy and density peaks clustering
CN116681186B (en) Power quality analysis method and device based on intelligent terminal
Jianyuan et al. Anomaly electricity detection method based on entropy weight method and isolated forest algorithm
CN116561569A (en) Industrial power load identification method based on EO feature selection and AdaBoost algorithm
Yang et al. Non-Intrusive Load Classification and Recognition Using Soft-Voting Ensemble Learning Algorithm With Decision Tree, K-Nearest Neighbor Algorithm and Multilayer Perceptron
CN115146735A (en) User power utilization anomaly identification
CN110874584B (en) Blade fault diagnosis method based on improved prototype clustering
CN114066219A (en) Electricity stealing analysis method for intelligently identifying electricity utilization abnormal points under incidence matrix
CN113792141A (en) Feature selection method based on covariance measurement factor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination