CN113327008A

CN113327008A - Electricity stealing detection method, system and medium based on time sequence automatic encoder

Info

Publication number: CN113327008A
Application number: CN202110435767.7A
Authority: CN
Inventors: 邓浩; 梁秋实; 赵生捷
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2021-08-31

Abstract

The invention relates to an electricity stealing detection method, system and medium based on a time sequence automatic encoder. The detection method comprises a model training stage and a model application stage. The training stage comprises: establishing Gaussian mixture models and obtaining an optimal Gaussian mixture model from an original data set by using the EM algorithm and the BIC; dividing the original data set into a plurality of training sets based on the clustering result of the optimal Gaussian mixture model on the original data set; and constructing and training a plurality of automatic encoders. The application stage comprises: clustering an input data set according to the optimal Gaussian mixture model, and performing anomaly detection by using the automatic encoder corresponding to each clustering result. Compared with the prior art, the method clusters with a Gaussian mixture model so that electricity consumption data reflecting different consumption habits are distinguished, and performs anomaly detection with the automatic encoder corresponding to each clustering result, thereby realizing electricity stealing detection on unlabeled electricity consumption data with a wide application range and high detection performance.

Description

Electricity stealing detection method, system and medium based on time sequence automatic encoder

Technical Field

The invention relates to the field of machine learning, in particular to a method, a system and a medium for detecting electricity stealing based on a time sequence automatic encoder.

Background

Physical means of electricity stealing, such as tampering with the structure of an electric meter or rewiring, are easy to detect, and a power company can curb them through on-site inspection. With the continuing development of information technology, power systems are becoming increasingly intelligent, and users' electricity consumption data are collected and managed remotely through smart meters. However, the spread of smart meters has brought a new means of electricity stealing: by tampering with the storage link or the communication link of a smart meter, the data are altered directly without changing the physical parameters of the actual circuit, so that the electricity bill is reduced. Such high-tech tampering with the data storage unit and the communication unit of a smart meter cannot be found by physical inspection. It is therefore necessary to mine the collected electricity consumption data in depth and to detect abnormal data by means such as machine learning, so as to reduce electricity stealing. Researchers at home and abroad have studied data-based electricity stealing detection; detection methods based on data-driven models include classification-based, clustering-based and regression-based methods, among others.

The automatic encoder is a special type of neural network that compresses input data and then reconstructs it so that the output is as similar to the input as possible. In anomaly detection, a sample whose output differs greatly from its input is regarded as an abnormal sample. Anomaly detection based on automatic encoders has achieved good results in many fields. In particular, once a long short-term memory (LSTM) network is used in the automatic encoder, the encoder can encode time-domain information, adapt to longer time sequences, and perform a wider range of anomaly detection; an example is the abnormal behavior detection method based on a spatio-temporal automatic encoder disclosed in the Chinese patent with publication number CN 109615019A.

However, anomaly detection with an automatic encoder relies on the assumption that all training data are normal: the automatic encoder must first be trained on positive samples, and only then can it detect abnormal negative samples. In practice, the electricity consumption data collected by a power system are unlabeled, and there is no guarantee that they are all normal. Labels must therefore be assigned to the collected data in a preliminary step, which involves a very large workload and limits the application of automatic encoders to electricity stealing detection.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a power stealing detection method, a system and a medium based on a time sequence automatic encoder.

The purpose of the invention can be realized by the following technical scheme:

A power stealing detection method based on a time sequence automatic encoder comprises a model training stage and a model application stage, wherein the model training stage comprises the following steps:

S1: acquiring multi-day electricity consumption data of a plurality of users as an original data set;

S2: establishing a plurality of Gaussian mixture models, determining the parameters of each Gaussian mixture model from the original data set by using the expectation-maximization (EM) algorithm, and obtaining an optimal Gaussian mixture model by using the Bayesian information criterion (BIC), wherein the optimal Gaussian mixture model comprises K Gaussian distributions;

S3: dividing the original data set into K training sets based on the clustering result of the optimal Gaussian mixture model on the original data set;

S4: constructing K automatic encoders based on the long short-term memory (LSTM) artificial neural network, and training each with its training set to obtain the automatic encoder corresponding to each Gaussian distribution in the optimal Gaussian mixture model;

the model application stage is as follows:

acquiring electricity consumption data of users in the original data set as an input data set, clustering the input data set according to the optimal Gaussian mixture model, and performing anomaly detection on the electricity consumption data in each clustering result by using the automatic encoder corresponding to that clustering result, to obtain an electricity stealing detection result.
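The routing part of the application stage can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes one-dimensional data for brevity, a fitted mixture supplied as the hypothetical lists `weights`, `means` and `variances`, and illustrative function names. Each group it returns would then be passed to the automatic encoder trained for that Gaussian distribution.

```python
import math

def component_log_density(x, weight, mean, variance):
    """Log of weight * N(x | mean, variance) for a 1-D Gaussian component."""
    return (math.log(weight)
            - 0.5 * math.log(2.0 * math.pi * variance)
            - (x - mean) ** 2 / (2.0 * variance))

def assign_cluster(x, weights, means, variances):
    """Index of the Gaussian component with the highest posterior for x."""
    scores = [component_log_density(x, w, m, v)
              for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)

def route_to_clusters(samples, weights, means, variances):
    """Group sample indices by assigned component (one group per encoder)."""
    groups = {k: [] for k in range(len(weights))}
    for idx, x in enumerate(samples):
        groups[assign_cluster(x, weights, means, variances)].append(idx)
    return groups

# Hypothetical two-component mixture and four samples.
groups = route_to_clusters([0.2, 9.7, 0.9, 10.4],
                           weights=[0.5, 0.5], means=[0.0, 10.0],
                           variances=[1.0, 1.0])
print(groups)
```

Working in log space avoids underflow when a sample lies far from every component, which matters more for the high-dimensional daily profiles described in the embodiment than for this 1-D toy case.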

Further, step S2 is specifically:

S21: sequentially establishing Gaussian mixture models whose numbers of Gaussian distributions are 1, 2, …, L, wherein L is the maximum number of Gaussian distributions in a Gaussian mixture model;

S22: for each Gaussian mixture model, determining the parameters of the Gaussian mixture model by using the expectation-maximization (EM) algorithm;

S23: calculating the BIC value of each Gaussian mixture model by using the Bayesian information criterion (BIC), and selecting the Gaussian mixture model with the minimum BIC value as the optimal Gaussian mixture model, wherein the optimal Gaussian mixture model comprises K Gaussian distributions.
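The selection in S23 can be illustrated with a short sketch. It assumes the standard definition BIC = -2 ln(L̂) + p ln(n) and a diagonal-covariance parameter count; neither detail is stated in the source, and the log-likelihood values below are hypothetical placeholders for the results of S22.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def gmm_param_count(k, dim):
    """Free parameters of a k-component GMM with diagonal covariances.

    (k - 1) mixing weights + k * dim means + k * dim variances.
    The diagonal-covariance choice is an assumption of this sketch.
    """
    return (k - 1) + 2 * k * dim

def select_best_k(log_likelihoods, dim, n_samples):
    """log_likelihoods[i] belongs to the model with i + 1 components."""
    scores = [bic(ll, gmm_param_count(k, dim), n_samples)
              for k, ll in enumerate(log_likelihoods, start=1)]
    best_k = min(range(1, len(scores) + 1), key=lambda k: scores[k - 1])
    return best_k, scores

# Hypothetical converged log-likelihoods for L = 4 candidate models.
best_k, scores = select_best_k([-520.0, -410.0, -405.0, -403.0],
                               dim=1, n_samples=200)
print(best_k)
```

In this toy input the log-likelihood keeps rising with more components, but the gain after two components is smaller than the penalty term, so the BIC stops the model from growing.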

Further, the training of the ith (1 ≤ i ≤ K) automatic encoder in step S4 specifically includes:

A1: acquiring the training data in the ith training set, and training the ith automatic encoder with the training data;

A2: sequentially inputting the training data in the training set into the ith automatic encoder to obtain an error vector for each training data, together with a mean and a variance, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;

A3: removing M training data from the ith training set according to the anomaly scores, training the ith automatic encoder again with the remaining training data to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N is the number of training data in the ith training set, and P is a preset anomaly proportion.
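The data-cleaning part of step A3 can be sketched as below, leaving out the encoder training itself. The function name is illustrative, and rounding M down to an integer is an assumption, since the source only states M = N × P. The function also returns the smallest score among the removed data, which the application stage uses as its score threshold.

```python
import math

def split_by_anomaly_score(scores, anomaly_ratio):
    """Return (kept indices, removed indices, score threshold).

    The M = N x P samples with the highest scores are removed; the
    threshold is the smallest score among the removed samples.
    Rounding M down is an assumption of this sketch.
    """
    n = len(scores)
    m = math.floor(n * anomaly_ratio + 1e-9)  # small guard against float error
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    removed = ranked[:m]
    kept = sorted(ranked[m:])
    threshold = min(scores[i] for i in removed) if removed else math.inf
    return kept, removed, threshold

# Hypothetical anomaly scores for N = 10 training data, P = 20%.
scores = [0.3, 4.1, 0.8, 0.2, 9.5, 0.6, 1.1, 0.4, 0.7, 0.5]
kept, removed, threshold = split_by_anomaly_score(scores, anomaly_ratio=0.2)
print(kept, removed, threshold)
```

The automatic encoder would then be retrained on the data at the `kept` indices only.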

Further, the error vector $e^{(j)}$ of the jth training data is the reconstruction error between $x^{(j)}$, the jth training data in the training set, and $\hat{x}^{(j)}$, the output obtained after $x^{(j)}$ is input into the automatic encoder.

Further, the anomaly score is calculated by the formula:

$$a^{(j)} = (e^{(j)} - \mu)^{T} \sigma^{-2} (e^{(j)} - \mu)$$

wherein $a^{(j)}$ is the anomaly score of the jth training data $x^{(j)}$, and $\mu$ and $\sigma^{2}$ are the mean and variance of the Gaussian distribution corresponding to the ith training set.
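The score computation can be sketched as follows. Two points are assumptions of this sketch rather than statements of the source: the error vector is taken as the element-wise absolute difference between input and reconstruction, and σ² is treated as a per-dimension (diagonal) variance estimated from the error vectors. The reconstructions here are dummy values standing in for the output of a trained automatic encoder.

```python
def error_vectors(inputs, reconstructions):
    """Element-wise absolute reconstruction error (assumed form)."""
    return [[abs(x - y) for x, y in zip(xs, ys)]
            for xs, ys in zip(inputs, reconstructions)]

def fit_error_statistics(errors):
    """Per-dimension mean and population variance of the error vectors."""
    n, dim = len(errors), len(errors[0])
    mu = [sum(e[d] for e in errors) / n for d in range(dim)]
    var = [sum((e[d] - mu[d]) ** 2 for e in errors) / n for d in range(dim)]
    return mu, var

def anomaly_score(error, mu, var, eps=1e-12):
    """a = (e - mu)^T sigma^-2 (e - mu) with a diagonal sigma^2."""
    return sum((e - m) ** 2 / (v + eps) for e, m, v in zip(error, mu, var))

# Dummy inputs and reconstructions (placeholders for encoder outputs).
inputs = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
reconstructions = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
errors = error_vectors(inputs, reconstructions)
mu, var = fit_error_statistics(errors)
scores = [anomaly_score(e, mu, var) for e in errors]
print(scores)
```

The small `eps` term only guards against division by zero when a dimension has zero error variance.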

Furthermore, in the model application stage, anomaly detection on the electricity consumption data belonging to the same clustering result is specifically performed as follows:

acquiring the automatic encoder corresponding to the clustering result, inputting the electricity consumption data into the automatic encoder, calculating the anomaly score of the electricity consumption data, and taking the electricity consumption data whose anomaly score is greater than a score threshold as abnormal data, wherein the score threshold is the smallest anomaly score among the M training data removed during the training of that automatic encoder.

Further, the input data set comprises multi-day power consumption data of a plurality of users, and the daily sampling time and the sampling frequency of the original data set and the input data set are the same.

A time sequential autoencoder based power theft detection system, comprising:

the acquisition module is used for acquiring an original data set and an input data set;

the Gaussian module is used for obtaining, based on the original data set, an optimal Gaussian mixture model with K Gaussian distributions by using the expectation-maximization (EM) algorithm and the Bayesian information criterion (BIC);

the training module is used for constructing automatic encoders based on the long short-term memory (LSTM) artificial neural network and training them with the training sets to obtain the model parameters of the automatic encoders;

and the application module is used for performing anomaly detection on the input data set based on the optimal Gaussian mixture model and the automatic encoders to obtain an electricity stealing detection result.

Further, the training of the ith (i is greater than or equal to 1 and less than or equal to K) automatic encoder in the training module specifically includes:

A computer storage medium having an executable computer program stored therein, wherein the computer program when executed implements a power theft detection method.

Compared with the prior art, the invention has the following beneficial effects:

(1) A Gaussian mixture model is used to cluster the data so that electricity consumption data reflecting different consumption habits are distinguished, and the automatic encoder corresponding to each clustering result performs the anomaly detection. Electricity stealing can thus be detected in unlabeled electricity consumption data, with a wide application range and high detection performance, which provides a reference for anomaly detection on unlabeled electricity consumption data.

(2) The LSTM-based automatic encoder model takes into account the temporal ordering and imbalance of electricity consumption data and the high dimensionality caused by the high sampling rate, so higher-quality results can be obtained.

(3) When the automatic encoder model is trained, the training set is cleaned directly from the unlabeled data through self-supervised learning according to the preset anomaly proportion, so that the data are effectively labeled, the learning effect of the automatic encoder is improved, and the applicability is high.

(4) Because the Gaussian mixture model clusters the input electricity consumption data, data reflecting different consumption habits can be distinguished and a separate automatic encoder is constructed and trained for each, which enhances the robustness and learning performance of the automatic encoders.

Drawings

FIG. 1 is a flow chart of the electricity stealing detection method based on a time sequence automatic encoder;

FIG. 2 is a comparison of the measured detection performance of different electricity stealing detection methods.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.

Example 1:

A power stealing detection method based on a time sequence automatic encoder, as shown in FIG. 1, comprises a model training stage and a model application stage, wherein the model training stage comprises the following steps:

In this embodiment, step S2 specifically includes:

S22: for each Gaussian mixture model, the electricity consumption data in the original data set are input into the Gaussian mixture model to obtain a probability set for each electricity consumption data, the probability set comprising the probabilities that the data belong to each Gaussian distribution in the model. Based on the electricity consumption data and their probability sets, a log-likelihood estimate is obtained with the expectation-maximization (EM) algorithm, and the parameters of the Gaussian mixture model are adjusted according to the log-likelihood function. The probability sets are then re-determined and the log-likelihood estimate re-computed, and these steps are repeated until convergence, which yields a Gaussian mixture model with determined parameters. This procedure is repeated until all L Gaussian mixture models are determined.
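The EM loop described in S22 can be sketched for the one-dimensional case as follows. This is an illustrative simplification: the embodiment works on multi-point daily consumption data, and the deterministic initialisation, the variance floor and the function names are assumptions of this sketch. The E-step computes the probability set of each sample, and the M-step adjusts the mixture parameters.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, k, max_iter=200, tol=1e-8, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log-likelihood history).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation: means from k equal-sized sorted chunks.
    means = []
    for i in range(k):
        chunk = ordered[i * n // k:(i + 1) * n // k]
        means.append(sum(chunk) / len(chunk))
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    history = []
    for _ in range(max_iter):
        # E-step: probability set of each sample over the k components.
        resp, log_lik = [], 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(joint), 1e-300)  # guard against log(0)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(log_lik)
        # Stop once the log-likelihood no longer changes.
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
        # M-step: re-estimate weights, means and variances.
        for c in range(k):
            r_sum = max(sum(r[c] for r in resp), 1e-300)
            weights[c] = r_sum / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / r_sum
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / r_sum,
                var_floor)
    return weights, means, variances, history

# Two well-separated synthetic clusters centred on 0 and 10.
cluster_a = [i * 0.1 for i in range(-10, 11)]
cluster_b = [10.0 + a for a in cluster_a]
weights, means, variances, history = fit_gmm_1d(cluster_a + cluster_b, k=2)
print(means)
```

Calling `fit_gmm_1d` for k = 1, …, L and keeping the final entry of each `history` gives the log-likelihoods needed for the BIC comparison in S23.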

The training of the ith (1 ≤ i ≤ K) automatic encoder specifically comprises the following steps:

A3: removing M training data from the ith training set according to the anomaly scores, training the ith automatic encoder again with the remaining training data to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N is the number of training data in the ith training set, and P is a preset anomaly proportion, for example 5% or 10%.

The error vector is:

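The calculation formula of the anomaly score is as follows:

$$a^{(j)} = (e^{(j)} - \mu)^{T} \sigma^{-2} (e^{(j)} - \mu)$$

wherein $\mu$ and $\sigma^{2}$ are the mean and variance of the Gaussian distribution corresponding to the ith training set.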

The model application phase is as follows:

The input data set comprises multi-day electricity consumption data of a plurality of users, and the daily sampling time and the sampling frequency of the original data set and the input data set are the same.

A time sequential autoencoder based power theft detection system, comprising:

The training module for training the ith (i is more than or equal to 1 and less than or equal to K) automatic encoder specifically comprises the following steps:

A computer storage medium having stored therein an executable computer program that, when executed, implements a power theft detection method.

In this embodiment, the original data set is the electricity consumption data of 4225 residential users of a power company in Ireland over 535 consecutive days. The data are sampled every half hour, so each user yields 48 sampling points per day.
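One way to arrange such data for the later steps is to cut each user's series into one 48-point vector per day. The sketch below assumes that each training sample is one user-day, which is suggested by the detection result being reported per user and date but is not stated explicitly; the function name and the sample values are hypothetical.

```python
def to_daily_profiles(readings, points_per_day=48):
    """Split one user's half-hourly readings into one vector per day."""
    if len(readings) % points_per_day != 0:
        raise ValueError("series length must be a multiple of points_per_day")
    return [readings[start:start + points_per_day]
            for start in range(0, len(readings), points_per_day)]

# Two days of hypothetical half-hourly readings (kWh) for one user.
series = [0.1 * (i % 48) for i in range(96)]
profiles = to_daily_profiles(series)
print(len(profiles), len(profiles[0]))
```

With the embodiment's figures, one user contributes 535 × 48 = 25,680 readings, that is, 535 daily profiles.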

A plurality of Gaussian mixture models are constructed and fitted to the original data set with the EM algorithm, and the optimal Gaussian mixture model is selected with the Bayesian information criterion (BIC). The optimal Gaussian mixture model clusters the data in the original data set according to user habits to obtain K clustering results, and the original data set is divided into K training sets accordingly.

An automatic encoder is constructed for each Gaussian distribution of the optimal Gaussian mixture model and trained on the corresponding training set. Because electricity consumption data are time series, a long short-term memory (LSTM) network is used in the automatic encoder, which widens the detection range.

Because the labels of the training data are unknown, the training set is assumed to contain abnormal data. After the automatic encoder has been trained on the training set, the training data are input into it in turn to obtain the outputs, and the anomaly score of each training data is calculated. In this embodiment, the anomaly proportion (the proportion of electricity stealing behavior, set from experience) is 10%: the training data are sorted by anomaly score from high to low, the first 10% are regarded as abnormal data corresponding to electricity stealing, these data are removed from the training set, and the automatic encoder is trained again to obtain the final automatic encoder. In this way the training set is cleaned directly from the unlabeled data through self-supervised learning, so that the data are effectively labeled, the learning effect of the automatic encoder is improved, and the applicability is high.

After the automatic encoders corresponding to the optimal Gaussian mixture model and the Gaussian distributions of the optimal Gaussian mixture model are obtained based on the original data set, the method can be applied to actual electricity stealing detection.

Electricity consumption data of the same 4225 residential users of the power company in Ireland over another period are then obtained as the input data set. For example, the original data set is the users' electricity consumption data for the 535 consecutive days before January 1, 2020, and the input data set is their electricity consumption data for the 30 consecutive days after January 1, 2020.

The input data set is clustered with the optimal Gaussian mixture model to obtain K clustering results. The electricity consumption data in each clustering result are input into the corresponding automatic encoder, the anomaly score of each electricity consumption data is calculated, and the data whose anomaly score is greater than the score threshold are taken as abnormal data, wherein the score threshold is the smallest anomaly score among the M training data removed in step A3. The user and date corresponding to each abnormal data are output, and the user is considered to have stolen electricity on that date.
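The final thresholding and reporting step can be sketched as below. The record layout, user identifiers, scores and thresholds are all hypothetical; the sketch only shows that each data item is compared against the threshold of its own cluster and that a strict "greater than" comparison is used, as in the description.

```python
def flag_theft(records, thresholds):
    """Return (user, date) pairs whose score exceeds the cluster threshold.

    records: iterable of (user, date, cluster index, anomaly score).
    thresholds: one score threshold per cluster, from training step A3.
    """
    return [(user, date)
            for user, date, cluster, score in records
            if score > thresholds[cluster]]

# Hypothetical scored records and per-cluster thresholds.
records = [
    ("user_001", "2020-01-01", 0, 0.8),
    ("user_001", "2020-01-02", 0, 5.3),
    ("user_002", "2020-01-01", 1, 2.9),
    ("user_002", "2020-01-02", 1, 3.0),
]
flagged = flag_theft(records, thresholds=[4.1, 3.0])
print(flagged)
```

A score exactly equal to the threshold is not flagged here, because the description requires the score to be greater than the threshold.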

In other embodiments, the number of users and the collection dates of the electricity consumption data may be changed. To ensure detection accuracy, however, the users should be users in the original data set. Clustering the input data set with the optimal Gaussian mixture model then yields at least one clustering result, and the automatic encoder corresponding to each clustering result is used to perform anomaly detection on the electricity consumption data in that clustering result.

The electricity stealing detection algorithm is evaluated with the true positive rate (TPR) and the false positive rate (FPR); the best detection algorithm has a TPR as high as possible and an FPR as low as possible. As shown in FIG. 2, the method is compared with Isolation Forest (iForest), Robust Covariance estimation, Local Outlier Factor (LOF), One-class SVM, the Deep Autoencoding Gaussian Mixture Model (DAGMM), and DAGMM with an LSTM layer. The gap between the TPR and the FPR of the method provided by the invention is pronounced, its TPR is higher than that of the other methods, and its area under the receiver operating characteristic (ROC) curve (AUC) is also clearly larger than that of the other methods, whereas both One-class SVM and DAGMM adapt poorly to the data. The method provided by the invention is therefore favorable in terms of detection accuracy.
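For reference, the two evaluation criteria can be computed as in the sketch below, where 1 marks electricity stealing and 0 marks normal consumption. The labels and predictions are made-up values for illustration and are unrelated to the results in FIG. 2.

```python
def tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate; 1 marks electricity theft."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Made-up ground truth and detector output for eight user-days.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```

Sweeping the score threshold and plotting the resulting (FPR, TPR) pairs traces the ROC curve whose area is the AUC mentioned above.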

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. The power stealing detection method based on the time sequence automatic encoder is characterized by comprising a model training stage and a model application stage, wherein the model training stage comprises the following steps:

the model application phase is as follows:

2. The power stealing detection method based on the time sequence automatic encoder as claimed in claim 1, wherein the step S2 is specifically:

3. The method as claimed in claim 1, wherein the training of the ith (1 ≤ i ≤ K) automatic encoder in step S4 comprises:

4. The method of claim 3, wherein the error vector is:

wherein $\hat{x}^{(j)}$ is the output obtained after $x^{(j)}$ is input into the automatic encoder.

5. The method of claim 4, wherein the anomaly score is calculated by the formula:

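$$a^{(j)} = (e^{(j)} - \mu)^{T} \sigma^{-2} (e^{(j)} - \mu)$$

wherein $\mu$ and $\sigma^{2}$ are the mean and variance of the Gaussian distribution corresponding to the ith training set.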

6. The power stealing detection method based on the time-series automatic encoder as claimed in claim 3, wherein in the model application stage, the abnormal detection of the power consumption data belonging to the same clustering result is specifically as follows:

7. The method of claim 1, wherein the input data set comprises multi-day power consumption data of a plurality of users, and the raw data set and the input data set have the same daily sampling time and sampling frequency.

8. A time-series automatic encoder-based electricity stealing detection system, which is based on the electricity stealing detection method according to any one of claims 1 to 7, comprising:

9. The system according to claim 8, wherein the training module for training the ith (1 ≤ i ≤ K) automatic encoder specifically comprises:

10. A computer storage medium having stored thereon an executable computer program, wherein the computer program, when executed, implements a power theft detection method as claimed in any one of claims 1 to 7.