CN113327008A - Electricity stealing detection method, system and medium based on time sequence automatic encoder - Google Patents

Publication number
CN113327008A
Authority
CN
China
Prior art keywords
training
data
gaussian mixture
automatic encoder
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110435767.7A
Other languages
Chinese (zh)
Inventor
邓浩
梁秋实
赵生捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202110435767.7A
Publication of CN113327008A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention relates to a method, a system and a medium for detecting electricity stealing based on a time sequence automatic encoder. The detection method comprises a model training stage and a model application stage. The training stage comprises: establishing Gaussian mixture models and obtaining an optimal Gaussian mixture model from an original data set by using the EM algorithm and the BIC; dividing the original data set into a plurality of training sets based on the clustering result of the optimal Gaussian mixture model on the original data set; and constructing and training a plurality of automatic encoders. The application stage comprises: clustering the input data set according to the optimal Gaussian mixture model, and performing anomaly detection by using the automatic encoder corresponding to each clustering result. Compared with the prior art, the method uses the Gaussian mixture model for clustering to distinguish electricity consumption data with different consumption habits, and uses the automatic encoder corresponding to each clustering result for anomaly detection, so that electricity stealing can be detected in unlabeled electricity consumption data; the application range is wide and the detection performance is high.

Description

Electricity stealing detection method, system and medium based on time sequence automatic encoder
Technical Field
The invention relates to the field of machine learning, in particular to a method, a system and a medium for detecting electricity stealing based on a time sequence automatic encoder.
Background
Physical means of stealing electricity, such as tampering with the structure of an electric meter or rewiring, are easy to detect, and a power company can reduce such behavior through on-site inspection. With the continued development of information technology, power systems are becoming increasingly intelligent, and users' electricity consumption data are collected and managed remotely through smart meters. However, the spread of smart meters has brought a new means of stealing electricity: by tampering with the storage link or the communication link of a smart meter, the data are altered directly without changing the physical parameters of the actual circuit, which reduces the electricity bill. Such high-tech tampering with the data storage unit and the communication unit of a smart meter cannot be found by physical inspection. It is therefore necessary to mine the collected electricity consumption data in depth and to detect abnormal data by means such as machine learning, so as to reduce electricity stealing. Scholars at home and abroad have studied data-based electricity stealing detection; detection methods based on data-driven models include classification-based, clustering-based and regression-based methods.
An automatic encoder (autoencoder) is a special type of neural network that compresses the input data and then reconstructs it so that the output is as similar as possible to the input. In anomaly detection, a sample whose output differs greatly from its input is regarded as an abnormal sample. Anomaly detection based on automatic encoders has achieved good results in many fields. In particular, when a long short-term memory (LSTM) network is used in the automatic encoder, the automatic encoder can encode time-domain information, adapt to longer time sequences and perform a wider range of anomaly detection; an example is the abnormal behavior detection method based on a spatio-temporal automatic encoder disclosed in the Chinese patent with publication number CN 109615019A.
However, anomaly detection with an automatic encoder relies on the assumption that the training data are all normal: the automatic encoder must be trained on normal (positive) samples before it can detect abnormal (negative) samples. In practice, the electricity consumption data collected by a power system are unlabeled and cannot be guaranteed to be normal. Labels would have to be assigned to the collected data in a preliminary step, which involves a very large workload and limits the application of automatic encoders to electricity stealing detection.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a power stealing detection method, a system and a medium based on a time sequence automatic encoder.
The purpose of the invention can be realized by the following technical scheme:
An electricity stealing detection method based on a time sequence automatic encoder comprises a model training stage and a model application stage, wherein the model training stage comprises the following steps:
S1: acquiring multi-day electricity consumption data of a plurality of users as an original data set;
S2: establishing a plurality of Gaussian mixture models, determining the parameters of each Gaussian mixture model from the original data set by using the expectation-maximization (EM) algorithm, and obtaining an optimal Gaussian mixture model by using the Bayesian information criterion (BIC), wherein the optimal Gaussian mixture model comprises K Gaussian distributions;
S3: dividing the original data set into K training sets based on the clustering result of the optimal Gaussian mixture model on the original data set;
S4: constructing K automatic encoders based on the long short-term memory (LSTM) neural network, and training each with its training set to obtain the automatic encoder corresponding to each Gaussian distribution in the optimal Gaussian mixture model;
The model application stage is as follows:
acquiring electricity consumption data of users in the original data set as an input data set, clustering the input data set according to the optimal Gaussian mixture model, and performing anomaly detection on the electricity consumption data in each clustering result by using the automatic encoder corresponding to that clustering result, to obtain an electricity stealing detection result.
Further, step S2 specifically comprises:
S21: sequentially establishing Gaussian mixture models with 1, 2, …, L Gaussian distributions, wherein L is the maximum number of Gaussian distributions in a Gaussian mixture model;
S22: for each Gaussian mixture model, determining the parameters of the Gaussian mixture model by using the expectation-maximization (EM) algorithm;
S23: calculating the BIC value of each Gaussian mixture model by using the Bayesian information criterion (BIC), and selecting the Gaussian mixture model with the minimum BIC value as the optimal Gaussian mixture model, wherein the optimal Gaussian mixture model comprises K Gaussian distributions.
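Steps S21 to S23 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each sample is reduced to a single scalar feature so that a one-dimensional mixture suffices, it uses a simple deterministic initialisation, and the function names (`fit_gmm_em`, `bic`, `select_gmm`) and the toy data are hypothetical.

```python
import math

def log_likelihood(xs, weights, means, variances):
    """Total log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # floor avoids log(0)
    return total

def fit_gmm_em(xs, k, max_iter=200, tol=1e-9):
    """S22: estimate the parameters of a k-component mixture with EM."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start (an assumption): means spread over the sorted data.
    means = [ordered[int((c + 0.5) * n / k)] for c in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    previous = -math.inf
    for _ in range(max_iter):
        # E-step: probability that each sample belongs to each component.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            norm = max(sum(dens), 1e-300)
            resp.append([d / norm for d in dens])
        # M-step: re-estimate weights, means and variances.
        for c in range(k):
            mass = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = mass / n
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / mass
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / mass, 1e-6)
        current = log_likelihood(xs, weights, means, variances)
        if current - previous < tol:  # stop once the likelihood no longer improves
            break
        previous = current
    return weights, means, variances

def bic(xs, k, params):
    """BIC = p * ln(n) - 2 * ln(L); a 1-D k-component mixture has p = 3k - 1."""
    return (3 * k - 1) * math.log(len(xs)) - 2 * log_likelihood(xs, *params)

def select_gmm(xs, max_components):
    """S21-S23: fit mixtures with 1..L components and keep the minimum-BIC one."""
    scores = {k: bic(xs, k, fit_gmm_em(xs, k)) for k in range(1, max_components + 1)}
    return min(scores, key=scores.get), scores

# Toy data: two well-separated groups of made-up consumption values.
data = [1.0 + 0.1 * i for i in range(-10, 11)] + [10.0 + 0.1 * i for i in range(-10, 11)]
best_k, bic_by_k = select_gmm(data, max_components=3)
print(best_k, bic_by_k)
```

For real multi-dimensional daily data, a library implementation of Gaussian mixtures would normally replace the hand-written EM loop; the selection logic over the candidate component counts stays the same.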
Further, the training of the ith (1 ≤ i ≤ K) automatic encoder in step S4 specifically comprises:
A1: acquiring the training data in the ith training set, and training the ith automatic encoder with the training data in the training set;
A2: sequentially inputting the training data in the training set into the ith automatic encoder to obtain the error vector, mean and variance for each training data, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;
A3: removing M training data from the ith training set according to the anomaly scores, training the ith automatic encoder with the remaining training data in the training set to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N is the number of training data in the ith training set, and P is a preset anomaly proportion.
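The selection in step A3 can be illustrated with a short sketch. It assumes, as the embodiment describes, that the M samples with the highest anomaly scores are the ones removed; rounding M = N × P to an integer is an assumption, and the function name and scores are hypothetical. Retraining the automatic encoder on the kept samples is left out.

```python
def split_by_anomaly_score(scores, anomaly_ratio):
    """Return (kept_indices, removed_indices) after dropping the M highest scores."""
    n = len(scores)
    m = round(n * anomaly_ratio)  # M = N x P; integer rounding is an assumption
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    removed = sorted(ranked[:m])  # indices of the M highest-scoring samples
    kept = sorted(ranked[m:])     # indices used for the second training pass
    return kept, removed

# Made-up anomaly scores for a training set of N = 10 samples, with P = 20%.
scores = [0.2, 5.0, 0.1, 0.4, 9.0, 0.3, 0.2, 0.5, 0.1, 0.6]
kept, removed = split_by_anomaly_score(scores, anomaly_ratio=0.2)
print(removed, kept)
```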
Further, the error vector is:
[Equation image BDA0003032975210000031]
wherein $e^{(j)}$ is the error vector of the jth training data, $x^{(j)}$ is the jth training data in the training set, and
[Symbol image BDA0003032975210000032]
is the output obtained after $x^{(j)}$ is input into the automatic encoder.
Further, the anomaly score is calculated by the formula:
$$a^{(j)} = (e^{(j)} - \mu)^{T} \sigma^{-2} (e^{(j)} - \mu)$$
wherein $a^{(j)}$ is the anomaly score of the jth training data $x^{(j)}$, and $\mu$ and $\sigma^{2}$ are the mean and variance of the Gaussian distribution
[Symbol image BDA0003032975210000033]
corresponding to the ith training set.
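The anomaly score formula can be evaluated as in the sketch below. Two points are assumptions rather than statements from the patent: the error vector is taken as the plain element-wise difference between the input and its reconstruction (the exact definition is in an equation figure), and the variance is treated per component, so that multiplying by the inverse variance becomes an element-wise division. The names `error_vector`, `anomaly_score` and `x_hat` are hypothetical.

```python
def error_vector(x, x_hat):
    # Assumption: element-wise difference between input and reconstruction.
    return [a - b for a, b in zip(x, x_hat)]

def anomaly_score(e, mu, var):
    # a = (e - mu)^T sigma^{-2} (e - mu), with sigma^2 given per component
    # (a diagonal interpretation, which is an assumption).
    return sum((ei - mi) ** 2 / vi for ei, mi, vi in zip(e, mu, var))

x = [2.0, 5.0]      # made-up input sample
x_hat = [1.0, 2.0]  # made-up reconstruction from the automatic encoder
e = error_vector(x, x_hat)
score = anomaly_score(e, mu=[0.0, 1.0], var=[1.0, 4.0])
print(e, score)
```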
Furthermore, in the model application stage, anomaly detection on the electricity consumption data belonging to the same clustering result is specifically performed as follows:
acquiring the automatic encoder corresponding to the clustering result, inputting the electricity consumption data into the automatic encoder, calculating the anomaly score of the electricity consumption data, and taking the electricity consumption data whose anomaly score is greater than a score threshold as abnormal data, wherein the score threshold is the smallest anomaly score among the M training data removed during the training of that automatic encoder.
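The threshold rule can be written in a few lines. The sketch assumes the anomaly scores of the M removed training samples were kept from training; the function names and values are hypothetical.

```python
def score_threshold(removed_scores):
    """Threshold = smallest anomaly score among the M removed training samples."""
    return min(removed_scores)

def flag_abnormal(scores, threshold):
    """A sample is abnormal when its score is strictly greater than the threshold."""
    return [s > threshold for s in scores]

removed_scores = [9.0, 5.0]   # made-up scores of the samples removed in training
threshold = score_threshold(removed_scores)
flags = flag_abnormal([0.3, 5.0, 7.2], threshold)
print(threshold, flags)
```

Because the comparison is strict, a new sample whose score exactly equals the threshold is not flagged.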
Further, the input data set comprises multi-day power consumption data of a plurality of users, and the daily sampling time and the sampling frequency of the original data set and the input data set are the same.
An electricity stealing detection system based on a time sequence automatic encoder comprises:
an acquisition module, used for acquiring an original data set and an input data set;
a Gaussian module, used for obtaining an optimal Gaussian mixture model with K Gaussian distributions from the original data set by using the expectation-maximization (EM) algorithm and the Bayesian information criterion (BIC);
a training module, used for constructing automatic encoders based on the long short-term memory (LSTM) neural network and training them with the training sets to obtain the model parameters of the automatic encoders;
an application module, used for performing anomaly detection on the input data set based on the optimal Gaussian mixture model and the automatic encoders to obtain an electricity stealing detection result.
Further, the training of the ith (1 ≤ i ≤ K) automatic encoder in the training module specifically comprises:
A1: acquiring the training data in the ith training set, and training the ith automatic encoder with the training data in the training set;
A2: sequentially inputting the training data in the training set into the ith automatic encoder to obtain the error vector, mean and variance for each training data, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;
A3: removing M training data from the ith training set according to the anomaly scores, training the ith automatic encoder with the remaining training data in the training set to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N is the number of training data in the ith training set, and P is a preset anomaly proportion.
A computer storage medium having an executable computer program stored therein, wherein the computer program, when executed, implements the electricity stealing detection method described above.
Compared with the prior art, the invention has the following beneficial effects:
(1) The Gaussian mixture model is used to cluster the data and distinguish electricity consumption data with different consumption habits, and the automatic encoder corresponding to each clustering result is used for anomaly detection. Electricity stealing can therefore be detected in unlabeled electricity consumption data; the application range is wide, the detection performance is high, and the method provides a reference for anomaly detection on unlabeled electricity consumption data.
(2) The LSTM-based automatic encoder model takes into account the time-series nature, imbalance and high dimensionality (caused by the high sampling rate) of electricity consumption data, and can effectively obtain higher-quality results.
(3) When the automatic encoder model is trained, the training set is cleaned directly from the unlabeled data through self-supervised learning according to the preset anomaly proportion. This achieves data labeling, improves the learning effect of the automatic encoder, and gives the method high applicability.
(4) The Gaussian mixture model clusters the input electricity consumption data so that data with different consumption habits are distinguished, and an automatic encoder is constructed and trained for each cluster, which enhances the robustness and learning performance of the automatic encoders.
Drawings
FIG. 1 is a flow chart of the electricity stealing detection method based on a time sequence automatic encoder;
FIG. 2 is a comparison of the detection performance of different electricity stealing detection methods.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example 1:
An electricity stealing detection method based on a time sequence automatic encoder, as shown in FIG. 1, comprises a model training stage and a model application stage, wherein the model training stage comprises the following steps:
S1: acquiring multi-day electricity consumption data of a plurality of users as an original data set;
S2: establishing a plurality of Gaussian mixture models, determining the parameters of each Gaussian mixture model from the original data set by using the expectation-maximization (EM) algorithm, and obtaining an optimal Gaussian mixture model by using the Bayesian information criterion (BIC), wherein the optimal Gaussian mixture model comprises K Gaussian distributions;
In this embodiment, step S2 specifically comprises:
S21: sequentially establishing Gaussian mixture models with 1, 2, …, L Gaussian distributions, wherein L is the maximum number of Gaussian distributions in a Gaussian mixture model;
S22: for each Gaussian mixture model, inputting the electricity consumption data in the original data set into the Gaussian mixture model to obtain a probability set for each electricity consumption data, wherein the probability set comprises the probabilities that the electricity consumption data belongs to each Gaussian distribution in the Gaussian mixture model; based on the electricity consumption data and the corresponding probability sets, obtaining a log-likelihood estimate by using the EM algorithm, adjusting the parameters of the Gaussian mixture model according to the log-likelihood function, re-determining the probability set for each electricity consumption data and re-computing the log-likelihood estimate, and repeating these steps until convergence to obtain a Gaussian mixture model with determined parameters; this step is repeated until all L Gaussian mixture models are determined;
S23: calculating the BIC value of each Gaussian mixture model, and selecting the Gaussian mixture model with the minimum BIC value as the optimal Gaussian mixture model, wherein the optimal Gaussian mixture model comprises K Gaussian distributions;
S3: dividing the original data set into K training sets based on the clustering result of the optimal Gaussian mixture model on the original data set;
S4: constructing K automatic encoders based on the long short-term memory (LSTM) neural network, and training each with its training set to obtain the automatic encoder corresponding to each Gaussian distribution in the optimal Gaussian mixture model;
The training of the ith (1 ≤ i ≤ K) automatic encoder specifically comprises:
A1: acquiring the training data in the ith training set, and training the ith automatic encoder with the training data in the training set;
A2: sequentially inputting the training data in the training set into the ith automatic encoder to obtain the error vector, mean and variance for each training data, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;
A3: removing M training data from the ith training set according to the anomaly scores, training the ith automatic encoder with the remaining training data in the training set to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N is the number of training data in the ith training set, and P is a preset anomaly proportion, such as 5% or 10%.
The error vector is:
[Equation image BDA0003032975210000061]
wherein $e^{(j)}$ is the error vector of the jth training data, $x^{(j)}$ is the jth training data in the training set, and
[Symbol image BDA0003032975210000062]
is the output obtained after $x^{(j)}$ is input into the automatic encoder.
The anomaly score is calculated by the formula:
$$a^{(j)} = (e^{(j)} - \mu)^{T} \sigma^{-2} (e^{(j)} - \mu)$$
wherein $a^{(j)}$ is the anomaly score of the jth training data $x^{(j)}$, and $\mu$ and $\sigma^{2}$ are the mean and variance of the Gaussian distribution
[Symbol image BDA0003032975210000063]
corresponding to the ith training set.
The model application stage is as follows:
acquiring electricity consumption data of users in the original data set as an input data set, clustering the input data set according to the optimal Gaussian mixture model, and performing anomaly detection on the electricity consumption data in each clustering result by using the automatic encoder corresponding to that clustering result, to obtain an electricity stealing detection result.
Specifically, the automatic encoder corresponding to the clustering result is acquired, the electricity consumption data are input into the automatic encoder, the anomaly score of the electricity consumption data is calculated, and the electricity consumption data whose anomaly score is greater than a score threshold are taken as abnormal data, wherein the score threshold is the smallest anomaly score among the M training data removed during the training of that automatic encoder.
The input data set comprises multi-day electricity consumption data of a plurality of users, and the daily sampling time and the sampling frequency of the original data set and the input data set are the same.
An electricity stealing detection system based on a time sequence automatic encoder comprises:
an acquisition module, used for acquiring an original data set and an input data set;
a Gaussian module, used for obtaining an optimal Gaussian mixture model with K Gaussian distributions from the original data set by using the expectation-maximization (EM) algorithm and the Bayesian information criterion (BIC);
a training module, used for constructing automatic encoders based on the long short-term memory (LSTM) neural network and training them with the training sets to obtain the model parameters of the automatic encoders;
an application module, used for performing anomaly detection on the input data set based on the optimal Gaussian mixture model and the automatic encoders to obtain an electricity stealing detection result.
The training of the ith (1 ≤ i ≤ K) automatic encoder in the training module specifically comprises:
A1: acquiring the training data in the ith training set, and training the ith automatic encoder with the training data in the training set;
A2: sequentially inputting the training data in the training set into the ith automatic encoder to obtain the error vector, mean and variance for each training data, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;
A3: removing M training data from the ith training set according to the anomaly scores, training the ith automatic encoder with the remaining training data in the training set to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N is the number of training data in the ith training set, and P is a preset anomaly proportion.
A computer storage medium having an executable computer program stored therein, wherein the computer program, when executed, implements the electricity stealing detection method described above.
In this embodiment, the original data set is the electricity consumption data of 4225 residential users of the Irish power company over 535 consecutive days. The data are sampled every half hour, so 48 sampling points are obtained per user per day.
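The size of this data set follows directly from the figures above, as the short sketch below works out. Treating one user-day of 48 readings as one sample is an assumption made here for illustration (it matches the later output of a user and a date per abnormal sample), and `to_daily_profiles` is a hypothetical helper.

```python
USERS = 4225
DAYS = 535
POINTS_PER_DAY = 48  # one reading every half hour

readings_per_user = DAYS * POINTS_PER_DAY           # readings in one user's series
daily_profiles = USERS * DAYS                       # user-day samples in the data set
total_readings = daily_profiles * POINTS_PER_DAY    # all half-hourly readings

def to_daily_profiles(readings, points_per_day=POINTS_PER_DAY):
    """Cut one user's flat reading series into consecutive daily vectors."""
    if len(readings) % points_per_day != 0:
        raise ValueError("series length must be a whole number of days")
    return [readings[i:i + points_per_day]
            for i in range(0, len(readings), points_per_day)]

# Two days of made-up readings for one user.
profiles = to_daily_profiles([0.1] * 96)
print(readings_per_user, daily_profiles, total_readings, len(profiles))
```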
A plurality of Gaussian mixture models are constructed and fitted to the original data set with the EM algorithm, and the optimal Gaussian mixture model is selected with the Bayesian information criterion (BIC). The optimal Gaussian mixture model clusters the data in the original data set according to user habits into K clustering results, and the original data set is divided into K training sets accordingly.
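The division into K training sets can be sketched as below. The sketch assumes a hard assignment, in which each sample goes to the Gaussian distribution with the highest probability in its probability set; the patent text does not spell out the assignment rule, and the names and values are hypothetical.

```python
def most_probable_component(probabilities):
    """Index of the Gaussian distribution with the highest probability."""
    return max(range(len(probabilities)), key=lambda c: probabilities[c])

def split_into_training_sets(samples, probability_sets, k):
    """Place each sample in the training set of its most probable component."""
    training_sets = [[] for _ in range(k)]
    for sample, probabilities in zip(samples, probability_sets):
        training_sets[most_probable_component(probabilities)].append(sample)
    return training_sets

# Made-up samples with their probability sets under a K = 2 mixture.
samples = ["day_a", "day_b", "day_c", "day_d"]
probability_sets = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
training_sets = split_into_training_sets(samples, probability_sets, k=2)
print(training_sets)
```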
An automatic encoder is constructed for each Gaussian distribution of the optimal Gaussian mixture model and trained on the corresponding training set. Considering the time-series nature of electricity consumption data, a long short-term memory (LSTM) network is used in the automatic encoder, which widens the detection range.
Because the labels of the training data are unknown, abnormal data are assumed to exist in the training set. After the automatic encoder is trained on the training set, the training data are input into it in turn to obtain outputs, and the anomaly score of each training data is calculated. In this embodiment the anomaly proportion (the proportion of electricity stealing behavior, set from experience) is 10%: the training data are sorted by anomaly score from high to low, the top 10% are regarded as abnormal data corresponding to electricity stealing and are removed from the training set, and the automatic encoder is trained again to obtain the final automatic encoder. In this way, the training set is cleaned directly from the unlabeled data through self-supervised learning, which achieves data labeling, improves the learning effect of the automatic encoder, and gives the method high applicability.
After the optimal Gaussian mixture model and the automatic encoders corresponding to its Gaussian distributions have been obtained from the original data set, the method can be applied to actual electricity stealing detection.
The electricity consumption data of the same 4225 residential users of the Irish power company in another period are obtained as the input data set. For example, the original data set is the users' electricity consumption data for the 535 consecutive days before January 1, 2020, and the input data set is the users' electricity consumption data for the 30 consecutive days after January 1, 2020.
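The two periods in this example can be computed with the standard library. The sketch assumes that January 1, 2020 itself is the first day of the input period; the text leaves open whether that day is included, so the boundary handling is an assumption.

```python
from datetime import date, timedelta

split_day = date(2020, 1, 1)

# Original data set: the 535 consecutive days before the split day.
train_days = [split_day - timedelta(days=535) + timedelta(days=d) for d in range(535)]

# Input data set: 30 consecutive days starting on the split day (assumption).
input_days = [split_day + timedelta(days=d) for d in range(30)]

print(train_days[0], train_days[-1], input_days[0], input_days[-1])
```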
The input data set is clustered with the optimal Gaussian mixture model to obtain K clustering results. The electricity consumption data in each clustering result are input into the corresponding automatic encoder, the anomaly score of each electricity consumption data is calculated, and the electricity consumption data whose anomaly score is greater than the score threshold are taken as abnormal data, wherein the score threshold is the smallest anomaly score among the M training data removed in step A3. The user and date corresponding to each abnormal data are output, and that user is considered to have stolen electricity on that date.
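The routing of each sample to the automatic encoder of its cluster, and the reporting of the user and date, can be sketched as follows. The clustering function and the per-cluster scoring functions below are simple stand-ins for the trained Gaussian mixture model and automatic encoders; all names, records and thresholds are hypothetical.

```python
def detect(records, assign_cluster, scorers, thresholds):
    """Return (user, day) pairs whose anomaly score exceeds their cluster's threshold."""
    flagged = []
    for user, day, profile in records:
        cluster = assign_cluster(profile)    # cluster given by the mixture model
        score = scorers[cluster](profile)    # score from that cluster's automatic encoder
        if score > thresholds[cluster]:
            flagged.append((user, day))
    return flagged

# Stand-ins: cluster by mean level, score by squared deviation from a typical level.
assign_cluster = lambda profile: 0 if sum(profile) / len(profile) < 1.0 else 1
scorers = [
    lambda profile: sum((v - 0.5) ** 2 for v in profile),
    lambda profile: sum((v - 2.0) ** 2 for v in profile),
]
thresholds = [0.3, 1.0]  # per-cluster thresholds taken from training

records = [
    ("u1", "2020-01-01", [0.5, 0.6]),
    ("u1", "2020-01-02", [0.0, 0.0]),
    ("u2", "2020-01-01", [2.0, 2.1]),
    ("u2", "2020-01-02", [2.0, 5.0]),
]
flagged = detect(records, assign_cluster, scorers, thresholds)
print(flagged)
```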
In other embodiments, the number of users and the collection dates of the electricity consumption data may be changed. To ensure detection accuracy, however, the users should be users in the original data set. After the input data set is clustered with the optimal Gaussian mixture model, at least one clustering result is obtained, so that the automatic encoder corresponding to each clustering result can be used for anomaly detection on the electricity consumption data in that clustering result.
The true positive rate (TPR) and the false positive rate (FPR) are selected as the criteria for evaluating electricity stealing detection algorithms; the best detection algorithm has a TPR as high as possible and an FPR as low as possible. As shown in FIG. 2, the method is compared with Isolation Forest (iForest), Robust Covariance estimation, Local Outlier Factor (LOF), One-Class SVM, the Deep Autoencoding Gaussian Mixture Model (DAGMM), and DAGMM using an LSTM layer. The difference between the TPR and the FPR of the proposed method is pronounced, its TPR is higher than that of the other methods, and its area under the receiver operating characteristic (ROC) curve (AUC) is also clearly larger than that of the other methods, whereas both One-Class SVM and DAGMM show poor adaptability to the data. The proposed method therefore performs well in terms of detection accuracy.
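For reference, the two evaluation criteria can be computed as in the sketch below, using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The labels and predictions are made up for illustration and are not results from the patent.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 marks electricity stealing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # made-up detector output
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```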
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. An electricity stealing detection method based on a time-series automatic encoder, characterized by comprising a model training stage and a model application stage, wherein the model training stage comprises the following steps:
S1: acquiring multi-day power consumption data of a plurality of users as an original data set;
S2: establishing a plurality of Gaussian mixture models, determining the parameters of each Gaussian mixture model from the original data set by using the expectation-maximization (EM) algorithm, and obtaining an optimal Gaussian mixture model by using the Bayesian information criterion (BIC), wherein the optimal Gaussian mixture model comprises K Gaussian distributions;
S3: dividing the original data set into K training sets based on the clustering result of the optimal Gaussian mixture model on the original data set;
S4: constructing K automatic encoders based on the long short-term memory (LSTM) artificial neural network, and training each with its corresponding training set to obtain the automatic encoder corresponding to each Gaussian distribution in the optimal Gaussian mixture model;
the model application stage is as follows:
acquiring power consumption data of users in the original data set as an input data set, clustering the input data set according to the optimal Gaussian mixture model, and performing anomaly detection on the power consumption data in each clustering result by using the automatic encoder corresponding to that clustering result to obtain an electricity stealing detection result.
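As a non-limiting illustration of the model application stage, the following Python sketch assigns each input sample to the Gaussian distribution with the largest weighted density and then applies the detector belonging to that cluster. The diagonal-covariance density, the function names, and the `scorers` and `thresholds` arguments (hypothetical stand-ins for the trained automatic encoders and their score thresholds) are assumptions made for this example.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form) at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def assign_cluster(x, weights, means, variances):
    """Index of the Gaussian distribution with the largest weighted log density for x."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    return max(range(len(logs)), key=lambda k: logs[k])


def detect_stealing(samples, weights, means, variances, scorers, thresholds):
    """Route each sample to its cluster's scorer and compare with that cluster's threshold.

    scorers[k] is a hypothetical callable standing in for the k-th trained
    automatic encoder together with its anomaly-score computation.
    Returns a list of (cluster_index, is_abnormal) pairs.
    """
    results = []
    for x in samples:
        k = assign_cluster(x, weights, means, variances)
        results.append((k, scorers[k](x) > thresholds[k]))
    return results
```

Keeping one scorer and one threshold per cluster mirrors the one-encoder-per-Gaussian-distribution arrangement of step S4.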
2. The electricity stealing detection method based on a time-series automatic encoder as claimed in claim 1, wherein step S2 specifically comprises:
S21: sequentially establishing Gaussian mixture models whose numbers of Gaussian distributions are 1, 2, …, L, wherein L is the maximum number of Gaussian distributions in a Gaussian mixture model;
S22: for each Gaussian mixture model, determining the parameters of the Gaussian mixture model by using the expectation-maximization (EM) algorithm;
S23: calculating the BIC value of each Gaussian mixture model by using the Bayesian information criterion (BIC), and selecting the Gaussian mixture model with the smallest BIC value as the optimal Gaussian mixture model, wherein the optimal Gaussian mixture model comprises K Gaussian distributions.
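As a non-limiting illustration of steps S21 to S23, the following Python sketch fits mixtures with 1 to L components by EM and keeps the one with the smallest BIC value. It is simplified to one-dimensional data, whereas the power consumption data of the method are multi-dimensional; the function names, the deterministic initialisation, the fixed iteration count and the variance floor are all assumptions made for this example.

```python
import math


def _log_components(x, weights, means, variances):
    """Weighted log density of x under each 1-D Gaussian component."""
    return [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]


def _log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(data, k, iters=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (requires len(data) >= k)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialisation (an assumption): one component per contiguous chunk.
    means = []
    for j in range(k):
        chunk = xs[j * n // k:(j + 1) * n // k]
        means.append(sum(chunk) / len(chunk))
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    weights, variances = [1.0 / k] * k, [spread] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            logs = _log_components(x, weights, means, variances)
            total = _log_sum_exp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
    loglik = sum(_log_sum_exp(_log_components(x, weights, means, variances)) for x in xs)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC = p * ln(n) - 2 * ln(L), with p = 3k - 1 free parameters for a 1-D mixture."""
    return (3 * k - 1) * math.log(n) - 2.0 * loglik


def select_gmm(data, max_k):
    """Fit mixtures with 1..max_k components; return (best_k, bic_by_k)."""
    bic_by_k = {}
    for k in range(1, max_k + 1):
        loglik = fit_gmm_1d(data, k)[3]
        bic_by_k[k] = bic(loglik, k, len(data))
    best_k = min(bic_by_k, key=bic_by_k.get)
    return best_k, bic_by_k
```

The BIC penalty term grows with the number of parameters, so an additional Gaussian distribution is accepted only when it raises the log-likelihood enough to offset that penalty.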
3. The method as claimed in claim 1, wherein the training of the i-th (1 ≤ i ≤ K) automatic encoder in step S4 comprises:
A1: acquiring the training data in the i-th training set, and training the i-th automatic encoder with the training data in that training set;
A2: sequentially inputting the training data in the training set into the i-th automatic encoder to obtain the error vector, mean and variance of each training data, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;
A3: removing M training data from the i-th training set according to the anomaly scores, training the i-th automatic encoder with the remaining training data in the training set to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N represents the number of training data in the i-th training set, and P represents a preset anomaly proportion.
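As a non-limiting illustration of steps A1 to A3, the following Python sketch trains a model, scores the training data, removes M samples and retrains on the remainder. The callables `train_fn` and `score_fn` are hypothetical stand-ins for LSTM automatic encoder training and anomaly-score computation. Two further assumptions are made: the M removed samples are those with the highest anomaly scores, and M = N × P is rounded down to an integer.

```python
def train_with_trimming(training_set, train_fn, score_fn, anomaly_ratio):
    """Two-pass training: fit, score, drop the M highest-scoring samples, refit.

    train_fn(samples) -> model and score_fn(model, sample) -> float are
    hypothetical callables supplied by the caller.
    Returns (model, kept_samples, removed_scores).
    """
    # A1: first training pass on the full training set.
    model = train_fn(training_set)
    # A2: anomaly score of every training sample under the first model.
    scores = [score_fn(model, x) for x in training_set]
    # M = N x P, rounded down (assumption).
    m = int(len(training_set) * anomaly_ratio)
    ranked = sorted(range(len(training_set)), key=lambda j: scores[j], reverse=True)
    removed = set(ranked[:m])  # assumed: the M highest anomaly scores
    kept = [x for j, x in enumerate(training_set) if j not in removed]
    # A3: retrain on the remaining training data.
    model = train_fn(kept)
    removed_scores = [scores[j] for j in ranked[:m]]
    return model, kept, removed_scores
```

Retraining on the trimmed set keeps suspected electricity stealing samples from shaping the reconstruction behaviour that the final automatic encoder learns.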
4. The method of claim 3, wherein the error vector is:
Figure FDA0003032975200000021
wherein $e^{(j)}$ is the error vector of the j-th training data, $x^{(j)}$ is the j-th training data in the training set, and the quantity shown in Figure FDA0003032975200000022 is the output obtained after $x^{(j)}$ is input to the automatic encoder.
5. The method of claim 4, wherein the anomaly score is calculated by the formula:
$a^{(j)} = (e^{(j)} - \mu)^{T} \sigma^{-2} (e^{(j)} - \mu)$
wherein $a^{(j)}$ is the anomaly score of the j-th training data $x^{(j)}$, and $\mu$ and $\sigma^{2}$ are the mean and variance of the Gaussian distribution (Figure FDA0003032975200000023) corresponding to the i-th training set.
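As a non-limiting illustration of the anomaly score, the following Python sketch evaluates the quadratic form (e − μ)ᵀ σ⁻² (e − μ) for given error vectors. It assumes that μ and σ² are estimated per dimension from the error vectors of the training set and that σ² acts as a diagonal matrix; the error vectors themselves are taken as inputs because the error-vector formula is given only as a figure. The function names are hypothetical.

```python
def fit_error_stats(errors, var_floor=1e-12):
    """Per-dimension mean and variance of a list of equal-length error vectors."""
    n = len(errors)
    dim = len(errors[0])
    mu = [sum(e[d] for e in errors) / n for d in range(dim)]
    var = [max(sum((e[d] - mu[d]) ** 2 for e in errors) / n, var_floor)
           for d in range(dim)]
    return mu, var


def anomaly_score(e, mu, var):
    """(e - mu)^T diag(var)^-1 (e - mu), with var treated as a diagonal matrix (assumption)."""
    return sum((ed - m) ** 2 / v for ed, m, v in zip(e, mu, var))
```

Under these assumptions the score is a squared distance from the typical reconstruction error, scaled per dimension, so it grows when a sample is reconstructed unusually well or unusually badly relative to the training set.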
6. The electricity stealing detection method based on a time-series automatic encoder as claimed in claim 3, wherein in the model application stage, the anomaly detection on the power consumption data belonging to the same clustering result is specifically as follows:
acquiring the automatic encoder corresponding to the clustering result, inputting the power consumption data into the automatic encoder, calculating the anomaly score of the power consumption data, and taking the power consumption data whose anomaly score is greater than a score threshold as abnormal data, wherein the score threshold is the smallest anomaly score among the M training data removed during the training of that automatic encoder.
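As a non-limiting illustration of this thresholding rule, the following Python sketch derives the score threshold from the anomaly scores of the training data and applies it to new scores. It assumes that the M removed training data are those with the highest anomaly scores and that M = N × P is rounded down; the function names are hypothetical.

```python
def score_threshold(training_scores, anomaly_ratio):
    """Smallest anomaly score among the M highest-scoring training samples."""
    m = int(len(training_scores) * anomaly_ratio)  # M = N x P, rounded down (assumption)
    if m < 1:
        raise ValueError("anomaly_ratio removes no training data, so no threshold exists")
    removed = sorted(training_scores, reverse=True)[:m]
    return min(removed)


def flag_anomalies(scores, threshold):
    """True for every score strictly greater than the threshold."""
    return [s > threshold for s in scores]
```

Because the comparison is strict, a new sample whose score equals the threshold is not flagged as abnormal.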
7. The method of claim 1, wherein the input data set comprises multi-day power consumption data of a plurality of users, and the raw data set and the input data set have the same daily sampling time and sampling frequency.
8. An electricity stealing detection system based on a time-series automatic encoder, which is based on the electricity stealing detection method according to any one of claims 1 to 7, and comprises:
an acquisition module, used for acquiring an original data set and an input data set;
a Gaussian module, used for obtaining, based on the original data set, an optimal Gaussian mixture model with K Gaussian distributions by using the expectation-maximization (EM) algorithm and the Bayesian information criterion (BIC);
a training module, used for constructing automatic encoders based on the long short-term memory (LSTM) artificial neural network and training them with the training sets to obtain the model parameters of the automatic encoders;
and an application module, used for performing anomaly detection on the input data set based on the optimal Gaussian mixture model and the automatic encoders to obtain an electricity stealing detection result.
9. The system according to claim 8, wherein the training of the i-th (1 ≤ i ≤ K) automatic encoder by the training module specifically comprises:
A1: acquiring the training data in the i-th training set, and training the i-th automatic encoder with the training data in that training set;
A2: sequentially inputting the training data in the training set into the i-th automatic encoder to obtain the error vector, mean and variance of each training data, and calculating the anomaly score of each training data based on the error vector, the mean and the variance;
A3: removing M training data from the i-th training set according to the anomaly scores, training the i-th automatic encoder with the remaining training data in the training set to complete the training of the automatic encoder, and recording the model parameters of the automatic encoder, wherein M = N × P, N represents the number of training data in the i-th training set, and P represents a preset anomaly proportion.
10. A computer storage medium having stored thereon an executable computer program, wherein the computer program, when executed, implements the electricity stealing detection method as claimed in any one of claims 1 to 7.
CN202110435767.7A 2021-04-22 2021-04-22 Electricity stealing detection method, system and medium based on time sequence automatic encoder Pending CN113327008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110435767.7A CN113327008A (en) 2021-04-22 2021-04-22 Electricity stealing detection method, system and medium based on time sequence automatic encoder

Publications (1)

Publication Number Publication Date
CN113327008A (en) 2021-08-31

Family

ID=77415037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110435767.7A Pending CN113327008A (en) 2021-04-22 2021-04-22 Electricity stealing detection method, system and medium based on time sequence automatic encoder

Country Status (1)

Country Link
CN (1) CN113327008A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778841A (en) * 2016-11-30 2017-05-31 国网上海市电力公司 The method for building up of abnormal electricity consumption detection model
CN112379269A (en) * 2020-10-14 2021-02-19 武汉蔚来能源有限公司 Battery abnormity detection model training and detection method and device thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
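He Qing (何庆) et al.: "Research on the expectation-maximization clustering algorithm based on the Gaussian mixture model" (基于高斯混合模型的最大期望聚类算法研究), 《研究与设计》 *
Cheng Jin (程金): "Electricity stealing prevention method based on the electricity consumption information acquisition system and its application" (基于用电信息采集系统的防窃电方法及应用), 《中国优秀博硕士学位论文全文数据库(硕士) 工程科技Ⅱ辑》 *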

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442107A (en) * 2022-08-31 2022-12-06 哈尔滨工业大学(威海) Communication data anomaly detection method based on Gaussian mixture model
CN117495109A (en) * 2023-12-29 2024-02-02 国网山东省电力公司禹城市供电公司 Electricity stealing user identification system based on deep well network
CN117495109B (en) * 2023-12-29 2024-03-22 国网山东省电力公司禹城市供电公司 Power stealing user identification system based on neural network

Similar Documents

Publication Publication Date Title
CN111612651A (en) Abnormal electric quantity data detection method based on long-term and short-term memory network
Cheng et al. Enhanced state estimation and bad data identification in active power distribution networks using photovoltaic power forecasting
CN111796957B (en) Transaction abnormal root cause analysis method and system based on application log
CN113327008A (en) Electricity stealing detection method, system and medium based on time sequence automatic encoder
Shehzad et al. A robust hybrid deep learning model for detection of non-technical losses to secure smart grids
CN114239725B (en) Electric larceny detection method for data poisoning attack
CN114676742A (en) Power grid abnormal electricity utilization detection method based on attention mechanism and residual error network
CN115412455A (en) Server multi-performance index abnormity detection method and device based on time sequence
CN112507479B (en) Oil drilling machine health state assessment method based on manifold learning and softmax
CN112213687B (en) Gateway electric energy meter data anomaly detection method and system based on pseudo-anomaly point identification
CN114760098A (en) CNN-GRU-based power grid false data injection detection method and device
Precioso et al. NILM as a regression versus classification problem: the importance of thresholding
CN116451142A (en) Water quality sensor fault detection method based on machine learning algorithm
CN115329839A (en) Electricity stealing user identification and electricity stealing amount prediction method based on convolution self-encoder and improved regression algorithm
CN116304604B (en) Multivariate time series data anomaly detection and model training method and system
CN112926686A (en) BRB and LSTM model-based power big data power utilization anomaly detection method and device
CN116662899A (en) Noise-containing data anomaly detection method based on self-adaptive strategy
CN111738348A (en) Power data anomaly detection method and device
CN112561306B (en) Rolling bearing health state evaluation method based on Hankel matrix
CN115184054A (en) Mechanical equipment semi-supervised fault detection and analysis method, device, terminal and medium
Zhu et al. Auto-starting semisupervised-learning-based identification of synchrophasor data anomalies
CN114676783A (en) Load identification method based on single classification and fuzzy width learning
CN111199014B (en) Time sequence based seq2point NILM method and device
Precioso et al. Thresholding methods in non-intrusive load monitoring to estimate appliance status
CN115293244B (en) Smart grid false data injection attack detection method based on signal processing and data reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210831