CN113095739A - Power grid data anomaly detection method and device - Google Patents

Info

Publication number
CN113095739A
Authority
CN
China
Prior art keywords
data
electricity consumption
group
model
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110537094.6A
Other languages
Chinese (zh)
Inventor
冯小峰
冯浩洋
郭文翀
卢世祥
李健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Measurement Center of Guangdong Power Grid Co Ltd
Metrology Center of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Measurement Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd and Measurement Center of Guangdong Power Grid Co Ltd
Priority to CN202110537094.6A
Publication of CN113095739A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention discloses a power grid data anomaly detection method and device. A recognition model obtained through machine learning training automatically analyses the electricity consumption data of a user input into it and determines whether the user's electricity consumption is in an abnormal state. As a data-oriented method, it can automatically identify users with abnormal electricity consumption from the huge volume of power grid data without manual identification, and can efficiently supply the corresponding data to subsequent electricity stealing detection, thereby improving the efficiency of electricity stealing detection.

Description

Power grid data anomaly detection method and device
Technical Field
The invention relates to the technical field of power grid electricity stealing detection, and in particular to a power grid data anomaly detection method and device.
Background
Electricity stealing means that some power users use illegal means to make the metered electric quantity smaller than the quantity actually used, so as to pay less or even nothing for electricity. Such illegal activities not only seriously disrupt the normal use of electric power, but also cause huge economic losses to the operation of the power system. Meanwhile, unauthorized rewiring of lines or meters easily causes accidents such as power failures and fires, and poses a serious threat to the safety of the power system. Therefore, effective techniques for detecting electricity stealing are needed to ensure the safe and economical operation of power systems.
Electricity stealing detection techniques can be divided into three categories: network-oriented methods, data-oriented methods, and hybrid methods that combine the first two. Network-oriented and hybrid methods typically require expert knowledge of the network topology and even the assistance of additional hardware devices, which are often expensive and difficult to apply broadly. Data-oriented methods rely only on the data provided by smart meters and place no requirements on network topology or other equipment, which improves the cost effectiveness of judging and detecting suspected electricity stealing. For this reason, data-oriented methods have been widely used for electricity stealing detection in recent years.
In data-oriented methods, the data volume of the power grid is huge, so a method that can detect power grid data anomalies quickly and efficiently is urgently needed in order to improve the efficiency of electricity stealing detection.
Disclosure of Invention
The invention aims to solve at least one technical problem in the prior art, and provides a power grid data anomaly detection method and device, which can quickly and efficiently detect anomalies of power consumption data of a power grid user.
In a first aspect, the present invention provides a method for detecting grid data anomaly, where the method includes:
inputting the electricity consumption data of the user into the recognition model;
determining, according to the recognition model, whether the user corresponding to the electricity consumption data has an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data has an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data does not have an abnormal electricity consumption state.
As a further improvement, before the recognition model is trained by machine learning using the multiple groups of data, the method further comprises:
preprocessing the raw data to obtain the multiple groups of data.
As a further improvement, the specific process of preprocessing the raw data to obtain the multiple groups of data is as follows:
performing data cleaning on the raw data to obtain the multiple groups of data.
As a further improvement, the recognition model is obtained by training a convolutional neural network model using the multiple groups of data, and the convolutional neural network model comprises a convolutional layer, a pooling layer and a fully connected layer which are connected in sequence;
the convolutional layer is provided with a plurality of convolution kernels of different sizes, and each of the convolution kernels is used for extracting features from the multiple groups of data input to the convolutional layer to obtain a feature map corresponding to each group of data;
the pooling layer is used for selecting, from the plurality of feature maps input to the pooling layer, the feature map with the largest feature value for each group of data, and storing it in a stack;
the fully connected layer is used for classifying the feature maps stored in the stack.
As a further improvement, the electricity consumption data in each group of the multiple groups of data includes a user number, multiple consumption dates and the consumption data corresponding to each date, wherein the consumption data for a date includes multiple electricity consumption parameters, each representing the electricity consumption in one of multiple time periods of that day;
the height of the convolution kernel is the same as the number of electricity consumption parameters in the electricity consumption data.
In a second aspect, the present invention provides a device for detecting grid data abnormality, the device comprising:
the input module is used for inputting the electricity consumption data of the user into the recognition model;
the recognition module is used for determining, according to the recognition model, whether the user corresponding to the electricity consumption data has an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data has an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data does not have an abnormal electricity consumption state.
As a further improvement, the device further comprises:
a preprocessing module used for preprocessing the raw data to obtain the multiple groups of data.
As a further improvement, the preprocessing module is further configured to:
perform data cleaning on the raw data to obtain the multiple groups of data.
As a further improvement, the recognition model is obtained by training a convolutional neural network model using the multiple groups of data, and the convolutional neural network model comprises a convolutional layer, a pooling layer and a fully connected layer which are connected in sequence;
the convolutional layer is provided with a plurality of convolution kernels of different sizes, and each of the convolution kernels is used for extracting features from the multiple groups of data input to the convolutional layer to obtain a feature map corresponding to each group of data;
the pooling layer is used for selecting, from the plurality of feature maps input to the pooling layer, the feature map with the largest feature value for each group of data, and storing it in a stack;
the fully connected layer is used for classifying the feature maps stored in the stack.
As a further improvement, the electricity consumption data in each group of the multiple groups of data includes a user number, multiple consumption dates and the consumption data corresponding to each date, wherein the consumption data for a date includes multiple electricity consumption parameters, each representing the electricity consumption in one of multiple time periods of that day;
the height of the convolution kernel is the same as the number of electricity consumption parameters in the electricity consumption data.
In a third aspect, the present invention provides an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the grid data anomaly detection method according to any one of the embodiments of the first aspect of the present invention when executing the program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to cause a computer to execute the power grid data anomaly detection method according to any one of the embodiments of the first aspect of the present invention.
Compared with the prior art, the power grid data anomaly detection method and device provided by the invention have at least the following beneficial effects:
the method uses a recognition model obtained through machine learning training to automatically analyse the electricity consumption data of a user input into it, and the recognition model automatically determines whether the user's electricity consumption is in an abnormal state. As a data-oriented method, it can automatically identify users with abnormal electricity consumption from the huge volume of power grid data without manual identification, and can efficiently supply the corresponding data to subsequent electricity stealing detection, thereby improving the efficiency of electricity stealing detection.
Additional aspects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further described below with reference to the accompanying drawings and examples:
Fig. 1 is a schematic flow chart of a power grid data anomaly detection method in an embodiment.
Fig. 2 is a block diagram of a power grid data anomaly detection device according to an embodiment.
Fig. 3 is a block diagram of a computer device in one embodiment.
Detailed Description
Reference will now be made in detail to the present preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
Hereinafter, a method for detecting grid data anomaly according to an embodiment of the present invention will be described in detail and illustrated with several specific embodiments.
As shown in Fig. 1, in one embodiment, a power grid data anomaly detection method is provided. The embodiment is mainly illustrated by applying the method to a computer device, which may specifically have the structure shown in Fig. 3.
Referring to Fig. 1, the power grid data anomaly detection method includes steps S102 to S106, which are described as follows:
Step S102: preprocessing the raw data to obtain the multiple groups of data.
After the raw data of electricity stealing and non-electricity stealing users in a certain area are collected, the data need to be preprocessed. The preprocessing consists of performing data cleaning on the raw data to obtain the multiple groups of data.
Specifically, the raw data refers to electric quantity data acquired directly from the metering end. On the residential side it generally comprises the electric quantity (kilowatt-hours) acquired at each time interval, and on the industrial side it generally comprises electric quantity, voltage, current, phase angle and similar data every 15 minutes. The acquisition systems differ between areas: some areas collect electricity consumption every 15 minutes, some every half hour, and some only the total electricity consumption once a day.
After preprocessing such as data cleaning, the collected raw data yields multiple groups of data suitable for machine learning training of the recognition model. Data cleaning removes redundant and erroneous data and fills in missing data to obtain a usable data set (i.e., compared with the raw data set, the cleaned data set has missing data filled in, duplicate data removed and erroneous data deleted, which gives the multiple groups of data for training). The method for filling in missing values in data cleaning is given by the following formula:
[Formula image BDA0003068680740000051: missing-value filling formula, not reproduced in this text]
The method for removing erroneous values in data cleaning is as follows:
[Formula image BDA0003068680740000052: erroneous-value formula, not reproduced in this text]
where $x_d$ is a vector of the values $x_{d,t}$, $x_{d,t}$ is the consumption data (i.e. raw data) of the power consumer for time period t on day d, [formula image BDA0003068680740000053] is the data set formed by the multiple groups of data after data cleaning, NaN denotes a missing value or a value without practical meaning, and $\mathrm{avg}(x_d)$ and $\mathrm{std}(x_d)$ denote the mean and the standard deviation of the vector $x_d$, respectively.
Since loss, duplication and errors of electricity consumption data may occur during data collection, the present method provides a data preprocessing method that recovers lost and erroneous data, so as to avoid the adverse effects of faulty data on electricity stealing detection. The following equation gives the interpolation method for restoring missing data.
[Formula image BDA0003068680740000061: interpolation formula, not reproduced in this text]
In addition, erroneous data are recovered using the three-sigma rule of thumb as follows:
[Formula image BDA0003068680740000062: three-sigma formula, not reproduced in this text]
where $x_d$ is a vector of the values $x_{d,t}$, and avg(x) and std(x) denote the mean and the standard deviation of the vector x, respectively.
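The two cleaning steps described above can be sketched in code. Because the formula images are not reproduced in this text, the exact rules below are assumptions chosen for illustration, not the patent's formulas: a missing value is replaced by the mean of its two neighbours when both are present and by 0 otherwise, and any value above avg(x) + k·std(x) is capped at that threshold (k = 3 as the three-sigma default). The function names `fill_missing` and `clip_outliers` are hypothetical.

```python
import math
from statistics import mean, pstdev

NaN = float("nan")


def fill_missing(x):
    """Fill NaN entries of a consumption vector (assumed interpolation rule)."""
    out = list(x)
    for i, value in enumerate(x):
        if not math.isnan(value):
            continue
        left = x[i - 1] if i > 0 else NaN
        right = x[i + 1] if i + 1 < len(x) else NaN
        if math.isnan(left) or math.isnan(right):
            out[i] = 0.0  # assumed fallback when a neighbour is also missing
        else:
            out[i] = (left + right) / 2.0  # mean of the two neighbours
    return out


def clip_outliers(x, k=3.0):
    """Cap values above avg(x) + k * std(x) at that threshold (assumed rule)."""
    threshold = mean(x) + k * pstdev(x)
    return [min(value, threshold) for value in x]


# Usage: fill gaps first, then cap implausibly large readings.
readings = [1.0, NaN, 3.0, 2.0, 2.5]
cleaned = clip_outliers(fill_missing(readings))
```

Filling gaps before capping matters here, since the mean and standard deviation are undefined while NaN values remain in the vector.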
Step S104: inputting the electricity consumption data of the user into the recognition model.
Because the multiple groups of data used for training are obtained after the lost and erroneous data have been recovered by preprocessing, the accuracy of the recognition model training can be improved.
Step S106: determining, according to the recognition model, whether the user corresponding to the electricity consumption data has an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data has an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data does not have an abnormal electricity consumption state.
It is understood that the machine learning model used for the recognition model may be a mature machine learning model widely used in the prior art, such as a multi-layer perceptron, a convolutional neural network, a residual puncturing network, and the like, and is not limited herein. With the data preprocessing method provided in this embodiment, the multiple groups of data used for training are obtained after lost and erroneous data have been recovered, which can improve the accuracy of the recognition model training. Furthermore, as a data-oriented method, the method provided by the invention can automatically identify users with abnormal electricity consumption from the huge volume of power grid data without manual identification, and can efficiently supply the corresponding data to subsequent electricity stealing detection, thereby improving the efficiency of electricity stealing detection.
In one embodiment, the recognition model is obtained by training a convolutional neural network model using the multiple groups of data, and the convolutional neural network model comprises a convolutional layer, a pooling layer and a fully connected layer which are connected in sequence;
the convolutional layer is provided with a plurality of convolution kernels of different sizes, and each of the convolution kernels is used for extracting features from the multiple groups of data input to the convolutional layer to obtain a feature map corresponding to each group of data;
the pooling layer is used for selecting, from the plurality of feature maps input to the pooling layer, the feature map with the largest feature value for each group of data, and storing it in a stack;
the fully connected layer is used for classifying the feature maps stored in the stack.
In this example, the convolutional neural network is implemented as a text convolutional neural network (i.e., Text-CNN). Specifically, the text convolutional neural network is mainly composed of a convolutional layer, a pooling layer and a fully connected layer. The function of each layer is described in detail below.
Convolutional layer:
The proposed model has a plurality of convolutional layers. Each convolutional layer has H convolution kernels of different sizes. To ensure the efficiency and effectiveness of two-dimensional time series classification, the height of the kernel is the same as the number of data points in one day. For a convolution kernel of size $H_i$, $D_u(F, T)$ denotes the u-th data sample, where F is the number of data points of the user in one day and T is the number of consecutive days over which data are acquired. The corresponding convolution kernel weights
[Formula image BDA0003068680740000071]
are used to extract features from the input data, where K is the length of the convolution kernel and F is the height of the convolution kernel, identical to the one-day data scale F of the data set;
[Formula image BDA0003068680740000072]
denotes the convolution kernel matrix, and
[Formula image BDA0003068680740000073]
denotes the convolution kernel weights. For example, the feature map
[Formula image BDA0003068680740000074]
is calculated as follows:
[Formula image BDA0003068680740000075]
Here, the operator in the formula denotes the convolution operation,
[Formula image BDA0003068680740000076]
is a bias term, and $f_a(\cdot)$ is a nonlinear activation function. If no activation function is used, the output of each layer is a linear function of the input of the previous layer. It is easy to prove that the output is then a linear combination of the inputs regardless of how many convolutional layers are present, which means that the network is equivalent to one with no hidden layers. Thus, the activation function can improve the effectiveness of the neural network. There are C convolution kernels of size $H_i$:
[Formula image BDA0003068680740000077]
The C convolution kernels produce C feature maps as follows:
[Formula image BDA0003068680740000078]
After the first convolution, the feature map corresponding to the convolution kernels of size $H_i$ is denoted $D_i(N, C, T-K+1)$, where N is the number of samples, C is the number of convolution kernels, T is the duration in days, and K is the length of the convolution kernel.
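The remark above about activation functions can be checked numerically. The following is a minimal sketch with arbitrary illustrative matrices (the values of `W1`, `W2` and `x` are not taken from the patent): two linear layers applied in sequence give the same result as the single matrix `W2·W1`, whereas inserting a ReLU between them breaks that equivalence.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def relu(v):
    """Element-wise ReLU, one common choice of nonlinear activation."""
    return [max(0.0, t) for t in v]


# Arbitrary illustrative weights and input (assumptions, not from the patent).
W1 = [[1.0, 2.0], [0.0, -1.0]]
W2 = [[3.0, 1.0], [2.0, 0.5]]
x = [1.0, 3.0]

two_layers = matvec(W2, matvec(W1, x))       # layer 2 applied to layer 1, no activation
one_layer = matvec(matmul(W2, W1), x)        # a single equivalent linear layer
with_relu = matvec(W2, relu(matvec(W1, x)))  # the nonlinearity breaks the equivalence
```

Without the activation, `two_layers` and `one_layer` coincide, so the extra layer adds no modelling capacity; with the ReLU, the output differs.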
In order to extract temporal features and compress the amount of data, the feature map of the first convolutional layer is convolved several more times. Therefore, there are multiple convolutional layers in the proposed neural network. Note that the convolution kernel size of one layer is not necessarily equal to that of the next layer. For example, for $D_i$ the convolution kernel size is $H_{i1}$ in the first layer and $H_{i2}$ in the next layer, and $H_{i1}$ and $H_{i2}$ are independent of each other. After passing through these convolutional layers, the feature map for convolution kernel sizes $\{H_{i1}, H_{i2}, \ldots, H_{iM}\}$ can be expressed as $D_i(N, C, T-K_1-K_2-\cdots-K_M+M)$, where $K_M$ is the kernel length of convolutional layer M.
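The length arithmetic above follows from the fact that each unpadded, stride-1 convolution with kernel length K shortens the day axis by K − 1. A small sketch, assuming no padding and stride 1 (the text does not state either explicitly); the function name `conv_output_length` is hypothetical:

```python
def conv_output_length(num_days, kernel_lengths):
    """Length of the day axis after stacked convolutions (no padding, stride 1)."""
    length = num_days
    for k in kernel_lengths:
        if k > length:
            raise ValueError("kernel length exceeds remaining sequence length")
        length = length - k + 1  # each layer removes k - 1 positions
    return length


# One layer: T - K + 1. Several layers: T - K_1 - ... - K_M + M.
single = conv_output_length(30, [3])
stacked = conv_output_length(30, [3, 5])
```

For T = 30 with kernel lengths 3 and 5, the result is 30 − 3 − 5 + 2 = 24, matching the closed form with M = 2.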
Pooling layer:
After the repeated convolution operations, the data is sent to the pooling layer. A max pooling layer is used herein. In the max pooling layer, only the largest extracted feature value is retained, and all other feature values are discarded. The max pooling layer thus extracts the strongest features and discards weaker ones. After pooling, the output can be represented as $D_i(N, C, 1)$.
Fully connected layer:
In the fully connected layer, the input is the stack of pooling layer outputs. A two-class classifier (Softmax activation function) is then used to compute a classification result consisting of two probabilities. When the probability of electricity stealing is greater than the probability of normal use, the input data is marked as electricity stealing. The final output of the entire model is expressed as:
[Formula image BDA0003068680740000081]
where $f_{softmax}[\cdot]$ denotes the function that performs the full connection and classification operations on the feature maps [formula image BDA0003068680740000082], N denotes the number of samples, C denotes the number of feature maps generated by the convolution kernels, and D(N, 2, 1) denotes the classification result.
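The three layers described above can be illustrated end to end for a single sample. This is a simplified sketch under stated assumptions, not the patent's implementation: one convolutional layer only, ReLU as the activation $f_a$, untrained hand-picked weights, and the usual deep-learning convention in which "convolution" is computed as a sliding dot product. The names `conv_full_height`, `softmax` and `classify` are hypothetical.

```python
import math


def conv_full_height(sample, kernel, bias=0.0):
    """Slide an F x K kernel along the day axis of an F x T sample, then apply ReLU.

    The kernel spans all F rows, so the result is a 1-D feature map of
    length T - K + 1 (no padding, stride 1).
    """
    F, T, K = len(sample), len(sample[0]), len(kernel[0])
    feature_map = []
    for t in range(T - K + 1):
        s = bias + sum(
            kernel[f][k] * sample[f][t + k] for f in range(F) for k in range(K)
        )
        feature_map.append(max(0.0, s))  # ReLU, an assumed choice for f_a
    return feature_map


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(sample, kernels, fc_weights, fc_bias):
    """Convolution -> max pooling -> fully connected layer -> softmax."""
    pooled = [max(conv_full_height(sample, k)) for k in kernels]  # one value per kernel
    logits = [
        b + sum(w * p for w, p in zip(row, pooled))
        for row, b in zip(fc_weights, fc_bias)
    ]
    p_normal, p_theft = softmax(logits)
    return p_normal, p_theft, int(p_theft > p_normal)  # 1 = marked as stealing


# Toy sample: F = 2 readings per day, T = 4 days (illustrative values only).
sample = [[1, 2, 3, 4], [0, 1, 0, 1]]
kernel = [[1, 0], [0, 1]]
p_normal, p_theft, label = classify(sample, [kernel], [[0.0], [1.0]], [0.0, 0.0])
```

Max pooling reduces each feature map to a single value, which is why the pooled output has shape (N, C, 1) in the notation above and the fully connected layer sees one input per convolution kernel.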
In one example, in view of the characteristics of the electricity consumption data of power grid users, the invention improves the structure of the convolution kernel. Specifically, the electricity consumption data in each group of the multiple groups of data includes a user number, multiple consumption dates and the consumption data corresponding to each date, wherein the consumption data for a date includes multiple electricity consumption parameters, each representing the electricity consumption in one of multiple time periods of that day; the height of the convolution kernel is the same as the number of electricity consumption parameters in the electricity consumption data.
In this example, features are extracted using two convolutional layers, each having a plurality of convolution kernels of different sizes. In the text convolutional neural network, in view of the characteristics of the electricity consumption data of power grid users, the height of a convolution kernel is the same as the number of data points in one day. For example, if the dimension (i.e., the number) of the electricity consumption parameters in the electricity consumption data is 48, that is, the electricity consumption parameters of the user are collected every 30 minutes over a day, the height of the convolution kernel is also 48, so that the height of the convolution kernel equals the dimension of the electricity consumption parameters. In the initial feature extraction, the convolution kernels are 2, 3, 5 and 7 wide. A convolution kernel of width 2 or 3 can capture features of adjacent dates. Kernels of widths 5 and 7 can capture features arising from the periodicity of the working week and the calendar week, respectively. Therefore, in this example, the structure (i.e., the height and the width) of the convolution kernel is adjusted according to the natural regularity that electricity consumption data exhibits over time, so that the convolution kernel can capture the objective regularity reflected by the user's electricity consumption data over time and by the operation of the power grid, which makes the monitoring of abnormal data more precise and accurate. Furthermore, to reduce the risk of overfitting, the dropout rate of the proposed convolutional neural network is set to 0.4.
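The data layout in this example can be sketched as follows. The code arranges T days of readings into a matrix with one row per daily reading and one column per day, and lists the feature-map length each kernel width would produce. The constants 48 and (2, 3, 5, 7) come from the example in the text; the function names and the no-padding, stride-1 convolution are assumptions made for illustration.

```python
READINGS_PER_DAY = 48          # number of consumption parameters per day in the example
KERNEL_WIDTHS = (2, 3, 5, 7)   # kernel widths given in the example


def to_sample_matrix(daily_readings):
    """Turn a list of T days (each a list of 48 readings) into a 48 x T matrix."""
    for day in daily_readings:
        if len(day) != READINGS_PER_DAY:
            raise ValueError("each day must contain exactly 48 readings")
    return [[day[f] for day in daily_readings] for f in range(READINGS_PER_DAY)]


def feature_map_lengths(num_days):
    """Feature-map length per kernel width, assuming no padding and stride 1."""
    return {w: num_days - w + 1 for w in KERNEL_WIDTHS}


# Usage: 14 days of dummy readings for one user (values are placeholders).
days = [[float(d)] * READINGS_PER_DAY for d in range(14)]
matrix = to_sample_matrix(days)
lengths = feature_map_lengths(14)
```

Because every kernel is 48 rows high, it covers a full day at each position and moves only along the day axis, so a kernel of width 7 looks at exactly one calendar week at a time.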
There are various indices for evaluating a model; commonly used ones are accuracy, precision, recall and the F1 value. These indices emphasize different aspects of a model, and for a given model one index may matter more than the others; that is, in a particular situation only some of the indices are of interest. Herein, the four indices accuracy, precision, recall and F1 value are used.
It can be understood that, in an example, the abnormal users identified by the trained recognition model can be treated preliminarily as electricity stealing users, and the causes of the abnormality can be examined in detail at a later stage. Specifically, the recognition model used in the present invention is a two-class model that classifies all electricity users as electricity stealing or non-electricity stealing. Because the electricity stealing users are the ones of greater concern, they are labeled 1 and the non-electricity stealing users are labeled 0.
Table 1. Confusion matrix used in the present method
| Actual \ Predicted | Electricity stealing (1) | Non-electricity stealing (0) |
| Electricity stealing (1) | TP | FN |
| Non-electricity stealing (0) | FP | TN |
From the confusion matrix shown in Table 1, four values are obtained: TP, TN, FP and FN. TP means that the sample is actually positive and is predicted as positive, i.e. the user actually steals electricity and the prediction also shows electricity stealing, so the user is caught. TN means that the sample is actually negative and is predicted as negative, i.e. an ordinary user who does not steal electricity is predicted as non-electricity stealing, so no unnecessary intervention is imposed on a normal user. FP means that the sample is actually negative but is predicted as positive, i.e. the user does not steal electricity but the model mistakenly identifies the user as electricity stealing; this is a prediction error that causes trouble for a normal user. FN means that the sample is actually positive but is predicted as negative, i.e. the user actually steals electricity but the prediction shows non-electricity stealing; this is a prediction error in which an electricity stealing user is missed. From the four values TP, TN, FP and FN, the accuracy, precision, recall and F1 value can be calculated.
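The four counts can be obtained directly from the actual and predicted labels. A minimal sketch, using the labeling described in the text (1 = electricity stealing, 0 = non-electricity stealing); the function name `confusion_counts` and the sample labels are illustrative:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 means electricity stealing."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # stealing user correctly caught
        elif actual == 0 and predicted == 0:
            tn += 1  # normal user correctly left alone
        elif actual == 0 and predicted == 1:
            fp += 1  # normal user wrongly flagged
        else:
            fn += 1  # stealing user missed
    return tp, tn, fp, fn


# Usage with six illustrative users.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```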
The Accuracy (AR) is the proportion of users (electricity stealing and non-electricity stealing) that the model classifies correctly among all users; it is the most intuitive and most common criterion for measuring classification performance. In the present invention, accuracy is the proportion of all samples that the model judges correctly (including electricity stealing and non-electricity stealing). The formula is as follows:
$$AR = \frac{TP + TN}{TP + TN + FP + FN}$$
In practice, however, electricity stealing detection shows that almost 90% of users are non-electricity stealing users and only about 10% steal electricity. A model that judges every user to be non-electricity stealing would therefore still reach an accuracy of about 90%, so accuracy alone is an incomplete index.
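The limitation of accuracy on imbalanced data can be made concrete with a short sketch. The population of 900 normal and 100 stealing users is an illustrative assumption that matches the 90% / 10% split mentioned above; the "model" here simply predicts non-electricity stealing for everyone.

```python
# Illustrative population: 900 normal users (0) and 100 stealing users (1).
n_normal, n_theft = 900, 100
y_true = [0] * n_normal + [1] * n_theft

# A trivial model that labels every user as non-electricity stealing.
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_theft
```

The trivial model scores 90% accuracy while catching none of the stealing users, which is why precision, recall and the F1 value are reported alongside accuracy.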
The Precision Ratio (PR) is the proportion of samples that are actually positive among all samples that the model judges to be positive. Here it means the proportion of users who actually steal electricity among all users that the model judges to be stealing electricity. The formula is as follows:
$$PR = \frac{TP}{TP + FP}$$
The Recall Ratio (RR) is the proportion of samples that the model identifies as positive among all samples that are actually positive. Here it means the proportion of electricity-stealing users caught by the model among all users who actually steal electricity. The formula is as follows:
$$RR = \frac{TP}{TP + FN}$$
The F value is a comprehensive index of the classification effect and is a harmonic mean of the precision and the recall, weighted by a parameter. The formula is as follows:
$$F_\beta = \frac{(1 + \beta^2) \cdot PR \cdot RR}{\beta^2 \cdot PR + RR}$$
In particular, when the parameter equals 1, the most representative F1 value is obtained, and the calculation formula is as follows:
$$F_1 = \frac{2 \cdot PR \cdot RR}{PR + RR}$$
The F1 value combines the precision and the recall and comprehensively reflects the classification level of the model; the larger the F1 value, the better the classification effect of the model.
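The four indices defined above can be computed directly from the confusion-matrix counts. The following is a minimal sketch; the function name `classification_metrics`, the `beta` argument and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Return accuracy, precision, recall and F-beta from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # With beta = 1 this reduces to the F1 value 2*PR*RR / (PR + RR).
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }


# Hypothetical counts, not taken from the experiment.
m = classification_metrics(tp=8, tn=85, fp=5, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The zero checks keep the function defined for degenerate cases, for example when the model flags no user at all and TP + FP is zero.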
The invention has a wide application range and is suitable for electric power metering systems in different regions. Without knowledge of the topology and parameters of the power grid, the invention can accurately judge whether a power consumer steals electricity from the metering data of the consumer's electricity meter, and the electricity stealing detection precision is high.
In order to facilitate understanding of the beneficial effects of the present invention, the recognition model provided by the present invention is tested in combination with specific data. The specific process is as follows:
The Ireland sample data is a public data set of the power industry. The data set records 535 days of electricity consumption for 5000 users, with 48 readings per day (consumption is collected every half hour). In this example, the experiment uses actual data of 900 ordinary residential users and data of 300 electricity-stealing users, so the ratio of electricity-stealing users to ordinary users is 1:3. For each user, the data of 21 consecutive days are taken as a sample, with 48 feature values per day. A feature value is a value that reflects the characteristics of the data, here the electricity consumption in kilowatt-hours; because the electricity consumption information acquisition system in Ireland collects consumption every half hour, 48 readings are collected in the 24 hours of a day. The data set contains three dimensions: the user number, the date and the electricity consumption features (i.e., 48 electricity consumption parameters). The height of the convolution kernel is the same as the number of electricity consumption parameters, i.e., the height is 48.
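The layout of one sample described above can be sketched as a small data structure. The class name `UserSample`, its field names and the 0/1 label encoding are assumptions made for this illustration; the source only states that a sample covers 21 consecutive days with 48 readings per day and carries a label.

```python
from dataclasses import dataclass
from typing import List

READINGS_PER_DAY = 48   # one reading every half hour
DAYS_PER_SAMPLE = 21    # consecutive days per user


@dataclass
class UserSample:
    user_id: int                 # user number
    dates: List[str]             # 21 consecutive dates
    readings: List[List[float]]  # 21 rows x 48 half-hourly values (kWh)
    label: int                   # assumed encoding: 1 = stealing, 0 = normal

    def validate(self) -> None:
        """Check that the sample has the 21 x 48 shape described in the text."""
        if len(self.dates) != DAYS_PER_SAMPLE or len(self.readings) != DAYS_PER_SAMPLE:
            raise ValueError("expected 21 days per sample")
        if any(len(day) != READINGS_PER_DAY for day in self.readings):
            raise ValueError("expected 48 readings per day")
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")


# Placeholder values only; real readings would come from the metering data.
sample = UserSample(
    user_id=1,
    dates=[f"day-{i + 1}" for i in range(DAYS_PER_SAMPLE)],
    readings=[[0.5] * READINGS_PER_DAY for _ in range(DAYS_PER_SAMPLE)],
    label=0,
)
sample.validate()
```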
The data set is divided into a training set and a test set based on the actual cases of the Ireland low-voltage users. The training set is used to train the recognition model, and the test set is used to verify it. The input of the convolutional neural network is the data set containing three dimensions of information, namely the user number, the date and the 48 electricity consumption features. 70% of the data set is used for training the model and 30% for testing it. The effect of the model was evaluated on the training set and the test set respectively, and the results are shown in the table below.
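A 70/30 division of the labelled samples could be sketched as follows. The source does not say how the split was drawn; splitting each class separately (a stratified split) and the fixed random seed are assumptions of this sketch, and `stratified_split` is a hypothetical helper name.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test, keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx


# 900 normal users (label 0) and 300 electricity-stealing users (label 1).
labels = [0] * 900 + [1] * 300
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 840 360
```

Splitting per class keeps the 1:3 ratio of electricity-stealing to normal users in both the training set and the test set.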
Table 2 Prediction results for Ireland low-voltage users
Data set | Accuracy | Precision | Recall | F1 value
Training set | 100% | 100% | 100% | 100%
Test set | 95.1% | 85.4% | 96.2% | 93.6%
The results in Table 2 show that, on the training set, the overall accuracy of the model reaches 100%, indicating that the model correctly identifies 100% of electricity-stealing and non-electricity-stealing users overall; the precision is 100%, meaning that 100% of the users flagged by the model as stealing electricity actually steal electricity; the recall is 100%, indicating that 100% of all electricity-stealing users are caught by the model; and the F1 value, which combines the precision and the recall, reaches 100%, indicating good model performance.
On the test set, the overall accuracy of the recognition model reaches 95.1%, indicating that the model correctly identifies 95.1% of electricity-stealing and non-electricity-stealing users overall; the precision is 85.4%, meaning that 85.4% of the users flagged by the model as stealing electricity actually steal electricity; the recall is 96.2%, indicating that 96.2% of all electricity-stealing users are caught by the model; and the F1 value, which combines the precision and the recall, reaches 93.6%, indicating good model performance.
The confusion matrix for the test set is:
Table 3 Confusion matrix of the Ireland low-voltage user test set
Actual class | Predicted non-stealing | Predicted stealing
Non-stealing | 232 | 13
Stealing | 3 | 76
The confusion matrix in Table 3 shows that 232 non-electricity-stealing users are correctly identified by the model, 76 electricity-stealing users are caught by the model, 13 non-electricity-stealing users are mistaken by the model for electricity-stealing users, and 3 electricity-stealing users are not caught by the model.
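As a cross-check, the test-set indices can be recomputed from the four counts given for Table 3. This is a plain arithmetic sketch that uses only those counts. The accuracy, precision and recall reproduce the quoted 95.1%, 85.4% and 96.2%, while the F1 value obtained from these counts is about 90.5%, which does not match the quoted 93.6%; the source does not explain the difference.

```python
# Counts taken from the Table 3 description above.
tn, tp, fp, fn = 232, 76, 13, 3

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Equivalent to 2*PR*RR / (PR + RR), written in terms of the counts.
f1 = 2 * tp / (2 * tp + fp + fn)

print(total)  # 324
print(f"{accuracy:.1%} {precision:.1%} {recall:.1%} {f1:.1%}")
# 95.1% 85.4% 96.2% 90.5%
```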
Secondly, the running time of the code is short. Because the model adopts one-dimensional convolution, the height of the convolution kernel automatically adapts to the given data set and the convolution proceeds along only one direction, which greatly improves the operation speed and gives high operating efficiency.
Finally, in general, the greater the depth and width of a neural network, the higher the accuracy the model can reach. The model adopts a structure with multiple convolution layers and multiple convolution kernels, and the example shows that its accuracy is high and its performance is good.
In another aspect, in one embodiment, there is provided a grid data anomaly detection apparatus, the apparatus comprising:
the input module is used for inputting the electricity consumption data of the user into the recognition model;
the identification module is used for determining, according to the recognition model, a result indicating whether the user corresponding to the electricity consumption data is in an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data is in an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data is not in an abnormal electricity consumption state.
As a further improvement, the apparatus further comprises:
a preprocessing module used for preprocessing the original data to obtain the multiple groups of data.
As a further improvement, the preprocessing module is further configured to:
carry out data cleaning on the original data to obtain the multiple groups of data.
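This passage does not state what the data cleaning consists of. One common cleaning step for meter readings is filling missing values, and the sketch below shows that step only as an assumed example: missing readings (represented here as `None`) are filled by linear interpolation between the nearest known neighbours. The function name `fill_missing` and the choice of interpolation are illustrative and are not taken from the source.

```python
def fill_missing(readings):
    """Fill None entries by linear interpolation between the nearest known
    neighbours; leading or trailing gaps take the nearest known value."""
    known = [i for i, v in enumerate(readings) if v is not None]
    if not known:
        raise ValueError("no valid readings to interpolate from")
    filled = list(readings)
    for i, v in enumerate(readings):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = readings[right]
        elif right is None:
            filled[i] = readings[left]
        else:
            step = (readings[right] - readings[left]) * (i - left) / (right - left)
            filled[i] = readings[left] + step
    return filled


print(fill_missing([None, 1.0, None, None, 4.0, None]))
# [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```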
As a further improvement, the recognition model is obtained by training a convolutional neural network model using the multiple groups of data, and the convolutional neural network model comprises a convolutional layer, a pooling layer and a fully connected layer which are connected in sequence;
the convolutional layer is provided with multiple convolution kernels of different sizes, and each of the convolution kernels is used for extracting features from the multiple groups of data input to the convolutional layer to obtain a feature map corresponding to each group of data in the multiple groups of data;
the pooling layer is used for selecting, from the multiple feature maps input to the pooling layer, the feature map having the maximum feature value in each group of data and storing that feature map in a stack;
the fully connected layer is used for classifying the feature maps stored in the stack.
As a further improvement, the electricity consumption data corresponding to each group of data in the multiple groups of data includes a user number of a user, multiple electricity consumption dates and the electricity consumption data corresponding to each electricity consumption date, wherein the electricity consumption data includes multiple electricity consumption parameters, and the electricity consumption parameters respectively represent the electricity consumption in multiple time periods of that day;
the height of the convolution kernel is the same as the number of electricity consumption parameters in the electricity consumption data.
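The structure described above (convolution kernels whose height equals the number of electricity consumption parameters, a pooling layer and a fully connected layer) can be illustrated with a small forward pass in plain Python. This is a sketch under stated assumptions and not the patented implementation: the kernel sizes of 2, 3 and 4 days, the ReLU activation, the reading of the pooling step as taking the maximum of each feature map, the sigmoid output and the random weights are all choices made here for illustration.

```python
import math
import random


def conv_over_days(sample, kernel, bias=0.0):
    """Slide a (k x n_readings) kernel along the day axis only; ReLU activation."""
    k = len(kernel)
    feature_map = []
    for start in range(len(sample) - k + 1):
        s = bias
        for d in range(k):
            s += sum(x * w for x, w in zip(sample[start + d], kernel[d]))
        feature_map.append(max(0.0, s))
    return feature_map


def forward(sample, kernels, fc_weights, fc_bias=0.0):
    """Convolution, then max pooling per feature map, then a fully connected sigmoid output."""
    pooled = [max(conv_over_days(sample, k)) for k in kernels]
    z = fc_bias + sum(p * w for p, w in zip(pooled, fc_weights))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
n_days, n_readings = 21, 48
# Random placeholder sample standing in for 21 days x 48 readings.
sample = [[rng.random() for _ in range(n_readings)] for _ in range(n_days)]
# Hypothetical kernel sizes: 2, 3 and 4 days, each spanning all 48 readings.
kernels = [
    [[rng.uniform(-0.1, 0.1) for _ in range(n_readings)] for _ in range(k)]
    for k in (2, 3, 4)
]
fc_weights = [rng.uniform(-1, 1) for _ in kernels]

score = forward(sample, kernels, fc_weights)
print([len(conv_over_days(sample, k)) for k in kernels])  # [20, 19, 18]
print(0.0 < score < 1.0)  # True
```

Because every kernel spans all 48 readings, it can only move along the date axis, so each kernel yields a one-dimensional feature map whose length is the number of days minus the kernel size plus one.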
It is understood that the apparatus embodiment and the method embodiment of the present invention are based on the same inventive concept, so the corresponding description is not repeated here.
FIG. 3 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in FIG. 3, the computer device includes a processor, a memory, a network interface, an input device and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the grid data anomaly detection method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the grid data anomaly detection method. Those skilled in the art will appreciate that the configuration shown in FIG. 3 is a block diagram of only the part of the configuration relevant to the present invention and does not limit the computer devices to which the present invention may be applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the grid data anomaly detection apparatus provided by the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 3. The memory of the computer device may store various program modules constituting the grid data abnormality detection apparatus, such as the input module and the identification module shown in fig. 2. The computer program constituted by the program modules causes the processor to execute the steps of the grid data abnormality detection method according to the embodiments of the present application described in the present specification.
For example, the computer device shown in FIG. 3 may perform, through the input module of the grid data anomaly detection apparatus shown in FIG. 2, the step of inputting the electricity consumption data of the user into the recognition model, and may perform, through the identification module, the step of determining, according to the recognition model, a result indicating whether the user corresponding to the electricity consumption data is in an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data is in an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data is not in an abnormal electricity consumption state.
In one embodiment, there is provided an electronic device including a memory, a processor and a computer program that is stored in the memory and can run on the processor, wherein the processor, when executing the program, performs the steps of the power grid data anomaly detection method. Here, the steps of the grid data anomaly detection method may be the steps of the grid data anomaly detection method of any of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, which stores computer-executable instructions for causing a computer to perform the steps of the grid data anomaly detection method described above. Here, the steps of the grid data abnormality detection method may be the steps of the grid data abnormality detection methods of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

Claims (10)

1. A power grid data anomaly detection method, characterized by comprising the following steps:
inputting the electricity consumption data of the user into the recognition model;
determining, according to the recognition model, a result indicating whether the user corresponding to the electricity consumption data is in an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data is in an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data is not in an abnormal electricity consumption state.
2. The power grid data anomaly detection method according to claim 1, wherein the recognition model is obtained through machine learning training using the multiple groups of data, the method further comprising:
preprocessing the original data to obtain the multiple groups of data.
3. The power grid data anomaly detection method according to claim 2, wherein the specific process of preprocessing the original data to obtain the multiple groups of data is as follows:
carrying out data cleaning on the original data to obtain the multiple groups of data.
4. The power grid data anomaly detection method according to claim 1, wherein the recognition model is obtained by training a convolutional neural network model using the multiple groups of data, and the convolutional neural network model comprises a convolutional layer, a pooling layer and a fully connected layer which are connected in sequence;
the convolutional layer is provided with multiple convolution kernels of different sizes, and each of the convolution kernels is used for extracting features from the multiple groups of data input to the convolutional layer to obtain a feature map corresponding to each group of data in the multiple groups of data;
the pooling layer is used for selecting, from the multiple feature maps input to the pooling layer, the feature map having the maximum feature value in each group of data and storing that feature map in a stack;
the fully connected layer is used for classifying the feature maps stored in the stack.
5. The method according to claim 4, wherein the electricity consumption data corresponding to each group of data in the multiple groups of data comprises a user number of a user, multiple electricity consumption dates and the electricity consumption data corresponding to each electricity consumption date, wherein the electricity consumption data comprises multiple electricity consumption parameters, and the electricity consumption parameters respectively represent the electricity consumption in multiple time periods of that day;
the height of the convolution kernel is the same as the number of electricity consumption parameters in the electricity consumption data.
6. A power grid data anomaly detection apparatus, the apparatus comprising:
an input module used for inputting the electricity consumption data of the user into the recognition model;
an identification module used for determining, according to the recognition model, a result indicating whether the user corresponding to the electricity consumption data is in an abnormal electricity consumption state; the recognition model is obtained through machine learning training using multiple groups of data, the multiple groups of data comprise a first class of data and a second class of data, and each group of data in the first class of data comprises: abnormal electricity consumption data and a label identifying that the user corresponding to the group of data is in an abnormal electricity consumption state; each group of data in the second class of data comprises: normal electricity consumption data and a label identifying that the user corresponding to the group of data is not in an abnormal electricity consumption state.
7. The power grid data anomaly detection apparatus according to claim 6, characterized in that the apparatus further comprises:
a preprocessing module used for preprocessing the original data to obtain the multiple groups of data.
8. The power grid data anomaly detection apparatus according to claim 7, wherein the preprocessing module is further configured to:
carry out data cleaning on the original data to obtain the multiple groups of data.
9. The power grid data anomaly detection apparatus according to claim 6, wherein the recognition model is obtained by training a convolutional neural network model using the multiple groups of data, and the convolutional neural network model comprises a convolutional layer, a pooling layer and a fully connected layer which are connected in sequence;
the convolutional layer is provided with multiple convolution kernels of different sizes, and each of the convolution kernels is used for extracting features from the multiple groups of data input to the convolutional layer to obtain a feature map corresponding to each group of data in the multiple groups of data;
the pooling layer is used for selecting, from the multiple feature maps input to the pooling layer, the feature map having the maximum feature value in each group of data and storing that feature map in a stack;
the fully connected layer is used for classifying the feature maps stored in the stack.
10. The apparatus according to claim 9, wherein the electricity consumption data corresponding to each group of data in the multiple groups of data comprises a user number of a user, multiple electricity consumption dates and the electricity consumption data corresponding to each electricity consumption date, wherein the electricity consumption data comprises multiple electricity consumption parameters, and the electricity consumption parameters respectively represent the electricity consumption in multiple time periods of that day;
the height of the convolution kernel is the same as the number of electricity consumption parameters in the electricity consumption data.
CN202110537094.6A 2021-05-17 2021-05-17 Power grid data anomaly detection method and device Pending CN113095739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110537094.6A CN113095739A (en) 2021-05-17 2021-05-17 Power grid data anomaly detection method and device

Publications (1)

Publication Number Publication Date
CN113095739A true CN113095739A (en) 2021-07-09

Family

ID=76666016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110537094.6A Pending CN113095739A (en) 2021-05-17 2021-05-17 Power grid data anomaly detection method and device

Country Status (1)

Country Link
CN (1) CN113095739A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516192A (en) * 2021-07-19 2021-10-19 国网北京市电力公司 Method, system, device and storage medium for identifying user electricity consumption transaction
CN115377975A (en) * 2022-10-24 2022-11-22 国网浙江省电力有限公司宁波市北仑区供电公司 Power distribution control method and power distribution control system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036357A (en) * 2014-06-12 2014-09-10 国家电网公司 Analysis method for electricity stealing behavioral mode of electricity utilization of user
US20140285357A1 (en) * 2013-03-15 2014-09-25 Yetu Ag System and method for analyzing the energy consumption of electric loads in a consumer network
CN107492043A (en) * 2017-09-04 2017-12-19 国网冀北电力有限公司电力科学研究院 stealing analysis method and device
CN108764984A (en) * 2018-05-17 2018-11-06 国网冀北电力有限公司电力科学研究院 A kind of power consumer portrait construction method and system based on big data
CN111160791A (en) * 2019-12-31 2020-05-15 国网北京市电力公司 Abnormal user identification method based on GBDT algorithm and factor fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210709)