CN115329839A

CN115329839A - Electricity stealing user identification and electricity stealing amount prediction method based on convolution self-encoder and improved regression algorithm

Info

Publication number: CN115329839A
Application number: CN202210804044.4A
Authority: CN
Inventors: 林振智; 崔雪原; 刘晟源; 杨莉; 马愿谦; 王韵楚; 章天晗; 陈昌铭; 张智; 邱伟强; 龚贤夫; 孙辉; 彭勃; 李耀东
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2022-11-11

Abstract

The invention discloses an electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm. The method comprises the following steps: first, collected electric quantity data of normal users is used as input to establish a convolutional autoencoder anomaly detection model, and users with abnormal electricity consumption and a large reconstruction error are identified as electricity stealing users; then, electric quantity statistical indexes are extracted from the electric quantity metering data of the electricity stealing periods and non-electricity stealing periods of the electricity stealing users to construct feature vectors, which are used as input to the TrAdaBoost algorithm to train an XGBoost regression model that predicts the potential electricity stealing amount of each user. The invention can quickly and accurately learn normal electricity consumption patterns, identify electricity stealing users with abnormal consumption, and predict the potential electricity stealing amount of each electricity stealing user.

Description

Electricity stealing user identification and electricity stealing amount prediction method based on convolution self-encoder and improved regression algorithm

Technical Field

The invention relates to the field of electricity stealing detection, and in particular to a method that uses machine learning to mine the electricity consumption characteristics of users for anti-electricity-stealing analysis.

Background

Electricity stealing users use illegal means to make the meter reading lower than their actual consumption and thereby reduce the electricity charges they pay, which seriously harms the interests of power grid companies. Besides the economic loss, electricity stealing can cause abnormally high line loss in a distribution area, lead to transformer overload and voltage imbalance, and affect the operational safety of the power system.

With the increasing demand for electric power, electricity stealing detection has become an important measure for protecting the revenue of power supply companies. However, the actual amount of electricity stolen by a user is difficult to determine, and on-site inspections lack reliable evidence to guide them. The main detection method used by power supply companies is on-site inspection of user meters, which is inefficient, consumes considerable manpower and material resources, covers a limited range, and catches very few electricity stealing cases. A new electricity stealing detection technology is therefore needed to improve detection efficiency and accuracy.

A new generation of artificial intelligence and machine learning technology is being widely applied in energy and power systems. On the one hand, power systems are evolving toward the smart grid: their physical structure and mathematical models are becoming more complex, and the collected data is large in volume and high in dimensionality, so machine learning is needed to mine the data in depth. On the other hand, machine learning has strong learning and transfer capability, depends little on an exact mathematical model, and can effectively handle high-dimensional, complex problems. Electricity stealing detection methods based on data-driven and machine learning technologies train different detection models on existing electricity stealing sample records and electricity consumption data. They are flexible, low in cost, and accurate, and have become the main development direction of electricity stealing detection.

However, current machine-learning-based electricity stealing detection models often suffer from unbalanced training samples and insufficient mining of electricity stealing patterns, so electricity stealing users are easily misjudged as normal users and the detection rate of electricity stealing users is low.

Disclosure of Invention

The invention mainly addresses two problems of existing electricity stealing inspection methods: their identification accuracy for electricity stealing users is low, and they cannot predict the amount of electricity stolen. It provides an electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm, which identifies electricity stealing users and predicts their potential electricity stealing amount.

To solve the above problems, the invention adopts the following scheme:

An electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm, characterized by comprising the following steps:

step 1, collecting historical electric quantity data of normal users in a distribution area, and establishing a two-dimensional electric quantity matrix for each user;

step 2, establishing a convolutional autoencoder (CAE) anomaly detection model, calculating the reconstruction error of each user, and identifying users with abnormal electricity consumption and a larger reconstruction error as electricity stealing users;

step 3, extracting, from the electric quantity metering data of the electricity stealing periods and non-electricity stealing periods of the electricity stealing users respectively, electric quantity statistical indexes that characterize the correlation with the electricity stealing amount of the user, and constructing feature vectors;

step 4, taking the feature vectors constructed in step 3 as input, training an XGBoost regression model with the TrAdaBoost algorithm, and using it to predict the potential electricity stealing amount of each electricity stealing user.

In the above technical solution, further, in step 1, the metered electric quantity data of normal users in the distribution area is collected, and the meter data collection times of different users should be kept consistent; for example, the collection time span may be set to 1 week with 1 day as a period. The collected one-dimensional time-series electric quantity data is transformed into a two-dimensional matrix:

where the matrix is the two-dimensional electric quantity matrix of user i, each of its elements is the electric quantity data at the k-th moment in the j-th power utilization cycle, T^CAE is the length of a power utilization cycle, and M is the number of power utilization cycles.
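The reshaping in step 1 can be illustrated with a minimal sketch. It assumes that each row holds one power utilization cycle and each column one moment within the cycle; the source does not state the row and column orientation, and the function name `to_energy_matrix` and the sample values are hypothetical.

```python
def to_energy_matrix(series, cycle_length):
    """Fold a 1-D electric quantity series into one row per cycle.

    Assumption: rows index the cycle j (M rows) and columns index the
    moment k within the cycle (T^CAE columns).
    """
    if cycle_length <= 0:
        raise ValueError("cycle_length must be positive")
    if len(series) % cycle_length != 0:
        raise ValueError("series length must be a whole number of cycles")
    return [list(series[start:start + cycle_length])
            for start in range(0, len(series), cycle_length)]


# Hypothetical example: 2 cycles with 3 readings each.
matrix = to_energy_matrix([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], cycle_length=3)
```

With hourly readings over one week (a sampling resolution assumed here for illustration only), `cycle_length=24` would give a matrix with 7 rows and 24 columns.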

Further, in step 2, a convolutional autoencoder is constructed from two-dimensional convolution layers, pooling layers and upsampling layers, and is then trained to learn the electricity consumption characteristics of normal users, thereby establishing the convolutional autoencoder (CAE) anomaly detection model. Specifically:

the two-dimensional convolution layer divides the input two-dimensional electric quantity matrix into receptive fields of the same size (l × l), multiplies the electric quantity values in each receptive field by the weight coefficients, and thus extracts the features of the two-dimensional electric quantity matrix by a convolution operation, which is expressed as

where the left-hand side is the convolution result for receptive field (u, v), the weight coefficients are those corresponding to the electric quantity values in the receptive field, the electric quantity values are taken from the two-dimensional electric quantity matrix, B is the bias term of the convolution, and f_ker(·) is the activation function that applies the nonlinear transformation;
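As an illustration of the convolution operation described above, the following minimal sketch slides an l × l weight kernel over the electric quantity matrix, adds the bias B and applies an activation function. The stride of 1, the absence of padding and the choice of ReLU for f_ker are assumptions made for this example; the source does not specify them, and `conv2d_valid` is a hypothetical name.

```python
def conv2d_valid(matrix, kernel, bias=0.0, activation=lambda z: max(0.0, z)):
    """Weighted sum over every l x l receptive field, plus bias, then activation.

    Assumptions: stride 1, no padding, ReLU as the default activation.
    """
    l = len(kernel)
    rows, cols = len(matrix), len(matrix[0])
    output = []
    for u in range(rows - l + 1):
        out_row = []
        for v in range(cols - l + 1):
            total = bias
            for p in range(l):
                for q in range(l):
                    total += kernel[p][q] * matrix[u + p][v + q]
            out_row.append(activation(total))
        output.append(out_row)
    return output


# Hypothetical 3 x 3 input and 2 x 2 kernel.
features = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
```

In a trained model the kernel weights and the bias are learned; here they are fixed by hand only to make the arithmetic easy to follow.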

in the encoding process of the convolutional autoencoder, a pooling layer is added after each two-dimensional convolution layer to downsample the convolution result of that layer; in the decoding process, an upsampling layer is added after each two-dimensional convolution layer to upsample the compressed convolution result, so that the feature matrix finally output by the convolutional autoencoder has the same size as the input two-dimensional electric quantity matrix. The output feature matrix of the CAE is expressed as

The reconstruction error δ of each user is calculated as

where each element of the output feature matrix of the CAE anomaly detection model is the output electric quantity value corresponding to the input electric quantity value at the same position;

users with abnormal electricity consumption and a larger reconstruction error are identified as electricity stealing users.
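The identification rule can be sketched as follows. The formula for δ is not reproduced in this text, so the sketch assumes a mean squared error between the input matrix and the CAE output; the threshold value and the names `reconstruction_error` and `flag_suspects` are likewise hypothetical.

```python
def reconstruction_error(matrix, reconstructed):
    """Mean squared difference between input and reconstructed matrices.

    Assumption: the error metric is MSE; the source does not give the formula.
    """
    if len(matrix) != len(reconstructed):
        raise ValueError("matrices must have the same number of rows")
    total, count = 0.0, 0
    for row, row_hat in zip(matrix, reconstructed):
        if len(row) != len(row_hat):
            raise ValueError("matrices must have the same number of columns")
        for value, value_hat in zip(row, row_hat):
            total += (value - value_hat) ** 2
            count += 1
    return total / count


def flag_suspects(errors, threshold):
    """Return the users whose reconstruction error exceeds the threshold."""
    return [user for user, delta in errors.items() if delta > threshold]


# Hypothetical input and output matrices for one user.
delta = reconstruction_error([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 2.0]])
suspects = flag_suspects({"user_a": 0.1, "user_b": delta + 1.0}, threshold=1.0)
```

Because the model is trained only on normal users, a user whose consumption pattern it cannot reproduce well receives a large δ. How the threshold is chosen is left open here.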

Further, in step 3, the electric quantity data of the electricity stealing period and the normal period of electricity stealing user i is collected, and an electricity stealing period electric quantity vector and a normal period electric quantity vector, each of size 1 × T^Tr, are established, where T^Tr is the time length over which the electric quantity is collected. Electric quantity statistical indexes that characterize the correlation with the electricity stealing amount of user i are extracted from the electric quantity vectors, and the feature vector F_i^Xg is constructed as

F_i^Xg = [F_i^avg, F_i^max, F_i^min, F_i^null, F_i^zero, F_i^var, F_i^pcc]

where F_i^avg, F_i^max, F_i^min, F_i^null, F_i^zero and F_i^var are the average, maximum, minimum, number of null values, number of zero values and variance statistical indexes of the original electric quantity curve, each computed by a corresponding function from the electricity stealing period and normal period electric quantity vectors, and F_i^pcc is the Pearson correlation coefficient between the electricity stealing period electric quantity vector and the normal period electric quantity vector.
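A minimal sketch of the feature construction follows. How the indexes of the two periods are combined into each F_i entry is not recoverable from this text, so the sketch simply places the electricity stealing period value and the normal period value side by side. That layout, the use of `None` for null readings, and all function names are assumptions.

```python
import math


def basic_stats(vector):
    """Average, max, min, null count, zero count and variance of one vector."""
    values = [v for v in vector if v is not None]
    if not values:
        raise ValueError("vector has no valid readings")
    mean = sum(values) / len(values)
    return {
        "avg": mean,
        "max": max(values),
        "min": min(values),
        "null": len(vector) - len(values),
        "zero": sum(1 for v in values if v == 0),
        "var": sum((v - mean) ** 2 for v in values) / len(values),
    }


def pearson(x, y):
    """Pearson correlation over positions where both readings are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a, _ in pairs))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for _, b in pairs))
    if std_x == 0 or std_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (std_x * std_y)


def build_feature_vector(theft, normal):
    """Assumed layout: (theft, normal) value per index, then the correlation."""
    stats_theft, stats_normal = basic_stats(theft), basic_stats(normal)
    features = []
    for key in ("avg", "max", "min", "null", "zero", "var"):
        features.extend([stats_theft[key], stats_normal[key]])
    features.append(pearson(theft, normal))
    return features


# Hypothetical readings for one user.
vector = build_feature_vector([1.0, 0.0, None, 2.0], [2.0, 4.0, 6.0, 8.0])
```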

Further, in step 4, the electricity stealing users to be predicted are used as the target training set F^TT, with N^TT users and the corresponding electricity stealing amount vector y^TT; meanwhile, local historical electricity stealing records are collected as the auxiliary training set F^AT, with N^AT users and the corresponding electricity stealing amount vector y^AT. The weight vector of the users in the training set is set as α^t = [α^t(1), α^t(2), …, α^t(N^TT + N^AT)].

In the t-th iteration, the combined training set (F^TT + F^AT), the label vector (y^TT + y^AT) and the weight vector α^t are used as input to train an XGBoost model Xg(t), and the electricity stealing amount vector of the users in the target training set is predicted.

The prediction error ε_t on the target training set is calculated as

where ε_0 is the error threshold for the electricity stealing amount vector, and the decision function determines whether the error exceeds the error threshold.

The training errors β_t and β_0 on F^TT and F^AT are then defined respectively as

The weight vector α^(t+1)(i) of all input samples for the next round is then updated as

When the maximum number of iterations N^Xg is reached, iteration stops, the trained XGBoost model Xg(N^Xg) is output, and the model is used to predict the potential electricity stealing amount of the electricity stealing users.
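The formulas for ε_t, β_t, β_0 and the weight update are not reproduced in this text. The sketch below therefore substitutes the widely used TrAdaBoost update rules, adapted to a thresholded regression error; it is an assumption about the general shape of the procedure, not the patent's exact formulas. The function name, the clipping bounds and the sample values are hypothetical.

```python
import math


def tradaboost_update(weights, y_true, y_pred, n_target, eps0, max_iter):
    """One weight update; the first n_target samples form the target set.

    Assumed rules (standard TrAdaBoost, not taken from the source):
      e_i    = 1 if |y_pred_i - y_true_i| > eps0 else 0
      eps_t  = weighted share of target samples with e_i = 1
      beta_t = eps_t / (1 - eps_t)
      beta_0 = 1 / (1 + sqrt(2 * ln(n_aux) / max_iter))
    """
    n_aux = len(weights) - n_target
    if n_target < 1 or n_aux < 1:
        raise ValueError("both training sets must be non-empty")
    exceeded = [1 if abs(p - y) > eps0 else 0 for p, y in zip(y_pred, y_true)]
    target_weight = sum(weights[:n_target])
    eps_t = sum(w * e for w, e in
                zip(weights[:n_target], exceeded[:n_target])) / target_weight
    eps_t = min(max(eps_t, 1e-6), 0.499)  # keep beta_t inside (0, 1)
    beta_t = eps_t / (1.0 - eps_t)
    beta_0 = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_aux) / max_iter))
    updated = []
    for index, (w, e) in enumerate(zip(weights, exceeded)):
        if index < n_target:
            updated.append(w * beta_t ** (-e))  # raise weight of missed target samples
        else:
            updated.append(w * beta_0 ** e)     # lower weight of missed auxiliary samples
    total = sum(updated)
    return [w / total for w in updated], eps_t


# Hypothetical round: 3 target users followed by 2 auxiliary users.
new_weights, eps_t = tradaboost_update(
    weights=[0.2] * 5,
    y_true=[10.0, 10.0, 10.0, 10.0, 10.0],
    y_pred=[10.0, 10.5, 20.0, 10.0, 20.0],
    n_target=3, eps0=1.0, max_iter=10)
```

In each round the updated weights would be passed to the regressor as per-sample weights, for example through the `sample_weight` argument of `XGBRegressor.fit` in the xgboost Python package. Auxiliary records that disagree with the current model lose influence, while poorly predicted target users gain it.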

The invention has the beneficial effects that:

Based on the collected user electric quantity data, the invention provides an electricity stealing anomaly detection model built on a convolutional autoencoder. Taking a two-dimensional electric quantity matrix as input, the model extracts the electricity consumption characteristics of users more fully, recognizes the abnormal consumption patterns of electricity stealing users from the magnitude of the reconstruction error, and thus identifies electricity stealing users more accurately. The XGBoost regression model is established with a TrAdaBoost training strategy: supplementing an auxiliary training set enlarges the training scale, improves the training effect, and improves the prediction accuracy of the electricity stealing amount.

Drawings

FIG. 1 is a flowchart of the power stealing user identification and power stealing prediction based on a convolution self-encoder and an improved regression algorithm according to the present invention;

FIG. 2 is a schematic diagram of a convolutional auto-encoder;

fig. 3 is a user reconstruction error distribution diagram.

Detailed Description

The embodiments of the present invention will be described in detail and fully with reference to the accompanying drawings. It should be apparent that the described embodiments are merely exemplary of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 is a flowchart of the electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm according to the present invention. The method comprises the following steps:

step 1, collecting historical electric quantity data of normal users in a distribution area, and establishing a two-dimensional electric quantity matrix for each user, taking 1 day as a period and 1 week as the time span;

step 2, establishing a convolutional autoencoder (CAE) anomaly detection model, outputting the reconstruction error of each user, and identifying users with abnormal electricity consumption and a larger reconstruction error as electricity stealing users;

step 3, extracting statistical indexes from the electric quantity metering data of the electricity stealing periods and non-electricity stealing periods respectively, to characterize their correlation with the electricity stealing amount of the user;

step 4, training an XGBoost regression model according to the proposed TrAdaBoost training strategy, and predicting the potential electricity stealing amount of each electricity stealing user.

Specifically, in step 1, the metered electric quantity data of normal users in the distribution area is collected, with the meter data collection times of different users kept consistent; the collection time span is set to 1 week, and the one-dimensional time-series electric quantity data is converted into a two-dimensional matrix with 1 day as a period.

In step 2, a convolutional autoencoder is constructed from two-dimensional convolution layers, pooling layers and upsampling layers, and is then trained to learn the electricity consumption characteristics of normal users. The two-dimensional convolution layer divides the input two-dimensional electric quantity matrix into receptive fields of the same size, multiplies the electric quantity values in each receptive field by the weight coefficients, and thus extracts the features of the two-dimensional electric quantity matrix by a convolution operation, which is expressed as

where the left-hand side is the convolution result for receptive field (u, v), the weight coefficients are those corresponding to the electric quantity values in the receptive field, the electric quantity values are taken from the two-dimensional electric quantity matrix, B is the bias term of the convolution, and f_ker(·) is the activation function that applies the nonlinear transformation. In the encoding process of the convolutional autoencoder, a pooling layer is added after each two-dimensional convolution layer to downsample the convolution result of that layer; in the decoding process, an upsampling layer is added after each two-dimensional convolution layer to upsample the compressed convolution result, so that the feature matrix finally output by the convolutional autoencoder has the same size as the input two-dimensional electric quantity matrix. The reconstruction error δ of each user is calculated as

where each element of the output feature matrix is the output electric quantity value corresponding to the input electric quantity value at the same position.

In step 3, the electric quantity data of the electricity stealing period and the normal period of electricity stealing user i is collected, and an electricity stealing period electric quantity vector and a normal period electric quantity vector, each of size 1 × T^Tr, are established, where T^Tr is the time length over which the electric quantity is collected; electric quantity statistical indexes that characterize the correlation with the electricity stealing amount of user i are extracted from the electric quantity vectors, and the feature vector F_i^Xg is constructed.

In step 4, the users to be predicted are used as the target training set, local historical electricity stealing records are collected as the auxiliary training set, the XGBoost regression model is trained iteratively with the TrAdaBoost training strategy, and the model is used to predict the potential electricity stealing amount of the electricity stealing users.

Fig. 2 shows the convolutional autoencoder constructed by the present invention. Its input is the two-dimensional electric quantity matrix of a user, and its structure comprises two-dimensional convolution layers, pooling layers and upsampling layers. Training comprises an encoding process and a decoding process: the encoding process produces a compressed convolution result, and the decoding process restores the compressed result to the original size. The reconstruction error of a user is obtained by comparing the input two-dimensional electric quantity matrix with the output feature matrix.
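The way pooling compresses a matrix and upsampling restores its size can be sketched as follows. The pooling type, the window size and the upsampling scheme are not specified in the source; 2 × 2 max pooling and nearest-neighbour upsampling by a factor of 2 are assumed here, and both function names are hypothetical.

```python
def max_pool_2x2(matrix):
    """Downsample by taking the maximum of each non-overlapping 2 x 2 block."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % 2 or cols % 2:
        raise ValueError("this sketch requires even dimensions")
    return [[max(matrix[r][c], matrix[r][c + 1],
                 matrix[r + 1][c], matrix[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def upsample_2x(matrix):
    """Upsample by repeating every value twice along both axes."""
    output = []
    for row in matrix:
        wide = [value for value in row for _ in range(2)]
        output.append(wide)
        output.append(list(wide))
    return output


# Hypothetical 4 x 4 feature map.
feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool_2x2(feature_map)
restored = upsample_2x(pooled)
```

The restored matrix has the same shape as the input but not the same values; in the autoencoder the convolution layers around these operations are trained so that the final output approximates the input. A dimension that is not divisible by the pooling factor, such as 7 daily rows, would need padding or a different window, which this sketch does not cover.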

Take as an example the low-voltage power distribution system of a province in China, which comprises 423 recorded electricity stealing users and 22465 unlabeled users (investigation shows that most unlabeled users are normal users, but a small number of unrecorded electricity stealing users may exist). The collected data includes the daily electric quantity of all users in 2019, the electricity stealing investigation dates, the dates on which normal power supply resumed after rectification, and the electricity stealing amount recorded for each electricity stealing user. In addition, historical electricity stealing records from 2016 to 2018 are collected as the auxiliary training set.

Referring to fig. 3, the reconstruction errors of the users in the training set and of the unlabeled users in the test set are both small, while the reconstruction errors of the electricity stealing users in the test set are large and are distributed very differently from those of the training set users. Normal users and electricity stealing users can therefore be well distinguished by the magnitude of the reconstruction error. The electricity stealing user identification and electricity stealing amount prediction results are shown in Table 1. Compared with other existing electricity stealing user identification and electricity stealing amount prediction methods, the method of the invention performs better in normal user identification rate, electricity stealing user detection rate and electricity stealing amount prediction error.

TABLE 1 Electricity stealing user identification and electricity stealing amount prediction results

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.

Claims

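1. An electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm, characterized by comprising the following steps:

step 1, collecting historical electric quantity data of normal users in a distribution area, and establishing a two-dimensional electric quantity matrix for each user;

step 2, establishing a convolutional autoencoder (CAE) anomaly detection model, calculating the reconstruction error of each user, and identifying users with abnormal electricity consumption and a larger reconstruction error as electricity stealing users;

step 3, extracting, from the electric quantity metering data of the electricity stealing periods and non-electricity stealing periods of the electricity stealing users respectively, electric quantity statistical indexes that characterize the correlation with the electricity stealing amount of the user, and constructing feature vectors;

step 4, taking the feature vectors constructed in step 3 as input, training an XGBoost regression model according to the TrAdaBoost algorithm, and predicting the potential electricity stealing amount of each electricity stealing user.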

2. The electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm as claimed in claim 1, wherein in step 1 the two-dimensional electric quantity matrix of the user is established by transforming the one-dimensional time-series electric quantity data into a two-dimensional matrix:

where the matrix is the two-dimensional electric quantity matrix of user i, each of its elements is the electric quantity data at the k-th moment in the j-th power utilization cycle, T^CAE is the length of a power utilization cycle, and M is the number of power utilization cycles.

3. The electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm as claimed in claim 1, wherein:

in step 2, a convolutional autoencoder is constructed from two-dimensional convolution layers, pooling layers and upsampling layers, and is then trained to learn the electricity consumption characteristics of normal users, thereby establishing the convolutional autoencoder (CAE) anomaly detection model; specifically:

the two-dimensional convolution layer divides the input two-dimensional electric quantity matrix into receptive fields of the same size l × l, multiplies the electric quantity values in each receptive field by the weight coefficients, and thus extracts the features of the two-dimensional electric quantity matrix by a convolution operation, which is expressed as

where the left-hand side is the convolution result for receptive field (u, v), the electric quantity values are taken from the two-dimensional electric quantity matrix, B is the bias term of the convolution, and f_ker(·) is the activation function that applies the nonlinear transformation;

in the encoding process of the convolutional autoencoder, a pooling layer is added after each two-dimensional convolution layer to downsample the convolution result of that layer; in the decoding process, an upsampling layer is added after each two-dimensional convolution layer to upsample the compressed convolution result, so that the feature matrix finally output by the convolutional autoencoder has the same size as the input two-dimensional electric quantity matrix; the output feature matrix of the CAE is expressed as

the reconstruction error δ of each user is calculated as

where each element of the output feature matrix of the CAE anomaly detection model is the output electric quantity value corresponding to the input electric quantity value at the same position.

4. The electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm as claimed in claim 1, wherein step 3 is specifically as follows:

the electric quantity data of the electricity stealing period and the normal period of electricity stealing user i is collected, and an electricity stealing period electric quantity vector and a normal period electric quantity vector, each of size 1 × T^Tr, are established; statistical indexes are computed by corresponding functions from the electricity stealing period and normal period electric quantity vectors, and the Pearson correlation coefficient F_i^pcc between the electricity stealing period electric quantity vector and the normal period electric quantity vector is calculated.

5. The electricity stealing user identification and electricity stealing amount prediction method based on a convolutional autoencoder and an improved regression algorithm as claimed in claim 1, wherein:

in step 4, the electricity stealing users to be predicted are used as the target training set F^TT, with N^TT users and the corresponding electricity stealing amount vector y^TT; meanwhile, local historical electricity stealing records are collected as the auxiliary training set F^AT, with N^AT users and the corresponding electricity stealing amount vector y^AT; the weight vector of the users in the training set is set as α^t = [α^t(1), α^t(2), …, α^t(N^TT + N^AT)];

in the t-th iteration, the combined training set (F^TT + F^AT), the label vector (y^TT + y^AT) and the weight vector α^t are used as input to train an XGBoost model Xg(t), and the electricity stealing amount vector of the users in the target training set is predicted;

the prediction error ε_t on the target training set is calculated as

where the decision function determines whether the error exceeds the error threshold; the training errors β_t and β_0 on F^TT and F^AT are then defined respectively as

when the maximum number of iterations N^Xg is reached, iteration stops, the trained XGBoost model Xg(N^Xg) is output, and the model is used to predict the potential electricity stealing amount of the electricity stealing users.