CN116628605A - Method and device for electricity stealing classification based on ResNet and DSCAttention mechanism
- Publication number: CN116628605A (application number CN202310615835.7A)
- Authority: CN (China)
- Prior art keywords: data, model, formula, time sequence, electricity
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F18/2431 - Pattern recognition; classification techniques relating to the number of classes; multiple classes
- G06N3/0464 - Neural networks; architecture; convolutional networks [CNN, ConvNet]
- G06N3/048 - Neural networks; activation functions
- G06N3/08 - Neural networks; learning methods
- G06Q50/06 - ICT specially adapted for energy or water supply
- Y04S10/50 - Systems or methods supporting power network operation or management, involving interaction with the load-side end user
Abstract
The invention discloses a method for electricity theft classification based on ResNet and DSCAttention mechanisms, which comprises the following steps: step 1, performing data analysis on the obtained data, obtaining time sequence data characteristics with stronger identifiability through trend-season analysis and correlation analysis, and preparing the design and construction of the model accordingly; step 2, entering data preprocessing: analyzing and processing missing values, constructing a mask matrix, normalizing the data with quantile transformation to reduce the sensitivity of the model to abnormal values, and dividing the training set and verification set by stratified splitting of the data set; and step 3, entering the model building and training stage, adjusting different hyperparameters during training to find the hyperparameter combination with the best classification evaluation, until the final electricity theft classification model is formed. The invention can classify normal users and electricity theft users in an electricity theft time sequence data set and calculate the probability that a user is stealing electricity.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence and electricity theft detection, and particularly relates to an electricity theft classification method and device based on ResNet and DSCAttention mechanisms.
Background
At present, methods for electricity theft identification and detection focus on the complex intrinsic features of electricity theft time sequence data; algorithms that exploit time sequence correlation for identification and classification are few. Existing methods for learning electricity theft time sequence correlation mostly adopt LSTM neural networks, and little attention has been paid to introducing an attention mechanism into the electricity theft time sequence classification task.
Existing products for electricity theft identification therefore have the following drawbacks: they mainly attend to the intrinsic complex features of electricity theft data, such as time sequence period and trend features, and mostly adopt convolutional neural network algorithms to extract these features for classification; learning time sequence correlation by introducing an attention mechanism remains underdeveloped and insufficiently researched.
Disclosure of Invention
The invention aims to provide a method and a device for electricity theft classification based on ResNet and DSCAttention mechanisms, which can classify normal electricity users and electricity theft users in an electricity theft time sequence data set and calculate the probability that a user is stealing electricity.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a method for electricity theft classification based on ResNet and DSCAttention mechanisms comprises the following steps:
step 1, performing data analysis on the obtained data, obtaining time sequence data characteristics with stronger identifiability through trend-season analysis and correlation analysis, and preparing the design and construction of the model accordingly;
step 2, entering the data preprocessing stage: analyzing and processing missing values, constructing a mask matrix, normalizing the data with quantile transformation to reduce the sensitivity of the model to abnormal values, and dividing the training set and verification set by stratified splitting of the data set;
and step 3, after the data are processed, entering the construction and training stage of the deep learning model, adjusting different hyperparameters during training to find the hyperparameter combination with the best classification evaluation, until the final electricity theft classification model is formed.
Furthermore, the electricity theft classification model takes a ResNet18 network as the basic network structure and introduces a depthwise separable convolution enhanced self-attention mechanism layer on this basis, so that the model can both extract the intrinsic complex characteristics of electricity theft time sequences and account for the correlation between time sequences.
Further, the number of output channels of the second and third convolution layers of the ResNet18 network is modified, and the channel-by-channel convolution of a depthwise separable convolution structure and a channel-by-channel self-attention mechanism are introduced; the remaining residual network structure of ResNet18 is then replaced with a point-by-point convolution layer and a two-layer fully-connected neural network, where the point-by-point convolution layer together with the preceding channel-by-channel convolution forms a complete depthwise separable convolution structure, and the two-layer fully-connected network serves as the classifier.
Further, in step 1, the time series is decomposed with an additive model into trend term and seasonal term features, so that the intrinsic features of the time series can be analyzed easily; the time series additive model is shown in formula (1.5):

$$T_t = \mathrm{Trend}_t + \mathrm{Seasonal}_t + \mathrm{Resid}_t \qquad (1.5)$$

in formula (1.5), $T_t$ is the original time series, $\mathrm{Trend}_t$ is the trend term of the time series additive model, $\mathrm{Seasonal}_t$ is the seasonal term, and $\mathrm{Resid}_t$ is the residual term.
Further, the trend term $\mathrm{Trend}_t$ of the time series additive model is calculated as shown in formula (1.6):

$$\mathrm{Trend}_t = \frac{1}{m}\sum_{i=-k}^{k} y_{t+i}, \qquad m = 2k+1 \qquad (1.6)$$

in formula (1.6), $y_{t+i}$ is the value at time $t+i$ in the time series $T$, and $m = 2k+1$ is the order of the moving average;

a partitioning method with a 7-day period is adopted to convert the one-dimensional time sequence data into two-dimensional spatial time sequence data, so the order (period) of the moving average is 7, i.e. $k = 3$; according to the definition of the additive model in formula (1.5), the sum of the seasonal term and the residual term is the time series minus the trend term;

defining the set of $t+i$ as $\{t+1, t+2, \ldots, t+m\}$, the seasonal term of the time series additive model is computed, as in classical decomposition, by averaging the detrended series over points at the same position within the 7-day period, as shown in formula (1.7):

$$\mathrm{Seasonal}_t = \operatorname{mean}\{\,T_{t'} - \mathrm{Trend}_{t'} : t' \equiv t \ (\mathrm{mod}\ 7)\,\} \qquad (1.7)$$
Further, in step 1, time series data in Cartesian coordinates are converted into polar coordinates through the Gramian Angular Field (GAF), and the correlation of the time series between different times is analyzed by computing the angles between the data features at each time in polar coordinates; the GAF calculation is shown in formula (1.8):

$$G = \begin{bmatrix} \cos(\phi_1+\phi_1) & \cdots & \cos(\phi_1+\phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n+\phi_1) & \cdots & \cos(\phi_n+\phi_n) \end{bmatrix} \qquad (1.8)$$

in formula (1.8), $G$ is a square matrix of size $n \times n$; terms of the form $\cos(\phi_i+\phi_j)$ are the special inner product defined by GAF; the one-dimensional vector for which GAF is computed is $X = \{x_1, x_2, \ldots, x_n\}$, of length exactly $n$, and $\phi_1$ in the formula is $\arccos(x_1)$.

Further, the specific calculation of $\cos(\phi_i+\phi_j)$ is shown in formula (1.9):

$$\cos(\phi_i+\phi_j) = \cos(\arccos(x_i) + \arccos(x_j)) \qquad (1.9)$$

in formula (1.9), $i, j$ are indices of the one-dimensional vector $X$; time correlation analysis is performed on the GAF values of normal and abnormal electricity consumption data according to formulas (1.8) and (1.9).
Further, in step 2, a mask matrix of the same size as the sample data is first created and initialized to all zeros; the positions of missing values in the sample are then determined, and a 1 is set at the same positions in the mask matrix; by stacking the sample with its mask matrix, a feature matrix with 2 channels is finally constructed.
Further, the missing value filling is given by formula (1.10):

$$f(x_i) = \begin{cases} 0, & x_i \text{ is missing} \\ x_i, & \text{otherwise} \end{cases} \qquad (1.10)$$

in formula (1.10), $i$ is the position of a value in the original data matrix, $x_i$ is the value at position $i$, and $f(x_i)$ is the filled value after processing according to formula (1.10);

formula (1.11) gives the construction of the mask matrix in the missing value filling method used:

$$g(x_i) = \begin{cases} 1, & x_i \text{ is missing} \\ 0, & \text{otherwise} \end{cases} \qquad (1.11)$$

where $i$ is the position of a value in the original data matrix, $x_i$ is the value at position $i$, and $g(x_i)$ is the mask value obtained according to formula (1.11).
Further, the data are standardized with quantile transformation and then input into the model for training; meanwhile, the training set and verification set are randomly split with stratified sampling.
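As an illustrative sketch only (not part of the claimed embodiment), the stratified train/verification split described above can be expressed in numpy; the function name, the 80/20 ratio and the toy class counts below are assumptions for illustration:

```python
import numpy as np

def stratified_split(X, y, val_ratio=0.2, seed=0):
    """Split (X, y) into train/validation sets while preserving the
    normal/theft class ratio in both subsets (stratified sampling)."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)   # all samples of this class
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_ratio))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return (X[np.sort(train_idx)], y[np.sort(train_idx)],
            X[np.sort(val_idx)], y[np.sort(val_idx)])

# Imbalanced toy labels: 90 normal users (0), 10 theft users (1)
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(100, 1)
Xtr, ytr, Xva, yva = stratified_split(X, y)
# Both subsets keep roughly the 9:1 class ratio
```

Because each class is sampled separately, even a heavily imbalanced electricity theft data set keeps some positive samples in the verification set.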
An apparatus for electricity theft classification based on ResNet and DSCAttention mechanisms, comprising:
a data analysis module, used for performing data analysis on the obtained data, obtaining time sequence data characteristics with stronger identifiability through trend-season analysis and correlation analysis, and preparing the design and construction of the model accordingly;
a data processing module, used for preprocessing the data: analyzing and processing missing values, constructing a mask matrix, normalizing the data with quantile transformation to reduce the sensitivity of the model to abnormal values, and dividing the training set and verification set by stratified splitting of the data set.
The invention has the following characteristics and effects:
(1) ResNet is used as the basic network, and a DSCAttention depthwise separable convolution enhanced self-attention mechanism is introduced and combined with the convolutional neural network part of ResNet to extract and learn both the intrinsic complex characteristics of electricity theft time sequence data and the correlation between time sequences; these features then guide the classifier.
(2) Because electricity theft identification is a two-class problem, the cross entropy loss function is used during training to guide the adjustment of the parameters of each network layer, so that the model is trained toward an optimal solution of the electricity theft classification problem.
(3) Both missing values and abnormal values exist in the electricity theft data. In a deep learning method the model cannot compute on null values directly, so missing values are filled with a zero padding method; and because abnormal values would otherwise receive excessive attention during training, making it hard for the model to learn the actual distribution of the data, quantile transformation is adopted to effectively reduce sensitivity to abnormal values.
(4) In real-world data, electricity theft data is often unbalanced, and the problem can be alleviated to some extent by using a method of hierarchically splitting a training set and a verification set.
In summary, the invention performs better in electricity theft classification.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a diagram of a power theft classification model framework based on ResNet and DSCAttention mechanisms.
Fig. 2 is a schematic diagram of a depth separable convolution operation.
Fig. 3 is a schematic diagram of a channel-by-channel self-attention calculation.
Fig. 4 is a schematic diagram of the self-attention mechanism of convolution enhancement.
FIG. 5 is a schematic diagram of a depth separable convolution enhanced self-attention mechanism.
FIG. 6 is a flowchart of an overall implementation of a power theft classification algorithm.
Fig. 7 shows ROC graphs for different models.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
A method for electricity theft classification based on ResNet and DSCAttention mechanisms selects the ResNet18 network as the basic network structure and introduces a depthwise separable convolution enhanced self-attention mechanism layer on this basis, so that the model can both extract the intrinsic complex features of electricity theft time sequences and account for the correlation between time sequences, obtaining better accuracy in the final classification task. This yields an electricity theft classification model based on ResNet and DSCAttention mechanisms, whose overall framework is shown in figure 1.
1. ResNet residual error network structure
The invention modifies the conventional ResNet18 network by adding an attention layer and a depthwise separable convolution structure; the per-layer network parameters of the final model are shown in table 1. The model modifies the first convolution layer and the max pooling layer of ResNet18, reducing the downsampling factor of the original feature input so as to retain more useful information. At the second and third convolution layers of ResNet18, besides modifying the number of output channels of these two layers, the model introduces the channel-by-channel convolution of a depthwise separable convolution structure and the channel-by-channel self-attention mechanism used herein. The model then replaces the remaining residual network structure of ResNet18 with a point-by-point convolution layer and a two-layer fully-connected network; the point-by-point convolution layer together with the preceding channel-by-channel convolution forms a complete depthwise separable convolution structure, and the two fully-connected layers serve as the classifier.
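As an illustrative sketch only (not the claimed implementation), the channel-by-channel plus point-by-point structure above can be written in plain numpy; the shapes mirror the depthwise3-16 / pointwise1-48 layers of table 1, and the parameter-count comparison with a standard convolution motivates the "depthwise separable" design:

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Channel-by-channel (depthwise) convolution: each of the C input
    channels is convolved with its own k x k kernel. x: (C, H, W), w: (C, k, k)."""
    C, H, W = x.shape
    k = w.shape[1]
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * w[c])
    return out

def pointwise_conv2d(x, w):
    """Point-by-point (1x1) convolution mixing channels. x: (C, H, W), w: (C_out, C)."""
    return np.tensordot(w, x, axes=([1], [0]))

x = np.random.default_rng(0).normal(size=(16, 7, 7))  # 16 channels, 7x7 feature map
dw = np.ones((16, 3, 3))   # one 3x3 kernel per channel
pw = np.ones((48, 16))     # 1x1 convolution: 16 -> 48 channels
y = pointwise_conv2d(depthwise_conv2d(x, dw), pw)
# Depthwise-separable 3x3, 16 -> 48 channels: 16*3*3 + 48*16 = 912 weights,
# versus 48*16*3*3 = 6912 for a standard 3x3 convolution.
```

The split into per-channel spatial filtering followed by 1x1 channel mixing is what makes the structure cheap enough to pair with a self-attention layer.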
Table 1 comparison of network parameters
1.1 Softmax function
The Softmax activation function is used for multi-class problems: it maps the inputs of several neurons into $[0,1]$ such that the mapped elements of the vector sum to 1. Its calculation formula is:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \qquad (1.1)$$

in formula (1.1), $i$ and $j$ index the classes of the multi-class problem, $x_i$ is the $i$-th class value of the input $x$, mapped through the exponential function, and $\sum_j e^{x_j}$ is the sum of the exponentials of all class values.
1.2 Relu activation function
The formula of the Relu activation function is:

$$\mathrm{Relu}(x) = \max(0, x) \qquad (1.2)$$

in formula (1.2), $x$ is the input value; the Relu function takes the maximum of 0 and $x$, so when the input is negative the function outputs 0 and the unit is not activated.
1.3 PRelu activation function
The PRelu activation function is formulated as:

$$\mathrm{PRelu}(x) = \max(0, x) + a \cdot \min(0, x) \qquad (1.3)$$

in formula (1.3), $x$ is the input value and $a$ is the slope of PRelu in the region where the input is negative. When $x$ is positive, PRelu behaves exactly like the Relu activation function; when $x$ is negative, PRelu takes the minimum of 0 and $x$ and multiplies it by the negative-region slope $a$.
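The three activation functions (1.1)-(1.3) can be sketched in a few lines of numpy; the fixed slope $a = 0.25$ below is an illustrative assumption (in PRelu, $a$ is a learned parameter):

```python
import numpy as np

def softmax(x):
    """Formula (1.1): maps inputs into [0, 1] so that they sum to 1."""
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def relu(x):
    """Formula (1.2): max(0, x); negative inputs are zeroed."""
    return np.maximum(0.0, x)

def prelu(x, a=0.25):
    """Formula (1.3): max(0, x) + a * min(0, x); a is the negative-region slope."""
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
# relu zeroes the negative input, prelu scales it by a, softmax sums to 1
```

Unlike Relu, PRelu keeps a small gradient for negative inputs, which avoids permanently inactive units.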
2. Convolution feature extraction layer and DSCAttention mechanism
2.1 Convolutional neural network based electricity theft data feature extraction
The convolutional neural network used for time sequence trend term and seasonal feature extraction consists of the three consecutive layers conv3-16, conv3-16 and depthwise3-16 in table 1. The first two conv3-16 layers are conventional convolution layers with kernel size 3 and 16 output channels; the last depthwise3-16 layer is a channel-by-channel convolution, which combined with the pointwise1-48 point-by-point convolution in table 1 forms the depthwise separable convolution operation whose calculation process is shown in fig. 2.
2.2 Power stealing timing dependency learning based on DSCAttention mechanism
The DSCAttention mechanism consists of two layers, attention-8 and depthwise3-8 in Table 1. Attention-8 is a channel-by-channel multi-head self-Attention mechanism, and the specific calculation process of this layer is shown in FIG. 3. depthwise3-8 is a channel-by-channel convolution enhancement portion of channel-by-channel multi-headed self-attention, and the specific calculation of this layer is shown in fig. 4. While the Attention-8 and depthwise3-8 are collectively referred to as the DSCAttention depth separable convolution enhanced self-Attention mechanism, the overall computation of this part is shown in FIG. 5.
(1) Channel-by-channel self-attention mechanism
The attention mechanism is a method of taking global information into account. Convolutional neural networks are strong at extracting local information but give little consideration to global information. In electricity theft time sequence data, the electricity consumption behavior of a user has long-range correlation, which local convolution cannot learn.
Attention mechanisms mainly comprise three types: soft attention, hard attention and self-attention. The attention method used by the invention is the self-attention mechanism, calculated as in formula (1.4):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad (1.4)$$

in formula (1.4), $Q$, $K$ and $V$ are obtained from the input matrix $X$ with the weight parameters $W_Q$, $W_K$, $W_V$ respectively, i.e. $Q = XW_Q$, $K = XW_K$, $V = XW_V$; substituting $Q$, $K$ and $V$ into formula (1.4) yields the self-attention distribution of the input matrix $X$. The self-attention distribution calculation used herein is a new multi-head self-attention method combined with the channel attention idea; its calculation process is shown in fig. 3.
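As an illustrative numpy sketch of formula (1.4) (single head, with random toy weights that are not the learned parameters of the model):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention of formula (1.4):
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    with Q = X Wq, K = X Wk, V = X Wv."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # row-wise softmax
    return A @ V, A

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))          # 7 time steps, 4 features each
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
# Each row of A is a probability distribution over the 7 time steps
```

The attention matrix $A$ is exactly the kind of time-to-time correlation the invention seeks to learn from the electricity theft sequences.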
(2) Convolution enhanced self-attention mechanism
The invention adopts the convolution enhancement idea of the Conformer model overall, makes minor modifications to its structure, replaces its self-attention part with the channel-by-channel self-attention calculation used herein, and optimizes the convolution enhancement part for 2D convolution operation. The convolution enhanced self-attention mechanism is shown in fig. 4.
(3) Depth separable convolution enhanced self-attention mechanism
Since the self-attention mechanism used in the invention is a channel-by-channel self-attention method, the conventional convolutional neural network of the convolutional enhancement part is replaced by a channel-by-channel convolutional neural network, so that a self-attention method DSCAttention based on depth separable convolutional enhancement is formed, and the calculation process of the method is shown in figure 5.
In fig. 5, the input matrix X is subjected to a channel-by-channel self-attention calculation and then outputs a self-attention distribution matrix, which is the correlation between the time sequences learned from the electricity larceny data by the model. A depth separable convolutional neural network is then used on the self-attention distribution matrix to compensate for the lack of local attention of the self-attention mechanism itself to the intra-and neighborhood periodicities.
3. Electricity larceny classification algorithm implementation flow
(1) Firstly, data analysis is carried out on the obtained data, time sequence data characteristics with stronger identifiability are obtained through trend season analysis and correlation analysis, and the design and construction of a deep learning algorithm model are prepared accordingly.
The intrinsic features of non-stationary time series data generally appear in a more complex form, so that it is difficult to perform feature analysis on the original sequence data. The time series is decomposed by using an additive model, and the intrinsic characteristics of the time series are decomposed into trend term and season term characteristics, so that the intrinsic characteristics of the time series can be easily analyzed. The time series additive model is shown in formula (1.5).
$$T_t = \mathrm{Trend}_t + \mathrm{Seasonal}_t + \mathrm{Resid}_t \qquad (1.5)$$

In formula (1.5), $T_t$ is the original time series, $\mathrm{Trend}_t$ is the trend term of the time series additive model, $\mathrm{Seasonal}_t$ is the seasonal term, and $\mathrm{Resid}_t$ is the residual term. The trend term $\mathrm{Trend}_t$ is calculated as shown in formula (1.6):

$$\mathrm{Trend}_t = \frac{1}{m}\sum_{i=-k}^{k} y_{t+i}, \qquad m = 2k+1 \qquad (1.6)$$

In formula (1.6), $y_{t+i}$ is the value at time $t+i$ in the time series $T$, and $m = 2k+1$ is the order of the moving average. Here a 2D convolutional neural network is used for time sequence complex feature extraction, so a partitioning method with a 7-day period converts the one-dimensional time sequence data into two-dimensional spatial time sequence data; the order (period) of the moving average is therefore 7, i.e. $k = 3$. According to the definition of the additive model in formula (1.5), the sum of the seasonal term and the residual term is the time series minus the trend term. Defining the set of $t+i$ as $\{t+1, t+2, \ldots, t+m\}$, the seasonal term of the time series additive model is computed by averaging the detrended series over points at the same position within the 7-day period, as shown in formula (1.7):

$$\mathrm{Seasonal}_t = \operatorname{mean}\{\,T_{t'} - \mathrm{Trend}_{t'} : t' \equiv t \ (\mathrm{mod}\ 7)\,\} \qquad (1.7)$$
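A minimal numpy sketch of the additive decomposition (1.5)-(1.7), assuming a toy 4-week series with a linear trend and a weekly seasonal pattern (the function name and sample data are illustrative only):

```python
import numpy as np

def decompose_additive(y, period=7):
    """Classical additive decomposition T_t = Trend_t + Seasonal_t + Resid_t.
    Trend: moving average of order m = period (here 7, so k = 3);
    Seasonal: mean of the detrended series at each position in the period."""
    k = period // 2
    trend = np.convolve(y, np.ones(period) / period, mode='valid')  # length n - 2k
    trend = np.concatenate([np.full(k, np.nan), trend, np.full(k, np.nan)])
    detrended = y - trend
    seasonal = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal = np.tile(seasonal, len(y) // period + 1)[:len(y)]
    resid = y - trend - seasonal
    return trend, seasonal, resid

t = np.arange(28.0)
y = 0.5 * t + np.tile([0, 1, 2, 0, -1, -2, 0], 4)  # linear trend + weekly season
trend, seasonal, resid = decompose_additive(y)
# The 7-order moving average recovers the 0.5*t trend, and the
# position-wise means recover the weekly seasonal pattern exactly.
```

On real electricity consumption data the residual term would be non-zero; here it vanishes because the toy series is exactly additive.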
The Gramian Angular Field (GAF) is a method by which time series data can be converted into spatial data. The Gramian Angular Field converts time series data from Cartesian coordinates into polar coordinates, and the correlation of the time series between different times is analyzed by computing the angles between the data features at each time in polar coordinates. The GAF calculation is shown in formula (1.8):

$$G = \begin{bmatrix} \cos(\phi_1+\phi_1) & \cdots & \cos(\phi_1+\phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n+\phi_1) & \cdots & \cos(\phi_n+\phi_n) \end{bmatrix} \qquad (1.8)$$

In formula (1.8), $G$ is a square matrix of size $n \times n$, and terms of the form $\cos(\phi_i+\phi_j)$ are the special inner product defined by GAF. Let the one-dimensional vector for which GAF is computed be $X = \{x_1, x_2, \ldots, x_n\}$, of length exactly $n$; $\phi_1$ in formula (1.8) is $\arccos(x_1)$, and the specific calculation of $\cos(\phi_i+\phi_j)$ is shown in formula (1.9):

$$\cos(\phi_i+\phi_j) = \cos(\arccos(x_i) + \arccos(x_j)) \qquad (1.9)$$

In formula (1.9), $i, j$ are indices of the one-dimensional vector $X$. Time correlation analysis is performed on the GAF values of normal and abnormal electricity consumption data according to formulas (1.8) and (1.9).
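A short numpy sketch of formulas (1.8)-(1.9); note that the rescaling of the series into $[-1, 1]$ is the usual GAF preprocessing step and is assumed here (since $\arccos$ requires inputs in that range), and the sample values are illustrative only:

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Field of formula (1.8): rescale x into [-1, 1],
    take phi_i = arccos(x_i), and form G[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])        # formula (1.9), vectorized

x = np.array([10.0, 20.0, 30.0, 25.0, 15.0])  # toy daily consumption values
G = gramian_angular_field(x)
# G is a 5x5 symmetric matrix encoding the angle sums between all time pairs
```

Each entry $G_{ij}$ depends on the pair $(x_i, x_j)$, so the matrix exposes correlations between all pairs of times at once, which is what a 2D convolutional network can then consume.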
(2) After the characteristic analysis of the data is carried out, a data preprocessing link is entered. The data preprocessing link mainly analyzes and processes missing values existing in data and constructs a mask matrix. The sensitivity of the model to outliers is reduced by normalizing the data using quantile transformation, and the training set and the validation set are partitioned by using a method of hierarchically splitting the data set.
The invention handles missing sample data without imputing substantive values. First, a mask matrix of the same size as the sample data is created and initialized to all zeros. The positions of missing values in the sample are then determined, and a 1 is written at the same positions in the mask matrix. By combining the sample with its same-size mask matrix, a feature matrix with 2 channels is finally constructed.
The missing value filling method is given by formula (1.10), where i denotes a position in the original data matrix, x_i denotes the value at position i of the original data matrix, and f(x_i) denotes the filled value after processing according to formula (1.10).
Formula (1.11) gives the construction of the mask matrix in the missing value filling method used here. Again i denotes a position in the original data matrix, x_i denotes the value at position i of the original data matrix, and g(x_i) denotes the mask matrix entry transformed according to formula (1.11).
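The masking scheme of formulas (1.10)-(1.11) can be sketched as follows (illustrative only; the choice of 0 as the fill value f(x_i) for missing entries is an assumption, since the formula bodies are not reproduced in this text):

```python
import numpy as np

def build_two_channel_input(data):
    """Missing-value handling sketch: fill missing entries (here: with 0,
    an assumed f(x_i)) and build a same-size mask matrix g(x_i) that is 1
    where a value was missing and 0 elsewhere, then stack both into a
    feature matrix with 2 channels."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data).astype(float)    # g(x_i): 1 at missing positions
    filled = np.nan_to_num(data, nan=0.0)  # f(x_i): missing -> fill value
    return np.stack([filled, mask])        # shape: (2, rows, cols)
```

The mask channel lets the network distinguish "genuinely zero consumption" from "missing reading", which matters for electricity theft detection where zeros are themselves suspicious.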
In general, data features differ greatly in scale, and when a model is trained on the raw data, features with larger values dominate the learning process, which degrades model quality. Therefore, the raw data must be normalized before being input to the model for training. A common normalization method is min-max normalization (Maximum and Minimum Normalization), whose formula is shown in (1.12).
Here min(X) is the minimum of X, max(X) is the maximum of X, and x_i is a value in X. Due to equipment false alarms and similar causes, the recorded data often contains outliers, and min-max normalization is sensitive to them. To address this sensitivity, a quantile transformation is used for data normalization instead.
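The difference between min-max normalization (formula (1.12)) and a rank-based quantile transformation can be sketched as follows (an illustrative sketch; in practice a library implementation such as scikit-learn's QuantileTransformer would typically be used, and the simple rank mapping below is an assumption):

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization, formula (1.12): (x_i - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def quantile_transform_1d(x):
    """Rank-based quantile transformation sketch: map each value to its
    empirical quantile in [0, 1]. The result depends only on ranks, so a
    single extreme outlier cannot compress the rest of the distribution."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))  # rank of each element
    return ranks / (len(x) - 1)
```

On data with one outlier, e.g. [1, 2, 3, 1000], min-max squeezes the normal values near 0 while the quantile transform keeps them evenly spread, which is exactly the outlier-sensitivity problem described above.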
Because the data set is imbalanced, randomly splitting the training and validation sets can leave the training set with few or no negative samples, so that the model cannot learn from them, and can likewise leave the validation set with few or no negative samples, so that the model's evaluation metrics lose their meaning with respect to the negative class. To solve this problem, the training and validation sets are split randomly by stratified sampling (Stratified Sampling), as shown in formula (1.13).
D_{new-normal} : D_{new-outlier} = D_{old-normal} : D_{old-outlier}  (1.13)
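The stratified split of formula (1.13) can be sketched as follows (illustrative; the validation ratio and random seed are assumed parameters not given in the text):

```python
import numpy as np

def stratified_split(y, val_ratio=0.2, seed=0):
    """Stratified train/validation split sketch (formula (1.13)): sample
    the same fraction from each class so the normal:outlier ratio is
    preserved in both subsets. Returns index arrays into the data set."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_ratio))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)
```

With a 90:10 class imbalance and a 20% validation ratio, both subsets keep the 9:1 ratio, so the negative class is guaranteed to appear in each.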
(3) When data processing is finished, the deep learning model is constructed and trained; different hyperparameters must be tuned during training to find the combination with the best classification performance. Finally, the model with the best AUC is compared with other models to demonstrate the advantages of the model used here.
In summary, the self-attention layer of the Conformer model employs a conventional multi-head self-attention mechanism, while the DSCAttention part uses a channel-by-channel multi-head self-attention mechanism to compute the self-attention distribution on each feature channel. In the convolution enhancement part, the Conformer model uses a 1D depthwise separable convolution structure to strengthen the learning of neighborhood-period correlation in the time series, while the DSCAttention part uses a 2D depthwise separable convolution structure to strengthen the learning of correlation both within a period and across neighboring periods. Meanwhile, the DSCAttention mechanism, improved from Conformer, is introduced on top of the ResNet residual structure to compensate for the convolutional neural network's lack of attention to temporal correlation. For electricity theft time series data, electricity consumption is generally recorded over periods of a month or a year; these periods are long, and the locality principle of convolutional neural networks limits their ability to process long sequences. Therefore, ResNet is fused with a multi-head self-attention mechanism, which excels at processing long sequences, as an effective way to remedy the residual convolutional network's weakness on long time series.
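The 2D depthwise separable convolution that the DSCAttention part builds on can be sketched as follows (an illustrative sketch of the convolution building block only; the channel-by-channel self-attention of DSCAttention is not reproduced here, and 'valid' padding with stride 1 is an assumption):

```python
import numpy as np

def depthwise_separable_conv2d(x, dw_kernels, pw_weights):
    """2D depthwise separable convolution sketch: one spatial kernel per
    input channel (channel-by-channel convolution), followed by a 1x1
    pointwise convolution that mixes channels.
    x: (c_in, h, w); dw_kernels: (c_in, kh, kw); pw_weights: (c_out, c_in)."""
    c_in, h, w = x.shape
    kh, kw = dw_kernels.shape[1:]
    oh, ow = h - kh + 1, w - kw + 1
    # Depthwise step: each channel is convolved with its own kernel.
    dw_out = np.zeros((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw_out[c, i, j] = np.sum(x[c, i:i + kh, j:j + kw] * dw_kernels[c])
    # Pointwise step: 1x1 convolution = per-pixel channel mixing.
    return np.einsum('oc,chw->ohw', pw_weights, dw_out)
```

Splitting the convolution this way needs c_in·kh·kw + c_out·c_in weights instead of c_out·c_in·kh·kw, which is the parameter saving that motivates depthwise separable structures.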
The present invention compares evaluation metrics against Random Forest (RF), Support Vector Machine (Support Vector Machine, SVM) and Wide & Deep Convolutional Neural Networks (WDCNN) on AUC, MAP@100 and MAP@200, as shown in Table 2.
TABLE 2 classification results
As can be seen from Table 2, the classification method based on the ResNet and DSCAttention mechanisms provided by the invention performs best, achieving an AUC of 91.92%, a MAP@100 of 98.58% and a MAP@200 of 96.77%. On the best AUC of each model, it improves by 11.84% over the random forest RF, 10.34% over the support vector machine SVM, and 11.46% over the WDCNN model.
On the MAP@100 metric, the method improves by 14.33% over the random forest RF, 9.23% over the support vector machine SVM, and 31.01% over the WDCNN model.
On the MAP@200 metric, the method improves by 20.62% over the random forest RF, 13.04% over the support vector machine SVM, and 29.76% over the WDCNN model.
Fig. 7 shows the ROC curves of the present invention and the other comparative models. Observing the ROC curves of the four models, the ROC curve of the invention lies above those of the WDCNN, SVM and RF models, so the invention performs better at electricity theft classification.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It should be understood by those skilled in the art that the above embodiments do not limit the scope of the present invention in any way, and all technical solutions obtained by equivalent substitution and the like fall within the scope of the present invention. Parts of the invention not described in detail are the same as, or can be implemented with, the prior art.
Claims (10)
1. A method for classifying electricity theft based on ResNet and DSCAttention mechanisms is characterized by comprising the following steps:
step 1, firstly performing data analysis on the obtained data, obtaining time series data features with stronger identifiability through trend-season analysis and correlation analysis, and preparing the design and construction of the model according to these features;
step 2, entering a data preprocessing link, analyzing and processing missing values of data, constructing a mask matrix, normalizing the data by using quantile transformation to reduce the sensitivity of a model to abnormal values, and dividing a training set and a verification set by using a method of splitting the data set in a layering way;
and step 3, after the data are processed, entering the construction and training stage of the deep learning model, and adjusting different hyperparameters during model training to find the combination with the best classification performance, until the final electricity theft classification model is formed.
2. The method for electricity theft classification based on ResNet and DSCAttention mechanisms as claimed in claim 1, wherein the electricity theft classification model uses the ResNet18 network as the basic network structure, and introduces a depthwise-separable-convolution-enhanced self-attention mechanism layer on this basis, so that the model can extract the intrinsic complex features of the electricity theft time series while taking into account correlations between time steps.
3. The method of electricity theft classification based on the ResNet and DSCAttention mechanisms of claim 2, wherein the number of output channels of the second and third convolutional layers of the ResNet18 network is modified, and a channel-by-channel convolution and a channel-by-channel self-attention mechanism in a depthwise separable convolution structure are introduced; the remaining residual network structure of ResNet18 is then replaced with a point-by-point convolution layer and a two-layer fully-connected neural network, wherein the point-by-point convolution layer and the preceding channel-by-channel convolution form a complete depthwise separable convolution structure, and the two-layer fully-connected network serves as the classifier.
4. The method for classifying electricity theft based on ResNet and DSCAttention mechanisms according to claim 1, wherein in step 1 the time series is decomposed with an additive model, its intrinsic features being separated into trend and seasonal terms so that they can be analyzed easily; the time series additive model is shown in formula (1.5):
T_t = Trend_t + Seasonal_t + Resid_t  (1.5)
in formula (1.5), T_t is the original time series, Trend_t is the trend term of the time series additive model, Seasonal_t is the seasonal term, and Resid_t is the residual term of the time series additive model.
5. The method of electricity theft classification based on ResNet and DSCAttention mechanisms of claim 4, wherein the trend term Trend_t of the time series additive model is calculated as shown in formula (1.6):
in formula (1.6), y_{t+i} is the value at time t+i in the time series T, and m = 2k+1 represents an m-order moving average;
the one-dimensional time series data is converted into two-dimensional spatial time series data by a partitioning method with a 7-day period, so the order (period) of the moving average is 7, i.e. k = 3; by the definition of the additive model in formula (1.5), the sum of the seasonal term and the residual term equals the time series minus the trend term;
defining the set of t+i as {t+1, t+2, …, t+m}, the seasonal term of the time series additive model is calculated as shown in formula (1.7).
6. The method for classifying electricity theft based on ResNet and DSCAttention mechanisms according to claim 1, wherein in step 1, time series data in the Cartesian coordinate system is converted into a polar coordinate system by the Gramian Angular Field, and the temporal correlation of the series between different times is analyzed by computing the angles between the data features at each time step in polar coordinates; the GAF calculation is shown in formula (1.8):
in formula (1.8), G is a square matrix of size n×n, and expressions of the form cos(φ_i + φ_j) are the special inner product defined by GAF; let the one-dimensional vector for which GAF is computed be X = {x_1, x_2, …, x_n}, so that the length of X is exactly n; in the formula, φ_1 is arccos(x_1).
7. The method of electricity theft classification based on ResNet and DSCAttention mechanisms of claim 6, wherein the specific calculation of cos(φ_i + φ_j) is shown in formula (1.9):
cos(φ_i + φ_j) = cos(arccos(x_i) + arccos(x_j))  (1.9)
in formula (1.9), i and j are indices into the one-dimensional vector X; temporal correlation analysis is carried out on the GAF values of normal and abnormal electricity consumption data according to formulas (1.8) and (1.9).
8. The method of claim 1, wherein in step 2, a mask matrix of the same size as the sample data is created and initialized to all zeros; the positions of missing values in the sample are then determined, and a 1 is written at the same positions in the mask matrix; by combining the sample with its same-size mask matrix, a feature matrix with 2 channels is finally constructed.
9. The method of claim 8, wherein the missing values are filled according to the following formula:
in formula (1.10), i denotes a position in the original data matrix, x_i denotes the value at position i of the original data matrix, and f(x_i) denotes the filled value after processing according to formula (1.10);
formula (1.11) gives the construction of the mask matrix in the missing value filling method used; here i denotes a position in the original data matrix, x_i denotes the value at position i, and g(x_i) denotes the mask matrix entry transformed according to formula (1.11).
10. An apparatus for power theft classification based on a res net and DSCAttention mechanism, comprising:
the data analysis module is used for performing data analysis on the obtained data, obtaining time series data features with stronger identifiability through trend-season analysis and correlation analysis, and preparing the design and construction of the model according to these features;
the data processing module is used for preprocessing the data: analyzing and handling missing values, constructing a mask matrix, normalizing the data with a quantile transformation to reduce the model's sensitivity to outliers, and dividing the training and validation sets by stratified splitting of the data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310615835.7A CN116628605A (en) | 2023-05-26 | 2023-05-26 | Method and device for electricity stealing classification based on ResNet and DSCAttention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116628605A true CN116628605A (en) | 2023-08-22 |
Family
ID=87597039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310615835.7A Pending CN116628605A (en) | 2023-05-26 | 2023-05-26 | Method and device for electricity stealing classification based on ResNet and DSCAttention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116628605A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116933216A (en) * | 2023-09-18 | 2023-10-24 | 湖北华中电力科技开发有限责任公司 | Management system and method based on flexible load resource aggregation feature analysis |
CN117495109A (en) * | 2023-12-29 | 2024-02-02 | 国网山东省电力公司禹城市供电公司 | Electricity stealing user identification system based on deep well network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |