CN117056874A - Unsupervised electricity larceny detection method based on deep twin autoregressive network - Google Patents

Unsupervised electricity larceny detection method based on deep twin autoregressive network

Info

Publication number
CN117056874A
Authority
CN
China
Prior art keywords
time
follows
data
sequence
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311040028.3A
Other languages
Chinese (zh)
Inventor
李琪林
彭德中
周尧
彭军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marketing Service Center Of State Grid Sichuan Electric Power Co
Original Assignee
Marketing Service Center Of State Grid Sichuan Electric Power Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Marketing Service Center Of State Grid Sichuan Electric Power Co filed Critical Marketing Service Center Of State Grid Sichuan Electric Power Co
Priority to CN202311040028.3A priority Critical patent/CN117056874A/en
Publication of CN117056874A publication Critical patent/CN117056874A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/27Regression, e.g. linear or logistic regression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00Data types
    • G06F2123/02Data types in the time domain, e.g. time-series data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an unsupervised electricity theft detection method based on a deep twin autoregressive network, which specifically comprises the following steps: preprocessing the raw data; acquiring subsequences through a sliding window; reconstructing the time series through the deep twin autoregressive network; calculating the reconstruction error of the input sequence at each moment; calculating the sample outlier degree; and identifying abnormal users in the electricity consumption data through threshold comparison. The two twin autoregressive sub-networks each independently reconstruct the unlabeled input data, and the reconstruction errors are used to predict the normal samples in the data, so that model parameters are optimized through iterative training without introducing extra noise. A multi-head self-attention mechanism captures complex characteristics of the electricity consumption data such as time dependence, periodicity, and randomness, and an effective representation of the data is learned by reconstructing normal samples, which addresses the insufficient extraction of relevant features by existing detection methods and improves the accuracy of electricity theft detection.

Description

Unsupervised electricity larceny detection method based on deep twin autoregressive network
Technical Field
The application belongs to the technical field of electric power data analysis, and particularly relates to an unsupervised electricity larceny detection method based on a deep twin autoregressive network.
Background
Electricity theft detection, also known as abnormal electricity usage detection, is a sub-field of time series anomaly detection that aims to identify electricity usage that does not conform to normal consumption patterns or violates the electricity supply contract. In a power system, electricity theft not only causes substantial power and economic losses but also increases the risk of electrical safety accidents. Electricity theft detection is therefore important for discovering and correcting improper consumption behavior in time, reducing energy losses, and improving the safety of electricity use.
Traditional detection of abnormal electricity usage relies mainly on manual work: inspectors check bypassed transmission lines and compare abnormal meter readings by hand. This is time-consuming and labor-intensive, carries high labor costs and low efficiency, and requires inspectors with professional knowledge of the field to reach correct judgments. With the large-scale deployment of intelligent hardware in smart grids, power systems now generate large volumes of high-dimensional electricity usage data, making analysis and detection of the collected data feasible, and detection algorithms based on machine learning and deep learning have been widely applied. However, these methods usually require large amounts of manually labeled data; abnormal samples are very rare and hard to label by hand, so detection methods based on supervised learning are difficult to deploy at scale in practice.
In recent years, reconstruction-based unsupervised anomaly detection has achieved good results in many application fields. However, on the one hand, such methods obtain normal samples through heuristic rules, which introduces extra noise into model training and degrades detection accuracy; on the other hand, because they model temporal information with recurrent neural networks or their variants, they struggle to capture complex temporal characteristics such as long-term time dependence and periodicity.
Disclosure of Invention
The application aims to provide an unsupervised electricity larceny detection method based on a deep twin autoregressive network that solves the insufficient extraction of relevant features by existing detection methods while effectively improving the accuracy of electricity theft detection.
In order to solve the technical problems, the application is realized by the following steps:
An unsupervised electricity larceny detection method based on a deep twin autoregressive network specifically comprises the following steps:
1) Preprocessing the original data;
2) Acquiring a subsequence through a sliding window;
3) Reconstructing the time series through a deep twin autoregressive network;
4) Calculating a reconstruction error of the input sequence at each moment;
5) Calculating the sample outlier;
6) And judging abnormal users in the electricity consumption data through threshold comparison.
Further, the specific method for preprocessing the raw data in step 1) is as follows:
The time series input to abnormal electricity usage detection is a time-ordered set of n variables, expressed as $S=(s_1,s_2,\dots,s_n)\in\mathbb{R}^n$, where n is the length of S. To avoid introducing data bias through the filling of missing values, the application zero-fills the raw data samples and additionally marks them with a set of binary masks, i.e., missing values are marked as 0 and non-missing values as 1:

$$\mathrm{Mask}(s_t)=\begin{cases}0, & s_t=\mathrm{NaN}\\ 1, & \text{otherwise}\end{cases}\qquad(1)$$

where $\mathrm{Mask}(\cdot)$ denotes the binary mask processing function, NaN denotes a missing value, and $s_t$ denotes the observed value of the time series S at time t;
To avoid scale differences between samples and between features, each feature is normalized. A per-time-step normalization is adopted: each time step of each sample is processed so that the mean of each sample is shifted to 0 while its variance is kept unchanged, preserving the temporal characteristics of the data to the greatest extent.
Further, the specific method of step 2) is as follows:
The sliding window step size is set to M, and the sliding window divides the time series input of step 1) into N non-overlapping subsequences of length M, converting the univariate time series into a multivariate time series $X=\{x_1,x_2,\dots,x_N\}$, where $N=\lfloor n/M\rfloor$ and $\lfloor\cdot\rfloor$ denotes the floor operation. The feature at time t ($t\le N$), i.e., the t-th subsequence $x_t$ of the converted multivariate time series X, is an M-dimensional vector obtained by selecting the elements within one window of the input time series; the t-th window has starting index $t\times M$ and ending index $t\times M+(M-1)$:

$$x_t=\{s_{t\times M},\,s_{t\times M+1},\,\dots,\,s_{t\times M+M-1}\}\qquad(2)$$
further, the specific steps of the step 3) are as follows:
31 Position encoding the input sequence;
32 -embedding the input sequence by a linear mapping;
33 Feature extraction by two independent encoders;
34 Decoding by two independent decoders;
35 The original input sequence is reconstructed by the reconstruction layer from the decoder output.
Specifically, the specific method of step 31) is as follows:
The position information of the input sequence is characterized by position coding; the position coding vector PE is given by

$$PE_{(pos,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d}}\right)\qquad(3)$$

where pos denotes the position in the input sequence, d denotes the dimension of the hidden layer, and $PE_{(pos,2i)}$ and $PE_{(pos,2i+1)}$ denote the values of the even and odd bits, respectively, of the encoding vector of the input sequence at position pos.
The specific method of step 32) is as follows:
The output $h_t$ of each time step depends on the states of the previous T time steps $h_{t-T:t-1}$:

$$h_t=f(h_{t-T},h_{t-T+1},\dots,h_{t-1})+\varepsilon\qquad(4)$$

where $h_t$ denotes the current observation, $h_{t-T:t-1}=h_{t-T},h_{t-T+1},\dots,h_{t-1}$ denotes the previous T observations, T denotes the lag order of the model, and $\varepsilon$ denotes random noise or residual error that the model cannot explain;
f denotes the twin autoregressive neural network, which comprises two autoregressive sub-networks based on multi-head self-attention. An embedding layer with shared parameters captures the relevance and similarity of different features by linear projection, mapping the original time series $X\in\mathbb{R}^{M\times N}$ into a low-dimensional vector space $h\in\mathbb{R}^{d\times N}$; when the input sequence length M is large, a smaller hidden dimension d is set to reduce the amount of computation:

$$h=W_hX+b_h\qquad(5)$$

where $W_h$ and $b_h$ are parameters shared by the two sub-networks;
the position coding information of the data is then fused into the features:

$$h_{PE}=h_{(pos,i)}+PE_{(pos,i)}\qquad(6)$$

where $h_{PE}$ denotes the features incorporating the position coding and $h_{(pos,i)}$ denotes the i-th bit of the embedding-layer encoding vector at position pos.
the specific method of the step 33) is as follows:
constructing a twin autoregressive network by a multi-head attention-based encoder and decoder stack, the encoder first utilizing an autoregressive mechanism for each time step of the inputThe conversion is performed with the following expression:
wherein,represents a scaling factor, q=w q h PE +b q ,K=W k h PE +b k ,V=W v h PE +b v Respectively representing a query vector, a key vector and a value vector;
and fusing a plurality of self-attention layers by using a multi-head attention mechanism, so that the model focuses on different characterization subspace information from different positions together, wherein the expression is as follows:
wherein Concat (·) represents vector concatenation operation, W o Representation modelA profile parameter;
and then, introducing nonlinear information into the model by using two forward layers and ReLU activation, wherein the expression is as follows:
FFN(h′)=W 2 ReLU(W 1 h′+b 1 )+b 2 (9)
wherein h' represents the output of the previous layer, W 1 、W 2 、b 1 、b 2 Representing the parameters of the forward layer respectively, and then performing residual connection and normalization processing.
The specific method of step 34) is as follows:
The decoder is responsible for decoding the output of the encoder of step 33) so as to reconstruct the original input sequence into a new target sequence;
the input data is first transformed with a masked self-attention network: for the t-th time step, only the first t-1 positions may be used as input, and the t-th and subsequent positions are masked, so that only the information of the first t-1 positions is used when the t-th position is generated. Given the attention weight matrix A, the masking operation is

$$\mathrm{Mask}(A_{ij})=\begin{cases}A_{ij}, & j<i\\ -\infty, & j\ge i\end{cases}\qquad(10)$$

where $A_{ij}$ denotes the attention weight between positions i and j. After the decoder transforms the input data with masked self-attention, it performs the decoding operation with the same network structure as the encoder.
The specific method of step 35) is as follows:
The encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers. The sub-networks reconstruct the input time series data through an autoregressive mechanism: each sub-network predicts the value of the next time step from the partial sequence already generated and uses it as input to continue generating the prediction of the following time step until the complete sequence is reconstructed. The application uses a linear layer to reconstruct the decoder output features $h_{out}\in\mathbb{R}^{d\times N}$ into $\hat{X}$:

$$\hat{X}=W_{rec}\,h_{out}+b_{rec}\qquad(11)$$

where $W_{rec}\in\mathbb{R}^{M\times d}$ and $b_{rec}\in\mathbb{R}^{1\times M}$ denote the reconstruction layer parameters.
Further, the specific steps of step 4) are as follows:
The two outputs $\hat{X}^{(1)}$ and $\hat{X}^{(2)}$ of the twin autoregressive sub-networks are each compared with the input sequence X, with the missing values masked out. The error between the predicted and true values is measured by the mean squared error loss to give the reconstruction error:

$$L_k(X)=\frac{1}{NM}\sum_{t=1}^{N}\sum_{i=1}^{M}\left(x_t^{i}-\hat{x}_t^{i,(k)}\right)^2,\quad k=1,2\qquad(12)$$

where N denotes the length of the time series, M denotes the dimension of each time-step feature, $x_t^{i}$ denotes the i-th feature of the t-th time step of the input sequence X, and $\hat{x}_t^{i,(1)}$ and $\hat{x}_t^{i,(2)}$ denote the i-th feature reconstructed by the two sub-networks at the t-th time step;
the total reconstruction error over the data set is the average of the sum of the reconstruction errors of all samples in the two sub-networks:

$$L=\frac{1}{|D|}\sum_{X\in D}\left(L_1(X)+L_2(X)\right)\qquad(13)$$

The larger the reconstruction error of a sample, the higher its outlier degree, and samples with a high outlier degree are regarded as abnormal; the intersection of the remaining normal samples output by the two sub-networks is fed into the shared embedding layer of step 3) again for iterative training. This training scheme based on autoregression and the mean squared error loss lets the model learn the patterns and characteristics of the time series data step by step, improving its reconstruction ability and prediction accuracy.
Further, the specific steps of step 5) are as follows:
After model training is completed, a new sample is reconstructed through the twin autoregressive network to obtain the reconstruction errors $e_t^{(1)}$ and $e_t^{(2)}$ of each time step, and the anomaly score of the new sample is obtained by pooling:

$$\mathrm{Score}_k(X)=\mathrm{MaxPool}_{t}\left(\mathrm{AvePool}\left(e_t^{(k)}\right)\right),\quad k=1,2\qquad(14)$$

where $\mathrm{MaxPool}(\cdot)$ denotes maximum pooling and $\mathrm{AvePool}(\cdot)$ denotes average pooling: the features $e_t$ of time step t are average-pooled to fuse the features of that moment, and maximum pooling over all moments extracts the most distinctive representation;
the outlier degree of sample X is obtained from the anomaly scores of the two autoregressive sub-networks:

$$OD(X)=\mathrm{Score}_1(X)+\mathrm{Score}_2(X)\qquad(15)$$
further, the specific steps of the step 6) are as follows:
if the outlier OD (X) > τ of the sample X is determined as an outlier, the outlier of the whole samples in the data is compared with the threshold τ one by one to detect the outlier in the system.
Compared with the prior art, the application has the beneficial effects that:
firstly, preprocessing an input time sequence, including filling a missing value, normalizing and the like, and then extracting a series of non-overlapping subsequences from the original time sequence through a sliding window with the length of M, so that a unitary time sequence is converted into a multi-element time sequence; after preprocessing, time sequence feature extraction is carried out through a twin autoregressive neural network, a normal mode in data is captured, and an abnormal score of each sample is calculated. The twin autoregressive network comprises two multi-headed autoregressive-based subnetworks that share an embedded layer, have a similar encoder-decoder structure, and are parameter optimized by reconstruction loss. In the training stage, the abnormal scores of the samples are respectively output, and are ranked according to the scores, so that a plurality of samples with the highest abnormal degree are regarded as abnormal points. And in the testing stage, the outlier scores output by the two sub-networks are summed to obtain the outlier degree of the test sample, and finally, the outlier sample is judged through a threshold value.
The application reconstructs the unlabeled input data independently through the two twin autoregressive sub-networks and combines their reconstruction errors to predict the normal samples in the data without manual labeling, so that model parameters are optimized through iterative training and no extra noise is introduced. The multi-head self-attention mechanism captures complex characteristics of the electricity consumption data such as time dependence, periodicity, and randomness, and an effective representation of the data is learned by reconstructing the behavior patterns of normal samples, which solves the insufficient extraction of relevant features by existing detection methods and effectively improves the accuracy of electricity theft detection.
Drawings
FIG. 1 is a schematic flow chart of the present application.
Detailed Description
The following describes the embodiments of the present application in further detail with reference to the drawings and specific examples.
As shown in fig. 1, an unsupervised electricity theft detection method based on a deep twin autoregressive network comprises the following steps:
1) Preprocessing the original data
The time series input to abnormal electricity usage detection is a time-ordered set of n variables, expressed as $S=(s_1,s_2,\dots,s_n)\in\mathbb{R}^n$, where n is the length of S. Because of errors and negligence during data measurement and recording, real time series data may contain missing values. To avoid introducing data bias through the filling of missing values, the application zero-fills the raw data samples and additionally marks them with a set of binary masks, i.e., missing values are marked as 0 and non-missing values as 1:

$$\mathrm{Mask}(s_t)=\begin{cases}0, & s_t=\mathrm{NaN}\\ 1, & \text{otherwise}\end{cases}\qquad(1)$$

where $\mathrm{Mask}(\cdot)$ denotes the binary mask processing function, NaN (not a number) denotes a missing value, and $s_t$ denotes the observed value of the time series S at time t;
To avoid scale differences between samples and between features, each feature is normalized. Common schemes such as Z-score normalization and min-max normalization assume that the features are mutually independent and therefore destroy the time dependence between them. The application instead adopts a per-time-step normalization: each time step of each sample is processed so that the mean of each sample is shifted to 0 while its variance is kept unchanged, preserving the temporal characteristics of the data to the greatest extent.
2) Sub-sequence acquisition through sliding window
In order to better represent the local patterns and trends in the time series (such as rising, falling, or stable segments), the sliding window step size is set to M, and the sliding window divides the time series input of step 1) into N non-overlapping subsequences of length M, converting the univariate time series into a multivariate one. By adjusting the window size and step size, temporal patterns and trends at different time scales can be explored, so the characteristics of the time series data can be analyzed more comprehensively.
The non-overlapping subsequences are $X=\{x_1,x_2,\dots,x_N\}$, where $N=\lfloor n/M\rfloor$ and $\lfloor\cdot\rfloor$ denotes the floor operation. The feature at time t ($t\le N$), i.e., the t-th subsequence $x_t$ of the converted multivariate time series X, is an M-dimensional vector obtained by selecting the elements within one window of the input time series; the t-th window has starting index $t\times M$ and ending index $t\times M+(M-1)$:

$$x_t=\{s_{t\times M},\,s_{t\times M+1},\,\dots,\,s_{t\times M+M-1}\}\qquad(2)$$
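A sketch of the non-overlapping windowing of equation (2), under the assumption that windows are indexed from 0 and a trailing remainder shorter than M is dropped:

```python
import numpy as np

def split_windows(s, M=7):
    """Eq. (2): N = floor(n / M) non-overlapping subsequences of length M."""
    N = len(s) // M                                  # floor operation of step 2)
    return np.stack([s[t * M : (t + 1) * M] for t in range(N)])  # shape (N, M)
```

With M = 7, as in the embodiment below, a daily series is regrouped into weekly windows.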
3) Reconstruction of time series through depth twin autoregressive network
31 Position coding of the input sequence
The user electricity consumption data is a typical time series, and complex characteristics such as temporal order, periodicity, and randomness may exist between time steps. Introducing position coding information lets the original input data carry position information, so that the model learns the positional features in the data. The position information of the input sequence is characterized by position coding; the position coding vector PE is given by

$$PE_{(pos,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d}}\right)\qquad(3)$$

where pos denotes the position in the input sequence, d denotes the dimension of the hidden layer, and $PE_{(pos,2i)}$ and $PE_{(pos,2i+1)}$ denote the values of the even and odd bits, respectively, of the encoding vector of the input sequence at position pos.
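The sinusoidal codes of equation (3) can be tabulated as follows; an even hidden dimension d is assumed, and the implementation is a sketch rather than the patent's own code:

```python
import numpy as np

def positional_encoding(n_pos, d=16):
    """Eq. (3): sine on even dimensions, cosine on odd dimensions."""
    pos = np.arange(n_pos)[:, None]              # positions 0 .. n_pos-1
    i = np.arange(0, d, 2)[None, :]              # even dimension indices 2i
    angle = pos / np.power(10000.0, i / d)
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe                                    # shape (n_pos, d)
```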
32 Embedding representation of input sequence through a linear mapping
The autoregressive model is a sequence generation model that describes the dependence of the current observation on previous observations in time series data: the output $h_t$ of each time step depends on the states of the previous T time steps $h_{t-T:t-1}$:

$$h_t=f(h_{t-T},h_{t-T+1},\dots,h_{t-1})+\varepsilon\qquad(4)$$

where $h_t$ denotes the current observation, $h_{t-T:t-1}=h_{t-T},h_{t-T+1},\dots,h_{t-1}$ denotes the previous T observations, T denotes the lag order of the model, and $\varepsilon$ denotes random noise or residual error that the model cannot explain;
f denotes the twin autoregressive neural network, which comprises two autoregressive sub-networks based on multi-head self-attention. An embedding layer with shared parameters captures the relevance and similarity of different features by linear projection, mapping the original time series $X\in\mathbb{R}^{M\times N}$ into a low-dimensional vector space $h\in\mathbb{R}^{d\times N}$; when the input sequence length M is large, a smaller hidden dimension d is set to reduce the amount of computation:

$$h=W_hX+b_h\qquad(5)$$

where $W_h$ and $b_h$ are parameters shared by the two sub-networks;
the position coding information of the data is then fused into the features:

$$h_{PE}=h_{(pos,i)}+PE_{(pos,i)}\qquad(6)$$

where $h_{PE}$ denotes the features incorporating the position coding and $h_{(pos,i)}$ denotes the i-th bit of the embedding-layer encoding vector at position pos.
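A minimal PyTorch sketch of the shared embedding of equations (5) and (6): a single linear layer whose parameters both sub-networks reuse, with the position codes added to its output. The class and argument names are assumptions.

```python
import torch
import torch.nn as nn

class SharedEmbedding(nn.Module):
    """Eqs. (5)-(6): h = W_h x + b_h, shared by both towers, plus PE."""
    def __init__(self, M=7, d=16):
        super().__init__()
        self.proj = nn.Linear(M, d)     # W_h and b_h, shared parameters
    def forward(self, x, pe):           # x: (N, M) windows, pe: (N, d)
        return self.proj(x) + pe        # h_PE of Eq. (6), shape (N, d)
```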
33 Feature extraction by two independent encoders
The twin autoregressive network is constructed by stacking encoders and decoders based on multi-head attention. The encoder first transforms each time step $h_{PE}^{t}$ of the input with a self-attention mechanism:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V\qquad(7)$$

where $\sqrt{d}$ denotes a scaling factor, and $Q=W_qh_{PE}+b_q$, $K=W_kh_{PE}+b_k$, $V=W_vh_{PE}+b_v$ denote the query, key, and value vectors, respectively. The similarity between Q and K is used to calculate the weights, which are then used in a weighted sum with V to obtain the output vector. In essence, the attention mechanism performs a weighted sum over all time steps according to the calculated weights; it allows information from time steps at any distance to flow directly to the current step, giving the attention mechanism the ability to capture long-term time dependence.
A multi-head attention mechanism fuses multiple self-attention layers so that the model jointly attends to information from different representation subspaces at different positions:

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_k)W_o\qquad(8)$$

where $\mathrm{Concat}(\cdot)$ denotes the vector concatenation operation and $W_o$ denotes a model parameter;
then two forward layers with ReLU activation introduce nonlinearity into the model:

$$\mathrm{FFN}(h')=W_2\,\mathrm{ReLU}(W_1h'+b_1)+b_2\qquad(9)$$

where $h'$ denotes the output of the previous layer and $W_1$, $W_2$, $b_1$, $b_2$ denote the parameters of the forward layers; residual connection and normalization are then applied.
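One encoder layer matching equations (7) through (9) can be sketched with PyTorch's built-in multi-head attention; d = 16 and 8 heads follow the embodiment below, while the 4·d feed-forward width is an assumption.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Multi-head self-attention (Eqs. 7-8) + ReLU feed-forward (Eq. 9),
    each followed by a residual connection and layer normalization."""
    def __init__(self, d=16, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, h):               # h: (batch, N, d) embedded windows
        a, _ = self.attn(h, h, h)       # Q, K, V all projected from h_PE
        h = self.norm1(h + a)           # residual connection + normalization
        return self.norm2(h + self.ffn(h))
```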
34 Decoding by two independent decoders
The decoder is responsible for decoding the output of the encoder of step 33) so as to reconstruct the original input sequence into a new target sequence. While generating the sequence, the decoder depends on the previous time steps and on its own state; to prevent the prediction of the current time step from being influenced by future time steps, all positions after the current time step must be masked.
The input data is first transformed with a masked self-attention network: for the t-th time step, only the first t-1 positions may be used as input, and the t-th and subsequent positions are masked, so that only the information of the first t-1 positions is used when the t-th position is generated. Given the attention weight matrix A, the masking operation is

$$\mathrm{Mask}(A_{ij})=\begin{cases}A_{ij}, & j<i\\ -\infty, & j\ge i\end{cases}\qquad(10)$$

where $A_{ij}$ denotes the attention weight between positions i and j. After the decoder transforms the input data with masked self-attention, it performs the decoding operation with the same network structure as the encoder.
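The masking of equation (10) corresponds to an additive attention mask with -inf above the diagonal. The sketch below uses the usual causal form, in which position t may also attend to itself; this is an assumption where the patent masks position t as well.

```python
import torch

def causal_mask(N):
    """Eq. (10): forbid attention from position i to positions j > i."""
    upper = torch.triu(torch.ones(N, N, dtype=torch.bool), diagonal=1)
    return torch.zeros(N, N).masked_fill(upper, float("-inf"))

# usage with the encoder block above:
# a, _ = self.attn(h, h, h, attn_mask=causal_mask(h.size(1)))
```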
35 Reconstructing the original input sequence by the output of the decoder through the reconstruction layer
The encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers. The sub-networks reconstruct the input time series data through an autoregressive mechanism: each sub-network predicts the value of the next time step from the partial sequence already generated and uses it as input to continue generating the prediction of the following time step until the complete sequence is reconstructed. The application uses a linear layer to reconstruct the decoder output features $h_{out}\in\mathbb{R}^{d\times N}$ into $\hat{X}$:

$$\hat{X}=W_{rec}\,h_{out}+b_{rec}\qquad(11)$$

where $W_{rec}\in\mathbb{R}^{M\times d}$ and $b_{rec}\in\mathbb{R}^{1\times M}$ denote the reconstruction layer parameters.
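Equation (11) is a plain linear read-out; a sketch with assumed shapes:

```python
import torch
import torch.nn as nn

class ReconstructionLayer(nn.Module):
    """Eq. (11): map decoder features in R^d back to an M-dim window."""
    def __init__(self, d=16, M=7):
        super().__init__()
        self.out = nn.Linear(d, M)      # W_rec and b_rec
    def forward(self, h_out):           # h_out: (batch, N, d)
        return self.out(h_out)          # reconstructed windows, (batch, N, M)
```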
4) Calculating the reconstruction error of the input sequence at each instant
The two outputs $\hat{X}^{(1)}$ and $\hat{X}^{(2)}$ of the twin autoregressive sub-networks are each compared with the input sequence X, with the missing values masked out. The error between the predicted and true values is measured by the mean squared error (MSE) loss to give the reconstruction error:

$$L_k(X)=\frac{1}{NM}\sum_{t=1}^{N}\sum_{i=1}^{M}\left(x_t^{i}-\hat{x}_t^{i,(k)}\right)^2,\quad k=1,2\qquad(12)$$

where N denotes the length of the time series, M denotes the dimension of each time-step feature, $x_t^{i}$ denotes the i-th feature of the t-th time step of the input sequence X, and $\hat{x}_t^{i,(1)}$ and $\hat{x}_t^{i,(2)}$ denote the i-th feature reconstructed by the two sub-networks at the t-th time step;
the total reconstruction error over the data set is the average of the sum of the reconstruction errors of all samples in the two sub-networks:

$$L=\frac{1}{|D|}\sum_{X\in D}\left(L_1(X)+L_2(X)\right)\qquad(13)$$

The reconstruction error of each sample corresponds to a score that reflects how difficult the sample is to reconstruct with the autoregressive model; because normal samples make up most of the data set, training the network lets it learn the normal patterns of the time series. The larger the reconstruction error of a sample, the higher its outlier degree, and samples with a high outlier degree are regarded as abnormal; the intersection of the remaining normal samples output by the two sub-networks is fed into the shared embedding layer of step 3) again for iterative training. This training scheme based on autoregression and the MSE loss lets the model learn the patterns and characteristics of the time series data step by step, improving its reconstruction ability and prediction accuracy. Moreover, because the loss is based on the mean squared error, the model attends to the overall reconstruction of the whole sequence during training rather than only to matching local patterns, which strengthens its generalization ability and stability.
5) Calculating sample outliers
The sample outlier degree measures how likely a sample is to be an anomaly. After model training is completed, a new sample is reconstructed through the twin autoregressive network to obtain the reconstruction errors $e_t^{(1)}$ and $e_t^{(2)}$ of each time step, and the anomaly score of the new sample is obtained by pooling:

$$\mathrm{Score}_k(X)=\mathrm{MaxPool}_{t}\left(\mathrm{AvePool}\left(e_t^{(k)}\right)\right),\quad k=1,2\qquad(14)$$

where $\mathrm{MaxPool}(\cdot)$ denotes maximum pooling and $\mathrm{AvePool}(\cdot)$ denotes average pooling: the features $e_t$ of time step t are average-pooled to fuse the features of that moment, and maximum pooling over all moments extracts the most distinctive representation;
the outlier degree of sample X is obtained from the anomaly scores of the two autoregressive sub-networks:

$$OD(X)=\mathrm{Score}_1(X)+\mathrm{Score}_2(X)\qquad(15)$$
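A sketch of the pooled score of equations (14) and (15), under the reading that errors are average-pooled within a time step and max-pooled over time:

```python
import torch

def anomaly_score(err):                 # err: (N, M) per-step squared errors
    """Eq. (14): AvePool over features of each step, MaxPool over time."""
    return err.mean(dim=1).max()

# Eq. (15): OD(X) = anomaly_score(err_1) + anomaly_score(err_2)
```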
6) Determining abnormal users in electricity data by threshold comparison
If the outlier degree of sample X satisfies $OD(X)>\tau$, X is determined to be an outlier; the outlier degrees of all samples in the data are compared with the threshold $\tau$ one by one to detect the anomalies in the system.
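The threshold test itself reduces to a per-sample comparison; a minimal sketch:

```python
def detect_outliers(od_scores, tau):
    """Flag each sample whose outlier degree OD(X) exceeds tau."""
    return [idx for idx, od in enumerate(od_scores) if od > tau]

# e.g. detect_outliers([3.2, 1.1, 1250.0], tau=1000) returns [2]
```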
Specific examples:
an electricity consumption abnormality detection data set issued by a national electric network (SGCC) of China comprises electricity consumption of 42372 users within 1035 days from 1 month in 2014 to 10 months in 2016, wherein 3615 users have abnormal electricity consumption behaviors. There is a large amount of missing data in this dataset, and the statistics are shown in table 1.
Table 1SGCC dataset statistics
1) Raw data preprocessing
First, the missing-value mask of the input sequence is extracted using equation (1), and each time step of each sample is then processed to shift its mean to 0. Taking the first sample in the data set of Table 1 as an example, the raw data are as follows:
the generated missing value mask is as follows:
the average value of the sample is about 7.93615 after filling the missing value with 0, and then the sample is subjected to data normalization to obtain the following table:
2) Sub-sequence acquisition through sliding window
The input univariate time series is converted into a multivariate time series through the sliding window using equation (2). With the sliding window length and step size set to M = 7, the samples are converted into a multivariate time series in units of weeks as follows:
3) Reconstruction of the time series through the deep twin autoregressive network
31 Position coding of the input sequence
The position coding vector of the time series is calculated by equation (3). With the hidden layer dimension d = 16, the position code of each time step is a 16-dimensional vector; taking time steps 0, 1, and 2 as examples, the position codes are as follows:
32 Embedding representation of input sequence through a linear mapping
The relevance and similarity between different features are captured by linear projection through the embedding layer with shared parameters, and the position coding information of the data is then merged into the features (equation (6)). The embedding layer parameters $W_h$ and $b_h$ are obtained by iterative training.
33 Feature extraction by two independent encoders
The data passing through the embedding layer are encoded using equations (7)-(9), and characteristics such as time dependence, periodicity, and randomness in the time series data are captured through the multi-head self-attention mechanism.
34 Decoding by two independent decoders;
the decoder is responsible for decoding the output of the encoder so as to reconstruct the original input sequence into a new target sequence, and each time step depends on the previous time step and the state of the decoder itself in the process of generating the sequence; calculate the weight score of the masked self-attention using equation 10 and input data for each time stepThe transformation is performed and the decoding operation is performed through the same network structure as the encoder.
35 Reconstructing the original input sequence by the output of the decoder through the reconstruction layer
The encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers; the input time series data is reconstructed using equation (11), and the two sub-networks re-express the input sequence x as $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$, respectively.
4) Calculating the reconstruction error of the input sequence at each instant
The two outputs $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$ of the twin autoregressive sub-networks are compared with the input sequence x respectively, the missing values are masked out, and the reconstruction error between the predicted and true values is finally calculated through equations (12) and (13). The intersection of the normal samples output by the two sub-networks is fed into the shared embedding layer again, and iterative training is then performed to optimize the parameters of the twin autoregressive network model.
5) Calculating sample outliers
After model training is completed, a new sample is reconstructed through the twin autoregressive network to obtain the reconstruction errors $e_t^{(1)}$ and $e_t^{(2)}$ of each time step. The anomaly scores of all samples are calculated through equations (14) and (15); taking the first 7 samples in the data set of Table 1 as an example, the computed outlier degrees are as follows:
6) Determining abnormal users in electricity data by threshold comparison
The outlier degrees of all samples are compared, and samples whose outlier degree exceeds the threshold τ are output as anomalies. With an outlier threshold τ = 1000, sample 6 among the 7 samples calculated in step 5) is determined to be an anomaly.
This example evaluates the performance of the proposed algorithm with two metrics: AUC (Area Under the receiver operating characteristic Curve) and AP (Average Precision). AUC emphasizes class separability, i.e., whether the algorithm can distinguish well between classes, while AP emphasizes the completeness of detection, i.e., whether the algorithm can retrieve more relevant samples. A larger AUC indicates better detection performance; a higher AP indicates higher detection accuracy and more positive samples retrieved.
All experiments were performed on a server with an NVIDIA GeForce RTX 3090 GPU, with the learning rate set to 0.0001 and the Adam optimizer used to update model parameters. The hidden layer dimension of the proposed model is set to d = 16 and the subsequence length to 7. Within each of the two twin autoregressive sub-networks, the encoder and decoder have the same number of layers; one sub-network comprises a 4-layer autoregressive network with 8 multi-head self-attention modules, while the other has half as many layers and multi-head self-attention modules.
Seven unsupervised anomaly detection methods are selected as baselines to evaluate the proposed method. The test scores of the compared methods are shown in Table 2; the experimental results show that the proposed method has clear advantages over the classical algorithms and can identify and detect abnormal electricity consumption behavior more accurately.
Table 2: AUC scores of the compared methods
The foregoing is merely illustrative of the embodiments of this application and it will be appreciated by those skilled in the art that variations may be made without departing from the principles of the application, and such modifications are intended to be within the scope of the application as defined in the claims.

Claims (7)

1. An unsupervised electricity larceny detection method based on a deep twin autoregressive network is characterized in that: the method specifically comprises the following steps:
1) Preprocessing the original data;
2) Acquiring a subsequence through a sliding window;
3) Reconstructing the time series through a deep twin autoregressive network;
4) Calculating a reconstruction error of the input sequence at each moment;
5) Calculating the sample outlier;
6) And judging abnormal users in the electricity consumption data through threshold comparison.
2. The method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific method for preprocessing the raw data in the step 1) is as follows:
the time sequence of the abnormal electricity utilization detection input is a time ordered set consisting of n variables, which is expressed as S= (S) 1 ,s 2 ,…,s n )∈R n Wherein n is the length of S; zero padding is carried out on an original data sample, a group of binary masks are adopted to mark the original data sample, the missing value data is marked as 0, the non-missing value data is marked as 1, and the expression is as follows:
wherein Mask (·) represents a binary Mask processing function, naN represents a missing value, s t Representing the observed value of the time sequence S at the time t;
and (3) performing normalization operation on time steps, processing each time step of each sample, adjusting the mean value of each time step to 0, and keeping the variance unchanged, so that the time sequence characteristic of the data is kept to the maximum.
3. The method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific method of the step 2) is as follows:
setting the step length of a sliding window as M, dividing the time sequence input in the step 1) into N non-overlapping subsequences with the length of M by the sliding window, and converting the unitary time sequence into a multi-element time sequence, wherein the non-overlapping subsequences are X= { X 1 ,x 2 ,…,x N }, whereinSign->Representing the rounding down operation, characteristic at time t +.> The t-th subsequence X in the converted multi-element time sequence X is an M-dimensional vector t Is obtained by selecting elements within a window range from an input time series; the t window has a starting index of t×M and an ending index of (t×M) + (M-1), expressed as follows:
x t ={s t×M ,s t×M+1 ,...,s t×M+M-1 } (2)
4. the method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 3) are as follows:
31 Position coding of the input sequence
the position information of the input sequence is characterized by position coding; the position coding vector PE is given by

$$PE_{(pos,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d}}\right)\qquad(3)$$

where pos denotes the position in the input sequence, d denotes the dimension of the hidden layer, and $PE_{(pos,2i)}$ and $PE_{(pos,2i+1)}$ denote the values of the even and odd bits, respectively, of the encoding vector of the input sequence at position pos;
32 Embedding representation of input sequence through a linear mapping
the output $h_t$ of each time step depends on the states of the previous T time steps $h_{t-T:t-1}$:

$$h_t=f(h_{t-T},h_{t-T+1},\dots,h_{t-1})+\varepsilon\qquad(4)$$

where $h_t$ denotes the current observation, $h_{t-T:t-1}=h_{t-T},h_{t-T+1},\dots,h_{t-1}$ denotes the previous T observations, T denotes the lag order of the model, $\varepsilon$ denotes random noise or residual error that the model cannot explain, and f denotes the twin autoregressive neural network; the embedding layer maps the original time series $X\in\mathbb{R}^{M\times N}$ into a low-dimensional vector space $h\in\mathbb{R}^{d\times N}$, and when the input sequence length M is large, a smaller hidden dimension d is set to reduce the amount of computation:

$$h=W_hX+b_h\qquad(5)$$

where $W_h$ and $b_h$ are parameters shared by the two sub-networks;
the position coding information of the data is fused into the features:

$$h_{PE}=h_{(pos,i)}+PE_{(pos,i)}\qquad(6)$$

where $h_{PE}$ denotes the features incorporating the position coding and $h_{(pos,i)}$ denotes the i-th bit of the embedding-layer encoding vector at position pos;
33 Feature extraction by two independent encoders
the encoder first transforms each time step $h_{PE}^{t}$ of the input with a self-attention mechanism:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V\qquad(7)$$

where $\sqrt{d}$ denotes a scaling factor, and $Q=W_qh_{PE}+b_q$, $K=W_kh_{PE}+b_k$, $V=W_vh_{PE}+b_v$ denote the query, key, and value vectors, respectively;
a multi-head attention mechanism fuses multiple self-attention layers so that the model jointly attends to information from different representation subspaces at different positions:

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_k)W_o\qquad(8)$$

where $\mathrm{Concat}(\cdot)$ denotes the vector concatenation operation and $W_o$ denotes a model parameter;
then two forward layers with ReLU activation introduce nonlinearity into the model:

$$\mathrm{FFN}(h')=W_2\,\mathrm{ReLU}(W_1h'+b_1)+b_2\qquad(9)$$

where $h'$ denotes the output of the previous layer, $W_1$, $W_2$, $b_1$, $b_2$ denote the parameters of the forward layers, and residual connection and normalization are then applied;
34 Decoding by two independent decoders
the decoder is responsible for decoding the output of the encoder of step 33) and reconstructing the original input sequence into a new target sequence; the input data is transformed with a masked self-attention network: for the t-th time step, only the first t-1 positions may be used as input, the t-th and subsequent positions are masked, and only the information of the first t-1 positions is used when the t-th position is generated; given the attention weight matrix A, the masking operation is

$$\mathrm{Mask}(A_{ij})=\begin{cases}A_{ij}, & j<i\\ -\infty, & j\ge i\end{cases}\qquad(10)$$

where $A_{ij}$ denotes the attention weight between positions i and j; after the decoder transforms the input data with masked self-attention, it performs the decoding operation with the same network structure as the encoder;
35 Reconstructing the original input sequence by the output of the decoder through the reconstruction layer
the encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers; the sub-networks reconstruct the input time series data through an autoregressive mechanism: each sub-network predicts the value of the next time step from the partial sequence already generated and uses it as input to continue generating the prediction of the following time step until the complete sequence is reconstructed; a linear layer reconstructs the decoder output features $h_{out}\in\mathbb{R}^{d\times N}$ into $\hat{X}$:

$$\hat{X}=W_{rec}\,h_{out}+b_{rec}\qquad(11)$$

where $W_{rec}\in\mathbb{R}^{M\times d}$ and $b_{rec}\in\mathbb{R}^{1\times M}$ denote the reconstruction layer parameters.
5. The method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 4) are as follows:
two outputs of a twin autoregressive subnetworkAnd->Comparing the error with the input sequence X, masking the missing value, and calculating a reconstruction error through a mean square error loss function, wherein the representation of the mean square error loss function is as follows:
where N represents the length of the time series, M represents the dimension of each time step feature,i-th feature representing the t-th time step of the input sequence X,/th feature representing the t-th time step of the input sequence X>And->Respectively representing the ith reconstructed characteristic of the two sub-networks at the t-th time step;
the total reconstruction error is the average of the sum of the reconstruction errors of all samples in the two subnetworks, expressed as follows:
6. the method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 5) are as follows:
after model training is completed, reconstructing a new sample through a twin autoregressive network to obtain a reconstruction error of each time stepAnd->The abnormal scoring condition of the new sample is obtained through pooling operation, and the expression is as follows:
wherein MaxPool (·) represents maximum pooling, avePool (·) represents average pooling, for feature e of time step t t Carrying out average pooling, fusing the characteristics of the moment, carrying out maximized pooling on all the moments, and extracting the most specific representation;
the outliers of sample X were obtained from the anomaly scores of the two autoregressive subnetworks and expressed as follows:
OD(X)=Score 1 (X)+Score 2 (X)(15)
7. the method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 6) are as follows:
if the outlier OD (X) > τ of the sample X is determined as an outlier, the outlier of the whole samples in the data is compared with the threshold τ one by one to detect the outlier in the system.
CN202311040028.3A 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network Pending CN117056874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311040028.3A CN117056874A (en) 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311040028.3A CN117056874A (en) 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Publications (1)

Publication Number Publication Date
CN117056874A true CN117056874A (en) 2023-11-14

Family

ID=88658486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311040028.3A Pending CN117056874A (en) 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Country Status (1)

Country Link
CN (1) CN117056874A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117441980A (en) * 2023-12-20 2024-01-26 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information
CN117556311A (en) * 2024-01-11 2024-02-13 电子科技大学 Unsupervised time sequence anomaly detection method based on multidimensional feature fusion

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN113901990A (en) * 2021-09-15 2022-01-07 昆明理工大学 Case and news correlation analysis method for multi-view integrated learning
CN114359109A (en) * 2022-01-12 2022-04-15 西北工业大学 Twin network image denoising method, system, medium and device based on Transformer
CN114399066A (en) * 2022-01-15 2022-04-26 中国矿业大学(北京) Mechanical equipment predictability maintenance system and maintenance method based on weak supervision learning
CN114926746A (en) * 2022-05-25 2022-08-19 西北工业大学 SAR image change detection method based on multi-scale differential feature attention mechanism
WO2022240906A1 (en) * 2021-05-11 2022-11-17 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for edge-distributed storage and querying in value chain networks
CN115688035A (en) * 2022-10-19 2023-02-03 江苏电力信息技术有限公司 Time sequence power data anomaly detection method based on self-supervision learning
CN116206375A (en) * 2023-04-28 2023-06-02 南京信息工程大学 Face counterfeiting detection method based on double-layer twin network and sustainable learning
CN116467948A (en) * 2023-04-20 2023-07-21 江西科骏实业有限公司 Digital twin model mechanism and appearance combined parameter learning method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022240906A1 (en) * 2021-05-11 2022-11-17 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for edge-distributed storage and querying in value chain networks
CN113901990A (en) * 2021-09-15 2022-01-07 昆明理工大学 Case and news correlation analysis method for multi-view integrated learning
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN114359109A (en) * 2022-01-12 2022-04-15 西北工业大学 Twin network image denoising method, system, medium and device based on Transformer
CN114399066A (en) * 2022-01-15 2022-04-26 中国矿业大学(北京) Mechanical equipment predictability maintenance system and maintenance method based on weak supervision learning
CN114926746A (en) * 2022-05-25 2022-08-19 西北工业大学 SAR image change detection method based on multi-scale differential feature attention mechanism
CN115688035A (en) * 2022-10-19 2023-02-03 江苏电力信息技术有限公司 Time sequence power data anomaly detection method based on self-supervision learning
CN116467948A (en) * 2023-04-20 2023-07-21 江西科骏实业有限公司 Digital twin model mechanism and appearance combined parameter learning method
CN116206375A (en) * 2023-04-28 2023-06-02 南京信息工程大学 Face counterfeiting detection method based on double-layer twin network and sustainable learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117441980A (en) * 2023-12-20 2024-01-26 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information
CN117441980B (en) * 2023-12-20 2024-03-22 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information
CN117556311A (en) * 2024-01-11 2024-02-13 电子科技大学 Unsupervised time sequence anomaly detection method based on multidimensional feature fusion
CN117556311B (en) * 2024-01-11 2024-03-19 电子科技大学 Unsupervised time sequence anomaly detection method based on multidimensional feature fusion

Similar Documents

Publication Publication Date Title
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN117056874A (en) Unsupervised electricity larceny detection method based on deep twin autoregressive network
Jiang et al. A multi-step progressive fault diagnosis method for rolling element bearing based on energy entropy theory and hybrid ensemble auto-encoder
CN115018021B (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
CN110543860B (en) Mechanical fault diagnosis method and system based on TJM (machine learning model) transfer learning
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN113052271B (en) Biological fermentation data prediction method based on deep neural network
CN110570030A (en) Wind power cluster power interval prediction method and system based on deep learning
CN112147432A (en) BiLSTM module based on attention mechanism, transformer state diagnosis method and system
CN116796272A (en) Method for detecting multivariate time sequence abnormality based on transducer
CN112507479B (en) Oil drilling machine health state assessment method based on manifold learning and softmax
CN113378971A (en) Near infrared spectrum classification model training method and system and classification method and system
CN116050621A (en) Multi-head self-attention offshore wind power ultra-short-time power prediction method integrating lifting mode
CN115271225A (en) Wind power-wind power modeling method based on wavelet denoising and neural network
CN117648215B (en) Abnormal tracing method and system for electricity consumption information acquisition system
CN116796275A (en) Multi-mode time sequence anomaly detection method for industrial equipment
CN115146700A (en) Runoff prediction method based on Transformer sequence-to-sequence model
CN113159192A (en) Multi-element time sequence retrieval method and system
CN115688982B (en) Building photovoltaic data complement method based on WGAN and whale optimization algorithm
CN116050652A (en) Runoff prediction method based on local attention enhancement model
CN116383747A (en) Anomaly detection method for generating countermeasure network based on multi-time scale depth convolution
CN116167008A (en) Abnormal positioning method for internet of things sensing cloud data center based on data enhancement
CN116318773A (en) Countermeasure training type unsupervised intrusion detection system and method based on AE model optimization
CN112735604B (en) Novel coronavirus classification method based on deep learning algorithm
CN116361640A (en) Multi-variable time sequence anomaly detection method based on hierarchical attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination