CN117056874A - Unsupervised electricity larceny detection method based on deep twin autoregressive network - Google Patents

Unsupervised electricity larceny detection method based on deep twin autoregressive network

Info

Publication number
CN117056874A
Authority
CN
China
Prior art keywords
time
follows
data
sequence
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311040028.3A
Other languages
Chinese (zh)
Inventor
李琪林
彭德中
周尧
彭军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marketing Service Center Of State Grid Sichuan Electric Power Co
Original Assignee
Marketing Service Center Of State Grid Sichuan Electric Power Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Marketing Service Center Of State Grid Sichuan Electric Power Co filed Critical Marketing Service Center Of State Grid Sichuan Electric Power Co
Priority to CN202311040028.3A priority Critical patent/CN117056874A/en
Publication of CN117056874A publication Critical patent/CN117056874A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/27Regression, e.g. linear or logistic regression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00Data types
    • G06F2123/02Data types in the time domain, e.g. time-series data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an unsupervised electricity theft detection method based on a deep twin autoregressive network, which specifically comprises the following steps: preprocessing the raw data; acquiring subsequences through a sliding window; reconstructing the time series through the deep twin autoregressive network; calculating the reconstruction error of the input sequence at each moment; calculating the sample outlier degree; and identifying abnormal users in the electricity consumption data through threshold comparison. The two twin autoregressive sub-networks each independently reconstruct the unlabeled input data, and the reconstruction errors are used to predict the normal samples in the data, so that model parameters are optimized through iterative training without introducing extra noise. A multi-head self-attention mechanism captures complex characteristics of the electricity consumption data such as time dependence, periodicity, and randomness, and an effective representation of the data is learned by reconstructing normal samples, which addresses the insufficient extraction of relevant features by existing detection methods and improves the accuracy of electricity theft detection.

Description

Unsupervised electricity larceny detection method based on deep twin autoregressive network
Technical Field
The application belongs to the technical field of electric power data analysis, and particularly relates to an unsupervised electricity larceny detection method based on a deep twin autoregressive network.
Background
Electricity theft detection, also known as abnormal electricity usage detection, is a sub-field of time series anomaly detection that aims to identify electricity usage that does not conform to normal consumption patterns or violates the electricity supply contract. In a power system, electricity theft not only causes substantial power and economic losses but also increases the risk of electrical safety accidents. Electricity theft detection is therefore important for discovering and correcting improper consumption behavior in time, reducing energy losses, and improving the safety of electricity use.
Traditional detection of abnormal electricity usage relies mainly on manual work: inspectors check bypassed transmission lines and compare abnormal meter readings by hand. This is time-consuming and labor-intensive, carries high labor costs and low efficiency, and requires inspectors with professional knowledge of the field to reach correct judgments. With the large-scale deployment of intelligent hardware in smart grids, power systems now generate large volumes of high-dimensional electricity usage data, making analysis and detection of the collected data feasible, and detection algorithms based on machine learning and deep learning have been widely applied. However, these methods usually require large amounts of manually labeled data; abnormal samples are very rare and hard to label by hand, so detection methods based on supervised learning are difficult to deploy at scale in practice.
In recent years, reconstruction-based unsupervised anomaly detection has achieved good results in many application fields. However, on the one hand, such methods obtain normal samples through heuristic rules, which introduces extra noise into model training and degrades detection accuracy; on the other hand, because they model temporal information with recurrent neural networks or their variants, they struggle to capture complex temporal characteristics such as long-term time dependence and periodicity.
Disclosure of Invention
The application aims to provide an unsupervised electricity larceny detection method based on a deep twin autoregressive network that solves the insufficient extraction of relevant features by existing detection methods while effectively improving the accuracy of electricity theft detection.
In order to solve the technical problems, the application is realized by the following steps:
An unsupervised electricity larceny detection method based on a deep twin autoregressive network specifically comprises the following steps:
1) Preprocessing the original data;
2) Acquiring a subsequence through a sliding window;
3) Reconstructing the time series through a deep twin autoregressive network;
4) Calculating a reconstruction error of the input sequence at each moment;
5) Calculating the sample outlier;
6) And judging abnormal users in the electricity consumption data through threshold comparison.
Further, the specific method for preprocessing the raw data in step 1) is as follows:
The time series input to abnormal electricity usage detection is a time-ordered set of n variables, expressed as $S=(s_1,s_2,\dots,s_n)\in\mathbb{R}^n$, where n is the length of S. To avoid introducing data bias through the filling of missing values, the application zero-fills the raw data samples and additionally marks them with a set of binary masks, i.e., missing values are marked as 0 and non-missing values as 1:

$$\mathrm{Mask}(s_t)=\begin{cases}0, & s_t=\mathrm{NaN}\\ 1, & \text{otherwise}\end{cases}\qquad(1)$$

where $\mathrm{Mask}(\cdot)$ denotes the binary mask processing function, NaN denotes a missing value, and $s_t$ denotes the observed value of the time series S at time t;
To avoid scale differences between samples and between features, each feature is normalized. A per-time-step normalization is adopted: each time step of each sample is processed so that the mean of each sample is shifted to 0 while its variance is kept unchanged, preserving the temporal characteristics of the data to the greatest extent.
Further, the specific method of step 2) is as follows:
The sliding window step size is set to M, and the sliding window divides the time series input of step 1) into N non-overlapping subsequences of length M, converting the univariate time series into a multivariate time series $X=\{x_1,x_2,\dots,x_N\}$, where $N=\lfloor n/M\rfloor$ and $\lfloor\cdot\rfloor$ denotes the floor operation. The feature at time t ($t\le N$), i.e., the t-th subsequence $x_t$ of the converted multivariate time series X, is an M-dimensional vector obtained by selecting the elements within one window of the input time series; the t-th window has starting index $t\times M$ and ending index $t\times M+(M-1)$:

$$x_t=\{s_{t\times M},\,s_{t\times M+1},\,\dots,\,s_{t\times M+M-1}\}\qquad(2)$$
further, the specific steps of the step 3) are as follows:
31 Position encoding the input sequence;
32 -embedding the input sequence by a linear mapping;
33 Feature extraction by two independent encoders;
34 Decoding by two independent decoders;
35 The original input sequence is reconstructed by the reconstruction layer from the decoder output.
Specifically, the specific method of step 31) is as follows:
The position information of the input sequence is characterized by position coding; the position coding vector PE is given by

$$PE_{(pos,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d}}\right)\qquad(3)$$

where pos denotes the position in the input sequence, d denotes the dimension of the hidden layer, and $PE_{(pos,2i)}$ and $PE_{(pos,2i+1)}$ denote the values of the even and odd bits, respectively, of the encoding vector of the input sequence at position pos.
The specific method of step 32) is as follows:
The output $h_t$ of each time step depends on the states of the previous T time steps $h_{t-T:t-1}$:

$$h_t=f(h_{t-T},h_{t-T+1},\dots,h_{t-1})+\varepsilon\qquad(4)$$

where $h_t$ denotes the current observation, $h_{t-T:t-1}=h_{t-T},h_{t-T+1},\dots,h_{t-1}$ denotes the previous T observations, T denotes the lag order of the model, and $\varepsilon$ denotes random noise or residual error that the model cannot explain;
f denotes the twin autoregressive neural network, which comprises two autoregressive sub-networks based on multi-head self-attention. An embedding layer with shared parameters captures the relevance and similarity of different features by linear projection, mapping the original time series $X\in\mathbb{R}^{M\times N}$ into a low-dimensional vector space $h\in\mathbb{R}^{d\times N}$; when the input sequence length M is large, a smaller hidden dimension d is set to reduce the amount of computation:

$$h=W_hX+b_h\qquad(5)$$

where $W_h$ and $b_h$ are parameters shared by the two sub-networks;
the position coding information of the data is then fused into the features:

$$h_{PE}=h_{(pos,i)}+PE_{(pos,i)}\qquad(6)$$

where $h_{PE}$ denotes the features incorporating the position coding and $h_{(pos,i)}$ denotes the i-th bit of the embedding-layer encoding vector at position pos.
the specific method of the step 33) is as follows:
constructing a twin autoregressive network by a multi-head attention-based encoder and decoder stack, the encoder first utilizing an autoregressive mechanism for each time step of the inputThe conversion is performed with the following expression:
wherein,represents a scaling factor, q=w q h PE +b q ,K=W k h PE +b k ,V=W v h PE +b v Respectively representing a query vector, a key vector and a value vector;
and fusing a plurality of self-attention layers by using a multi-head attention mechanism, so that the model focuses on different characterization subspace information from different positions together, wherein the expression is as follows:
wherein Concat (·) represents vector concatenation operation, W o Representation modelA profile parameter;
and then, introducing nonlinear information into the model by using two forward layers and ReLU activation, wherein the expression is as follows:
FFN(h′)=W 2 ReLU(W 1 h′+b 1 )+b 2 (9)
wherein h' represents the output of the previous layer, W 1 、W 2 、b 1 、b 2 Representing the parameters of the forward layer respectively, and then performing residual connection and normalization processing.
The specific method of step 34) is as follows:
The decoder is responsible for decoding the output of the encoder of step 33) so as to reconstruct the original input sequence into a new target sequence;
the input data is first transformed with a masked self-attention network: for the t-th time step, only the first t-1 positions may be used as input, and the t-th and subsequent positions are masked, so that only the information of the first t-1 positions is used when the t-th position is generated. Given the attention weight matrix A, the masking operation is

$$\mathrm{Mask}(A_{ij})=\begin{cases}A_{ij}, & j<i\\ -\infty, & j\ge i\end{cases}\qquad(10)$$

where $A_{ij}$ denotes the attention weight between positions i and j. After the decoder transforms the input data with masked self-attention, it performs the decoding operation with the same network structure as the encoder.
The specific method of step 35) is as follows:
The encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers. The sub-networks reconstruct the input time series data through an autoregressive mechanism: each sub-network predicts the value of the next time step from the partial sequence already generated and uses it as input to continue generating the prediction of the following time step until the complete sequence is reconstructed. The application uses a linear layer to reconstruct the decoder output features $h_{out}\in\mathbb{R}^{d\times N}$ into $\hat{X}$:

$$\hat{X}=W_{rec}\,h_{out}+b_{rec}\qquad(11)$$

where $W_{rec}\in\mathbb{R}^{M\times d}$ and $b_{rec}\in\mathbb{R}^{1\times M}$ denote the reconstruction layer parameters.
Further, the specific steps of step 4) are as follows:
The two outputs $\hat{X}^{(1)}$ and $\hat{X}^{(2)}$ of the twin autoregressive sub-networks are each compared with the input sequence X, with the missing values masked out. The error between the predicted and true values is measured by the mean squared error loss to give the reconstruction error:

$$L_k(X)=\frac{1}{NM}\sum_{t=1}^{N}\sum_{i=1}^{M}\left(x_t^{i}-\hat{x}_t^{i,(k)}\right)^2,\quad k=1,2\qquad(12)$$

where N denotes the length of the time series, M denotes the dimension of each time-step feature, $x_t^{i}$ denotes the i-th feature of the t-th time step of the input sequence X, and $\hat{x}_t^{i,(1)}$ and $\hat{x}_t^{i,(2)}$ denote the i-th feature reconstructed by the two sub-networks at the t-th time step;
the total reconstruction error over the data set is the average of the sum of the reconstruction errors of all samples in the two sub-networks:

$$L=\frac{1}{|D|}\sum_{X\in D}\left(L_1(X)+L_2(X)\right)\qquad(13)$$

The larger the reconstruction error of a sample, the higher its outlier degree, and samples with a high outlier degree are regarded as abnormal; the intersection of the remaining normal samples output by the two sub-networks is fed into the shared embedding layer of step 3) again for iterative training. This training scheme based on autoregression and the mean squared error loss lets the model learn the patterns and characteristics of the time series data step by step, improving its reconstruction ability and prediction accuracy.
Further, the specific steps of step 5) are as follows:
After model training is completed, a new sample is reconstructed through the twin autoregressive network to obtain the reconstruction errors $e_t^{(1)}$ and $e_t^{(2)}$ of each time step, and the anomaly score of the new sample is obtained by pooling:

$$\mathrm{Score}_k(X)=\mathrm{MaxPool}_{t}\left(\mathrm{AvePool}\left(e_t^{(k)}\right)\right),\quad k=1,2\qquad(14)$$

where $\mathrm{MaxPool}(\cdot)$ denotes maximum pooling and $\mathrm{AvePool}(\cdot)$ denotes average pooling: the features $e_t$ of time step t are average-pooled to fuse the features of that moment, and maximum pooling over all moments extracts the most distinctive representation;
the outlier degree of sample X is obtained from the anomaly scores of the two autoregressive sub-networks:

$$OD(X)=\mathrm{Score}_1(X)+\mathrm{Score}_2(X)\qquad(15)$$
further, the specific steps of the step 6) are as follows:
if the outlier OD (X) > τ of the sample X is determined as an outlier, the outlier of the whole samples in the data is compared with the threshold τ one by one to detect the outlier in the system.
Compared with the prior art, the application has the beneficial effects that:
firstly, preprocessing an input time sequence, including filling a missing value, normalizing and the like, and then extracting a series of non-overlapping subsequences from the original time sequence through a sliding window with the length of M, so that a unitary time sequence is converted into a multi-element time sequence; after preprocessing, time sequence feature extraction is carried out through a twin autoregressive neural network, a normal mode in data is captured, and an abnormal score of each sample is calculated. The twin autoregressive network comprises two multi-headed autoregressive-based subnetworks that share an embedded layer, have a similar encoder-decoder structure, and are parameter optimized by reconstruction loss. In the training stage, the abnormal scores of the samples are respectively output, and are ranked according to the scores, so that a plurality of samples with the highest abnormal degree are regarded as abnormal points. And in the testing stage, the outlier scores output by the two sub-networks are summed to obtain the outlier degree of the test sample, and finally, the outlier sample is judged through a threshold value.
The application reconstructs the unlabeled input data independently through the two twin autoregressive sub-networks and combines their reconstruction errors to predict the normal samples in the data without manual labeling, so that model parameters are optimized through iterative training and no extra noise is introduced. The multi-head self-attention mechanism captures complex characteristics of the electricity consumption data such as time dependence, periodicity, and randomness, and an effective representation of the data is learned by reconstructing the behavior patterns of normal samples, which solves the insufficient extraction of relevant features by existing detection methods and effectively improves the accuracy of electricity theft detection.
Drawings
FIG. 1 is a schematic flow chart of the present application.
Detailed Description
The following describes the embodiments of the present application in further detail with reference to the drawings and specific examples.
As shown in fig. 1, an unsupervised electricity theft detection method based on a deep twin autoregressive network comprises the following steps:
1) Preprocessing the original data
The time series input to abnormal electricity usage detection is a time-ordered set of n variables, expressed as $S=(s_1,s_2,\dots,s_n)\in\mathbb{R}^n$, where n is the length of S. Because of errors and negligence during data measurement and recording, real time series data may contain missing values. To avoid introducing data bias through the filling of missing values, the application zero-fills the raw data samples and additionally marks them with a set of binary masks, i.e., missing values are marked as 0 and non-missing values as 1:

$$\mathrm{Mask}(s_t)=\begin{cases}0, & s_t=\mathrm{NaN}\\ 1, & \text{otherwise}\end{cases}\qquad(1)$$

where $\mathrm{Mask}(\cdot)$ denotes the binary mask processing function, NaN (not a number) denotes a missing value, and $s_t$ denotes the observed value of the time series S at time t;
To avoid scale differences between samples and between features, each feature is normalized. Common schemes such as Z-score normalization and min-max normalization assume that the features are mutually independent and therefore destroy the time dependence between them. The application instead adopts a per-time-step normalization: each time step of each sample is processed so that the mean of each sample is shifted to 0 while its variance is kept unchanged, preserving the temporal characteristics of the data to the greatest extent.
2) Sub-sequence acquisition through sliding window
In order to better represent the local patterns and trends in the time series (such as rising, falling, or stable segments), the sliding window step size is set to M, and the sliding window divides the time series input of step 1) into N non-overlapping subsequences of length M, converting the univariate time series into a multivariate one. By adjusting the window size and step size, temporal patterns and trends at different time scales can be explored, so the characteristics of the time series data can be analyzed more comprehensively.
The non-overlapping subsequences are $X=\{x_1,x_2,\dots,x_N\}$, where $N=\lfloor n/M\rfloor$ and $\lfloor\cdot\rfloor$ denotes the floor operation. The feature at time t ($t\le N$), i.e., the t-th subsequence $x_t$ of the converted multivariate time series X, is an M-dimensional vector obtained by selecting the elements within one window of the input time series; the t-th window has starting index $t\times M$ and ending index $t\times M+(M-1)$:

$$x_t=\{s_{t\times M},\,s_{t\times M+1},\,\dots,\,s_{t\times M+M-1}\}\qquad(2)$$
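A sketch of the non-overlapping windowing of equation (2), under the assumption that windows are indexed from 0 and a trailing remainder shorter than M is dropped:

```python
import numpy as np

def split_windows(s, M=7):
    """Eq. (2): N = floor(n / M) non-overlapping subsequences of length M."""
    N = len(s) // M                                  # floor operation of step 2)
    return np.stack([s[t * M : (t + 1) * M] for t in range(N)])  # shape (N, M)
```

With M = 7, as in the embodiment below, a daily series is regrouped into weekly windows.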
3) Reconstruction of time series through depth twin autoregressive network
31 Position coding of the input sequence
The user electricity consumption data is a typical time series, and complex characteristics such as temporal order, periodicity, and randomness may exist between time steps. Introducing position coding information lets the original input data carry position information, so that the model learns the positional features in the data. The position information of the input sequence is characterized by position coding; the position coding vector PE is given by

$$PE_{(pos,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d}}\right)\qquad(3)$$

where pos denotes the position in the input sequence, d denotes the dimension of the hidden layer, and $PE_{(pos,2i)}$ and $PE_{(pos,2i+1)}$ denote the values of the even and odd bits, respectively, of the encoding vector of the input sequence at position pos.
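The sinusoidal codes of equation (3) can be tabulated as follows; an even hidden dimension d is assumed, and the implementation is a sketch rather than the patent's own code:

```python
import numpy as np

def positional_encoding(n_pos, d=16):
    """Eq. (3): sine on even dimensions, cosine on odd dimensions."""
    pos = np.arange(n_pos)[:, None]              # positions 0 .. n_pos-1
    i = np.arange(0, d, 2)[None, :]              # even dimension indices 2i
    angle = pos / np.power(10000.0, i / d)
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe                                    # shape (n_pos, d)
```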
32 Embedding representation of input sequence through a linear mapping
The autoregressive model is a sequence generation model that describes the dependence of the current observation on previous observations in time series data: the output $h_t$ of each time step depends on the states of the previous T time steps $h_{t-T:t-1}$:

$$h_t=f(h_{t-T},h_{t-T+1},\dots,h_{t-1})+\varepsilon\qquad(4)$$

where $h_t$ denotes the current observation, $h_{t-T:t-1}=h_{t-T},h_{t-T+1},\dots,h_{t-1}$ denotes the previous T observations, T denotes the lag order of the model, and $\varepsilon$ denotes random noise or residual error that the model cannot explain;
f denotes the twin autoregressive neural network, which comprises two autoregressive sub-networks based on multi-head self-attention. An embedding layer with shared parameters captures the relevance and similarity of different features by linear projection, mapping the original time series $X\in\mathbb{R}^{M\times N}$ into a low-dimensional vector space $h\in\mathbb{R}^{d\times N}$; when the input sequence length M is large, a smaller hidden dimension d is set to reduce the amount of computation:

$$h=W_hX+b_h\qquad(5)$$

where $W_h$ and $b_h$ are parameters shared by the two sub-networks;
the position coding information of the data is then fused into the features:

$$h_{PE}=h_{(pos,i)}+PE_{(pos,i)}\qquad(6)$$

where $h_{PE}$ denotes the features incorporating the position coding and $h_{(pos,i)}$ denotes the i-th bit of the embedding-layer encoding vector at position pos.
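A minimal PyTorch sketch of the shared embedding of equations (5) and (6): a single linear layer whose parameters both sub-networks reuse, with the position codes added to its output. The class and argument names are assumptions.

```python
import torch
import torch.nn as nn

class SharedEmbedding(nn.Module):
    """Eqs. (5)-(6): h = W_h x + b_h, shared by both towers, plus PE."""
    def __init__(self, M=7, d=16):
        super().__init__()
        self.proj = nn.Linear(M, d)     # W_h and b_h, shared parameters
    def forward(self, x, pe):           # x: (N, M) windows, pe: (N, d)
        return self.proj(x) + pe        # h_PE of Eq. (6), shape (N, d)
```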
33 Feature extraction by two independent encoders
The twin autoregressive network is constructed by stacking encoders and decoders based on multi-head attention. The encoder first transforms each time step $h_{PE}^{t}$ of the input with a self-attention mechanism:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V\qquad(7)$$

where $\sqrt{d}$ denotes a scaling factor, and $Q=W_qh_{PE}+b_q$, $K=W_kh_{PE}+b_k$, $V=W_vh_{PE}+b_v$ denote the query, key, and value vectors, respectively. The similarity between Q and K is used to calculate the weights, which are then used in a weighted sum with V to obtain the output vector. In essence, the attention mechanism performs a weighted sum over all time steps according to the calculated weights; it allows information from time steps at any distance to flow directly to the current step, giving the attention mechanism the ability to capture long-term time dependence.
A multi-head attention mechanism fuses multiple self-attention layers so that the model jointly attends to information from different representation subspaces at different positions:

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_k)W_o\qquad(8)$$

where $\mathrm{Concat}(\cdot)$ denotes the vector concatenation operation and $W_o$ denotes a model parameter;
then two forward layers with ReLU activation introduce nonlinearity into the model:

$$\mathrm{FFN}(h')=W_2\,\mathrm{ReLU}(W_1h'+b_1)+b_2\qquad(9)$$

where $h'$ denotes the output of the previous layer and $W_1$, $W_2$, $b_1$, $b_2$ denote the parameters of the forward layers; residual connection and normalization are then applied.
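One encoder layer matching equations (7) through (9) can be sketched with PyTorch's built-in multi-head attention; d = 16 and 8 heads follow the embodiment below, while the 4·d feed-forward width is an assumption.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Multi-head self-attention (Eqs. 7-8) + ReLU feed-forward (Eq. 9),
    each followed by a residual connection and layer normalization."""
    def __init__(self, d=16, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, h):               # h: (batch, N, d) embedded windows
        a, _ = self.attn(h, h, h)       # Q, K, V all projected from h_PE
        h = self.norm1(h + a)           # residual connection + normalization
        return self.norm2(h + self.ffn(h))
```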
34 Decoding by two independent decoders
The decoder is responsible for decoding the output of the encoder of step 33) so as to reconstruct the original input sequence into a new target sequence. While generating the sequence, the decoder depends on the previous time steps and on its own state; to prevent the prediction of the current time step from being influenced by future time steps, all positions after the current time step must be masked.
The input data is first transformed with a masked self-attention network: for the t-th time step, only the first t-1 positions may be used as input, and the t-th and subsequent positions are masked, so that only the information of the first t-1 positions is used when the t-th position is generated. Given the attention weight matrix A, the masking operation is

$$\mathrm{Mask}(A_{ij})=\begin{cases}A_{ij}, & j<i\\ -\infty, & j\ge i\end{cases}\qquad(10)$$

where $A_{ij}$ denotes the attention weight between positions i and j. After the decoder transforms the input data with masked self-attention, it performs the decoding operation with the same network structure as the encoder.
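The masking of equation (10) corresponds to an additive attention mask with -inf above the diagonal. The sketch below uses the usual causal form, in which position t may also attend to itself; this is an assumption where the patent masks position t as well.

```python
import torch

def causal_mask(N):
    """Eq. (10): forbid attention from position i to positions j > i."""
    upper = torch.triu(torch.ones(N, N, dtype=torch.bool), diagonal=1)
    return torch.zeros(N, N).masked_fill(upper, float("-inf"))

# usage with the encoder block above:
# a, _ = self.attn(h, h, h, attn_mask=causal_mask(h.size(1)))
```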
35 Reconstructing the original input sequence by the output of the decoder through the reconstruction layer
The encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers. The sub-networks reconstruct the input time series data through an autoregressive mechanism: each sub-network predicts the value of the next time step from the partial sequence already generated and uses it as input to continue generating the prediction of the following time step until the complete sequence is reconstructed. The application uses a linear layer to reconstruct the decoder output features $h_{out}\in\mathbb{R}^{d\times N}$ into $\hat{X}$:

$$\hat{X}=W_{rec}\,h_{out}+b_{rec}\qquad(11)$$

where $W_{rec}\in\mathbb{R}^{M\times d}$ and $b_{rec}\in\mathbb{R}^{1\times M}$ denote the reconstruction layer parameters.
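Equation (11) is a plain linear read-out; a sketch with assumed shapes:

```python
import torch
import torch.nn as nn

class ReconstructionLayer(nn.Module):
    """Eq. (11): map decoder features in R^d back to an M-dim window."""
    def __init__(self, d=16, M=7):
        super().__init__()
        self.out = nn.Linear(d, M)      # W_rec and b_rec
    def forward(self, h_out):           # h_out: (batch, N, d)
        return self.out(h_out)          # reconstructed windows, (batch, N, M)
```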
4) Calculating the reconstruction error of the input sequence at each instant
The two outputs $\hat{X}^{(1)}$ and $\hat{X}^{(2)}$ of the twin autoregressive sub-networks are each compared with the input sequence X, with the missing values masked out. The error between the predicted and true values is measured by the mean squared error (MSE) loss to give the reconstruction error:

$$L_k(X)=\frac{1}{NM}\sum_{t=1}^{N}\sum_{i=1}^{M}\left(x_t^{i}-\hat{x}_t^{i,(k)}\right)^2,\quad k=1,2\qquad(12)$$

where N denotes the length of the time series, M denotes the dimension of each time-step feature, $x_t^{i}$ denotes the i-th feature of the t-th time step of the input sequence X, and $\hat{x}_t^{i,(1)}$ and $\hat{x}_t^{i,(2)}$ denote the i-th feature reconstructed by the two sub-networks at the t-th time step;
the total reconstruction error over the data set is the average of the sum of the reconstruction errors of all samples in the two sub-networks:

$$L=\frac{1}{|D|}\sum_{X\in D}\left(L_1(X)+L_2(X)\right)\qquad(13)$$

The reconstruction error of each sample corresponds to a score that reflects how difficult the sample is to reconstruct with the autoregressive model; because normal samples make up most of the data set, training the network lets it learn the normal patterns of the time series. The larger the reconstruction error of a sample, the higher its outlier degree, and samples with a high outlier degree are regarded as abnormal; the intersection of the remaining normal samples output by the two sub-networks is fed into the shared embedding layer of step 3) again for iterative training. This training scheme based on autoregression and the MSE loss lets the model learn the patterns and characteristics of the time series data step by step, improving its reconstruction ability and prediction accuracy. Moreover, because the loss is based on the mean squared error, the model attends to the overall reconstruction of the whole sequence during training rather than only to matching local patterns, which strengthens its generalization ability and stability.
5) Calculating sample outliers
The sample outlier degree measures how likely a sample is to be an anomaly. After model training is completed, a new sample is reconstructed through the twin autoregressive network to obtain the reconstruction errors $e_t^{(1)}$ and $e_t^{(2)}$ of each time step, and the anomaly score of the new sample is obtained by pooling:

$$\mathrm{Score}_k(X)=\mathrm{MaxPool}_{t}\left(\mathrm{AvePool}\left(e_t^{(k)}\right)\right),\quad k=1,2\qquad(14)$$

where $\mathrm{MaxPool}(\cdot)$ denotes maximum pooling and $\mathrm{AvePool}(\cdot)$ denotes average pooling: the features $e_t$ of time step t are average-pooled to fuse the features of that moment, and maximum pooling over all moments extracts the most distinctive representation;
the outlier degree of sample X is obtained from the anomaly scores of the two autoregressive sub-networks:

$$OD(X)=\mathrm{Score}_1(X)+\mathrm{Score}_2(X)\qquad(15)$$
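A sketch of the pooled score of equations (14) and (15), under the reading that errors are average-pooled within a time step and max-pooled over time:

```python
import torch

def anomaly_score(err):                 # err: (N, M) per-step squared errors
    """Eq. (14): AvePool over features of each step, MaxPool over time."""
    return err.mean(dim=1).max()

# Eq. (15): OD(X) = anomaly_score(err_1) + anomaly_score(err_2)
```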
6) Determining abnormal users in electricity data by threshold comparison
If the outlier degree of sample X satisfies $OD(X)>\tau$, X is determined to be an outlier; the outlier degrees of all samples in the data are compared with the threshold $\tau$ one by one to detect the anomalies in the system.
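The threshold test itself reduces to a per-sample comparison; a minimal sketch:

```python
def detect_outliers(od_scores, tau):
    """Flag each sample whose outlier degree OD(X) exceeds tau."""
    return [idx for idx, od in enumerate(od_scores) if od > tau]

# e.g. detect_outliers([3.2, 1.1, 1250.0], tau=1000) returns [2]
```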
Specific examples:
an electricity consumption abnormality detection data set issued by a national electric network (SGCC) of China comprises electricity consumption of 42372 users within 1035 days from 1 month in 2014 to 10 months in 2016, wherein 3615 users have abnormal electricity consumption behaviors. There is a large amount of missing data in this dataset, and the statistics are shown in table 1.
Table 1SGCC dataset statistics
1) Raw data preprocessing
First, the missing-value mask of the input sequence is extracted using equation (1), and each time step of each sample is then processed to shift its mean to 0. Taking the first sample in the data set of Table 1 as an example, the raw data are as follows:
the generated missing value mask is as follows:
the average value of the sample is about 7.93615 after filling the missing value with 0, and then the sample is subjected to data normalization to obtain the following table:
2) Sub-sequence acquisition through sliding window
The input univariate time series is converted into a multivariate time series through the sliding window using equation (2). With the sliding window length and step size set to M = 7, the samples are converted into a multivariate time series in units of weeks as follows:
3) Reconstruction of the time series through the deep twin autoregressive network
31 Position coding of the input sequence
The position coding vector of the time series is calculated by equation (3). With the hidden layer dimension d = 16, the position code of each time step is a 16-dimensional vector; taking time steps 0, 1, and 2 as examples, the position codes are as follows:
32 Embedding representation of input sequence through a linear mapping
The relevance and similarity between different features are captured by linear projection through the embedding layer with shared parameters, and the position coding information of the data is then merged into the features (equation (6)). The embedding layer parameters $W_h$ and $b_h$ are obtained by iterative training.
33 Feature extraction by two independent encoders
The data passing through the embedding layer are encoded using equations (7)-(9), and characteristics such as time dependence, periodicity, and randomness in the time series data are captured through the multi-head self-attention mechanism.
34 Decoding by two independent decoders;
the decoder is responsible for decoding the output of the encoder so as to reconstruct the original input sequence into a new target sequence, and each time step depends on the previous time step and the state of the decoder itself in the process of generating the sequence; calculate the weight score of the masked self-attention using equation 10 and input data for each time stepThe transformation is performed and the decoding operation is performed through the same network structure as the encoder.
35 Reconstructing the original input sequence by the output of the decoder through the reconstruction layer
The encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers; the input time series data is reconstructed using equation (11), and the two sub-networks re-express the input sequence x as $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$, respectively.
4) Calculating the reconstruction error of the input sequence at each instant
The two outputs $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$ of the twin autoregressive sub-networks are compared with the input sequence x respectively, the missing values are masked out, and the reconstruction error between the predicted and true values is finally calculated through equations (12) and (13). The intersection of the normal samples output by the two sub-networks is fed into the shared embedding layer again, and iterative training is then performed to optimize the parameters of the twin autoregressive network model.
5) Calculating sample outliers
After model training is completed, a new sample is reconstructed through the twin autoregressive network to obtain the reconstruction errors $e_t^{(1)}$ and $e_t^{(2)}$ of each time step. The anomaly scores of all samples are calculated through equations (14) and (15); taking the first 7 samples in the data set of Table 1 as an example, the computed outlier degrees are as follows:
6) Determining abnormal users in electricity data by threshold comparison
The outlier degrees of all samples are compared, and samples whose outlier degree exceeds the threshold τ are output as anomalies. With an outlier threshold τ = 1000, sample 6 among the 7 samples calculated in step 5) is determined to be an anomaly.
This example evaluates the performance of the proposed algorithm with two metrics: AUC (Area Under the receiver operating characteristic Curve) and AP (Average Precision). AUC emphasizes class separability, i.e., whether the algorithm can distinguish well between classes, while AP emphasizes the completeness of detection, i.e., whether the algorithm can retrieve more relevant samples. A larger AUC indicates better detection performance; a higher AP indicates higher detection accuracy and more positive samples retrieved.
All experiments were performed on a server with an NVIDIA GeForce RTX 3090 GPU, with the learning rate set to 0.0001 and the Adam optimizer used to update model parameters. The hidden layer dimension of the proposed model is set to d = 16 and the subsequence length to 7. Within each of the two twin autoregressive sub-networks, the encoder and decoder have the same number of layers; one sub-network comprises a 4-layer autoregressive network with 8 multi-head self-attention modules, while the other has half as many layers and multi-head self-attention modules.
Seven unsupervised anomaly detection methods are selected as baselines to evaluate the proposed method. The test scores of the compared methods are shown in Table 2; the experimental results show that the proposed method has clear advantages over the classical algorithms and can identify and detect abnormal electricity consumption behavior more accurately.
Table 2: AUC scores of the compared methods
The foregoing is merely illustrative of the embodiments of this application and it will be appreciated by those skilled in the art that variations may be made without departing from the principles of the application, and such modifications are intended to be within the scope of the application as defined in the claims.

Claims (7)

1. An unsupervised electricity larceny detection method based on a deep twin autoregressive network is characterized in that: the method specifically comprises the following steps:
1) Preprocessing the original data;
2) Acquiring a subsequence through a sliding window;
3) Reconstructing the time series through a deep twin autoregressive network;
4) Calculating a reconstruction error of the input sequence at each moment;
5) Calculating the sample outlier;
6) And judging abnormal users in the electricity consumption data through threshold comparison.
2. The method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific method for preprocessing the raw data in the step 1) is as follows:
the time sequence of the abnormal electricity utilization detection input is a time ordered set consisting of n variables, which is expressed as S= (S) 1 ,s 2 ,…,s n )∈R n Wherein n is the length of S; zero padding is carried out on an original data sample, a group of binary masks are adopted to mark the original data sample, the missing value data is marked as 0, the non-missing value data is marked as 1, and the expression is as follows:
wherein Mask (·) represents a binary Mask processing function, naN represents a missing value, s t Representing the observed value of the time sequence S at the time t;
and (3) performing normalization operation on time steps, processing each time step of each sample, adjusting the mean value of each time step to 0, and keeping the variance unchanged, so that the time sequence characteristic of the data is kept to the maximum.
3. The method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific method of the step 2) is as follows:
setting the step length of a sliding window as M, dividing the time sequence input in the step 1) into N non-overlapping subsequences with the length of M by the sliding window, and converting the unitary time sequence into a multi-element time sequence, wherein the non-overlapping subsequences are X= { X 1 ,x 2 ,…,x N }, whereinSign->Representing the rounding down operation, characteristic at time t +.> The t-th subsequence X in the converted multi-element time sequence X is an M-dimensional vector t Is obtained by selecting elements within a window range from an input time series; the t window has a starting index of t×M and an ending index of (t×M) + (M-1), expressed as follows:
x t ={s t×M ,s t×M+1 ,...,s t×M+M-1 } (2)
4. the method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 3) are as follows:
31 Position coding of the input sequence
the position information of the input sequence is characterized by position coding; the position coding vector PE is given by

$$PE_{(pos,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d}}\right)\qquad(3)$$

where pos denotes the position in the input sequence, d denotes the dimension of the hidden layer, and $PE_{(pos,2i)}$ and $PE_{(pos,2i+1)}$ denote the values of the even and odd bits, respectively, of the encoding vector of the input sequence at position pos;
32 Embedding representation of input sequence through a linear mapping
the output $h_t$ of each time step depends on the states of the previous T time steps $h_{t-T:t-1}$:

$$h_t=f(h_{t-T},h_{t-T+1},\dots,h_{t-1})+\varepsilon\qquad(4)$$

where $h_t$ denotes the current observation, $h_{t-T:t-1}=h_{t-T},h_{t-T+1},\dots,h_{t-1}$ denotes the previous T observations, T denotes the lag order of the model, $\varepsilon$ denotes random noise or residual error that the model cannot explain, and f denotes the twin autoregressive neural network; the embedding layer maps the original time series $X\in\mathbb{R}^{M\times N}$ into a low-dimensional vector space $h\in\mathbb{R}^{d\times N}$, and when the input sequence length M is large, a smaller hidden dimension d is set to reduce the amount of computation:

$$h=W_hX+b_h\qquad(5)$$

where $W_h$ and $b_h$ are parameters shared by the two sub-networks;
the position coding information of the data is fused into the features:

$$h_{PE}=h_{(pos,i)}+PE_{(pos,i)}\qquad(6)$$

where $h_{PE}$ denotes the features incorporating the position coding and $h_{(pos,i)}$ denotes the i-th bit of the embedding-layer encoding vector at position pos;
33 Feature extraction by two independent encoders
the encoder first transforms each time step $h_{PE}^{t}$ of the input with a self-attention mechanism:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V\qquad(7)$$

where $\sqrt{d}$ denotes a scaling factor, and $Q=W_qh_{PE}+b_q$, $K=W_kh_{PE}+b_k$, $V=W_vh_{PE}+b_v$ denote the query, key, and value vectors, respectively;
a multi-head attention mechanism fuses multiple self-attention layers so that the model jointly attends to information from different representation subspaces at different positions:

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_k)W_o\qquad(8)$$

where $\mathrm{Concat}(\cdot)$ denotes the vector concatenation operation and $W_o$ denotes a model parameter;
then two forward layers with ReLU activation introduce nonlinearity into the model:

$$\mathrm{FFN}(h')=W_2\,\mathrm{ReLU}(W_1h'+b_1)+b_2\qquad(9)$$

where $h'$ denotes the output of the previous layer, $W_1$, $W_2$, $b_1$, $b_2$ denote the parameters of the forward layers, and residual connection and normalization are then applied;
34 Decoding by two independent decoders
the decoder is responsible for decoding the output of the encoder of step 33) and reconstructing the original input sequence into a new target sequence; the input data is transformed with a masked self-attention network: for the t-th time step, only the first t-1 positions may be used as input, the t-th and subsequent positions are masked, and only the information of the first t-1 positions is used when the t-th position is generated; given the attention weight matrix A, the masking operation is

$$\mathrm{Mask}(A_{ij})=\begin{cases}A_{ij}, & j<i\\ -\infty, & j\ge i\end{cases}\qquad(10)$$

where $A_{ij}$ denotes the attention weight between positions i and j; after the decoder transforms the input data with masked self-attention, it performs the decoding operation with the same network structure as the encoder;
35 Reconstructing the original input sequence by the output of the decoder through the reconstruction layer
the encoder and decoder of the two sub-networks are obtained by alternately stacking multi-head attention layers and nonlinear layers; the sub-networks reconstruct the input time series data through an autoregressive mechanism: each sub-network predicts the value of the next time step from the partial sequence already generated and uses it as input to continue generating the prediction of the following time step until the complete sequence is reconstructed; a linear layer reconstructs the decoder output features $h_{out}\in\mathbb{R}^{d\times N}$ into $\hat{X}$:

$$\hat{X}=W_{rec}\,h_{out}+b_{rec}\qquad(11)$$

where $W_{rec}\in\mathbb{R}^{M\times d}$ and $b_{rec}\in\mathbb{R}^{1\times M}$ denote the reconstruction layer parameters.
5. The method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 4) are as follows:
two outputs of a twin autoregressive subnetworkAnd->Comparing the error with the input sequence X, masking the missing value, and calculating a reconstruction error through a mean square error loss function, wherein the representation of the mean square error loss function is as follows:
where N represents the length of the time series, M represents the dimension of each time step feature,i-th feature representing the t-th time step of the input sequence X,/th feature representing the t-th time step of the input sequence X>And->Respectively representing the ith reconstructed characteristic of the two sub-networks at the t-th time step;
the total reconstruction error is the average of the sum of the reconstruction errors of all samples in the two subnetworks, expressed as follows:
6. the method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 5) are as follows:
after model training is completed, reconstructing a new sample through a twin autoregressive network to obtain a reconstruction error of each time stepAnd->The abnormal scoring condition of the new sample is obtained through pooling operation, and the expression is as follows:
wherein MaxPool (·) represents maximum pooling, avePool (·) represents average pooling, for feature e of time step t t Carrying out average pooling, fusing the characteristics of the moment, carrying out maximized pooling on all the moments, and extracting the most specific representation;
the outliers of sample X were obtained from the anomaly scores of the two autoregressive subnetworks and expressed as follows:
OD(X)=Score 1 (X)+Score 2 (X)(15)
7. the method for unsupervised power theft detection based on deep twin autoregressive network as defined in claim 1, wherein:
the specific steps of the step 6) are as follows:
if the outlier OD (X) > τ of the sample X is determined as an outlier, the outlier of the whole samples in the data is compared with the threshold τ one by one to detect the outlier in the system.
CN202311040028.3A 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network Pending CN117056874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311040028.3A CN117056874A (en) 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311040028.3A CN117056874A (en) 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Publications (1)

Publication Number Publication Date
CN117056874A true CN117056874A (en) 2023-11-14

Family

ID=88658486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311040028.3A Pending CN117056874A (en) 2023-08-17 2023-08-17 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Country Status (1)

Country Link
CN (1) CN117056874A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117441980A (en) * 2023-12-20 2024-01-26 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information
CN117556311A (en) * 2024-01-11 2024-02-13 电子科技大学 Unsupervised time sequence anomaly detection method based on multidimensional feature fusion

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN113901990A (en) * 2021-09-15 2022-01-07 昆明理工大学 Case and news correlation analysis method for multi-view integrated learning
CN114359109A (en) * 2022-01-12 2022-04-15 西北工业大学 Twin network image denoising method, system, medium and device based on Transformer
CN114399066A (en) * 2022-01-15 2022-04-26 中国矿业大学(北京) Mechanical equipment predictability maintenance system and maintenance method based on weak supervision learning
CN114926746A (en) * 2022-05-25 2022-08-19 西北工业大学 SAR image change detection method based on multi-scale differential feature attention mechanism
WO2022240906A1 (en) * 2021-05-11 2022-11-17 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for edge-distributed storage and querying in value chain networks
CN115688035A (en) * 2022-10-19 2023-02-03 江苏电力信息技术有限公司 Time sequence power data anomaly detection method based on self-supervision learning
CN116206375A (en) * 2023-04-28 2023-06-02 南京信息工程大学 Face counterfeiting detection method based on double-layer twin network and sustainable learning
CN116467948A (en) * 2023-04-20 2023-07-21 江西科骏实业有限公司 Digital twin model mechanism and appearance combined parameter learning method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022240906A1 (en) * 2021-05-11 2022-11-17 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for edge-distributed storage and querying in value chain networks
CN113901990A (en) * 2021-09-15 2022-01-07 昆明理工大学 Case and news correlation analysis method for multi-view integrated learning
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN114359109A (en) * 2022-01-12 2022-04-15 西北工业大学 Twin network image denoising method, system, medium and device based on Transformer
CN114399066A (en) * 2022-01-15 2022-04-26 中国矿业大学(北京) Mechanical equipment predictability maintenance system and maintenance method based on weak supervision learning
CN114926746A (en) * 2022-05-25 2022-08-19 西北工业大学 SAR image change detection method based on multi-scale differential feature attention mechanism
CN115688035A (en) * 2022-10-19 2023-02-03 江苏电力信息技术有限公司 Time sequence power data anomaly detection method based on self-supervision learning
CN116467948A (en) * 2023-04-20 2023-07-21 江西科骏实业有限公司 Digital twin model mechanism and appearance combined parameter learning method
CN116206375A (en) * 2023-04-28 2023-06-02 南京信息工程大学 Face counterfeiting detection method based on double-layer twin network and sustainable learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117441980A (en) * 2023-12-20 2024-01-26 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information
CN117441980B (en) * 2023-12-20 2024-03-22 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information
CN117556311A (en) * 2024-01-11 2024-02-13 电子科技大学 Unsupervised time sequence anomaly detection method based on multidimensional feature fusion
CN117556311B (en) * 2024-01-11 2024-03-19 电子科技大学 Unsupervised time sequence anomaly detection method based on multidimensional feature fusion

Similar Documents

Publication Publication Date Title
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN117056874A (en) Unsupervised electricity larceny detection method based on deep twin autoregressive network
Jiang et al. A multi-step progressive fault diagnosis method for rolling element bearing based on energy entropy theory and hybrid ensemble auto-encoder
CN115018021B (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
CN110543860B (en) Mechanical fault diagnosis method and system based on TJM (machine learning model) transfer learning
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN113052271B (en) Biological fermentation data prediction method based on deep neural network
CN110570030A (en) Wind power cluster power interval prediction method and system based on deep learning
CN112147432A (en) BiLSTM module based on attention mechanism, transformer state diagnosis method and system
CN116796272A (en) Method for detecting multivariate time sequence abnormality based on transducer
CN112507479B (en) Oil drilling machine health state assessment method based on manifold learning and softmax
CN113378971A (en) Near infrared spectrum classification model training method and system and classification method and system
CN116050621A (en) Multi-head self-attention offshore wind power ultra-short-time power prediction method integrating lifting mode
CN115271225A (en) Wind power-wind power modeling method based on wavelet denoising and neural network
CN117648215B (en) Abnormal tracing method and system for electricity consumption information acquisition system
CN116796275A (en) Multi-mode time sequence anomaly detection method for industrial equipment
CN115146700A (en) Runoff prediction method based on Transformer sequence-to-sequence model
CN113159192A (en) Multi-element time sequence retrieval method and system
CN115688982B (en) Building photovoltaic data complement method based on WGAN and whale optimization algorithm
CN116050652A (en) Runoff prediction method based on local attention enhancement model
CN116383747A (en) Anomaly detection method for generating countermeasure network based on multi-time scale depth convolution
CN116167008A (en) Abnormal positioning method for internet of things sensing cloud data center based on data enhancement
CN116318773A (en) Countermeasure training type unsupervised intrusion detection system and method based on AE model optimization
CN112735604B (en) Novel coronavirus classification method based on deep learning algorithm
CN116361640A (en) Multi-variable time sequence anomaly detection method based on hierarchical attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination