CN113157771A - Data anomaly detection method and power grid data anomaly detection method - Google Patents
Data anomaly detection method and power grid data anomaly detection method Download PDFInfo
- Publication number
- CN113157771A CN113157771A CN202110459689.4A CN202110459689A CN113157771A CN 113157771 A CN113157771 A CN 113157771A CN 202110459689 A CN202110459689 A CN 202110459689A CN 113157771 A CN113157771 A CN 113157771A
- Authority
- CN
- China
- Prior art keywords
- generator
- data
- iteration
- generators
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
The invention discloses a data anomaly detection method, which comprises the steps of generating an initial model; using a multi-scale generator; training an initial model; generating a loss function; and (6) carrying out abnormity detection. The method extracts data information of a time sequence through the sliding window, and uses the cavity convolution to improve the accuracy of the model and the generalization capability of the model. The invention also provides a power grid data anomaly detection method. The present invention utilizes multiple generators and a single discriminator to alleviate the pattern collapse problem. Each generator contains a convolutional neural network of different sizes to obtain fine-grained and coarse-grained information of the time series. The generator comprises a Transformer module which is used for processing time sequence data so as to improve the precision; the present invention uses an attention mechanism to balance these generators, being able to better accommodate the data currently in use. Therefore, the method can effectively solve the problems of low precision, poor generalization capability and the like in the flow data anomaly detection.
Description
Technical Field
The invention belongs to the field of big data processing, and particularly relates to a data anomaly detection method and a power grid data anomaly detection method.
Background
The data anomaly detection method can save manpower and material resources by carrying out anomaly detection on big data, so that the problem is more and more generally solved by adopting the data anomaly detection method. Data anomaly detection methods typically employ time series anomaly detection. A time series anomaly refers to an observation that is particularly different from other observations in a particular time series. The anomaly detection method plays an important role in the fields of extreme weather or climate detection, network intrusion detection, chemical engineering fault diagnosis, power grid fault diagnosis and the like. For example, in extreme weather, quantitative indexes such as wind direction, wind speed and precipitation have different degrees of abnormality, and the abnormality detection utilizes a model to predict the extreme weather which may occur; in chemical engineering faults, the reading indexes of all valves are possibly abnormal; in network intrusion, abnormal detection can timely discover abnormal access and control operation; in grid fault diagnosis, a power fault may be discovered from an abnormality of power-related data. If the abnormality cannot be found in time, economic loss and even casualties can be caused.
However, time-series tags are often too difficult to obtain or too expensive. Over the last several decades, many researchers have been working on detecting anomalies in time series. Some of the earlier methods attempted to build a mathematical model that fits perfectly into the given data and treated outliers as anomalies. These methods distinguish between normal and abnormal samples by measuring the distance between each sample or the density of each point. Therefore, in order to obtain good experimental results, it is necessary to find a model that can perfectly fit real data, but when the situation is complicated and the data is affected by various factors, it is difficult to describe the data in the real world using a single model.
Disclosure of Invention
One objective of the present invention is to provide a data anomaly detection method, which uses a hole convolution and a Transformer to generate counter-streaming data for anomaly detection of data, so as to improve the accuracy and the generalization of the anomaly detection method.
The invention also provides a power grid data anomaly detection method.
The data anomaly detection method provided by the invention comprises the following steps:
s1, generating an initial model; using a Transformer module based on hole convolution as a generator in the generation countermeasure network;
s2, using a multi-scale generator; performing feature extraction on the data from a plurality of angles, so that the multi-scale generator has generalization capability of performing feature extraction on different scale information;
s3, training an initial model; dynamically adjusting weights of a plurality of multi-scale generators in each iteration by using an attention mechanism so that the multi-scale generators have different weights;
s4, generating a loss function, and adding a gradient punishment mechanism into the loss function to generate a final model;
and S5, carrying out anomaly detection by using the generated final model.
Step S1 is specific to the generator GiLet us orderTo point to generator GiThe (2) th convolution of (a),a convolution kernel after the hole; convolution kernel size ofThe size of the hole convolution kernel isWith a stride ofA void rate ofFilling piThe relationship between them satisfies the formula:
if x refers to the sequence x with padding, then the 3 rd element of the result is scattered as:
wherein, DialateConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1); m represents the mth value of the hole convolution result,is the void fraction;is the hole convolution kernel size;is the input of the hole convolution;is the hole convolution kernel size.
Adding a Transformer-based network; the Transformer-based network core part is a self-attention technology, and self-attention maps a query Q and a group of key value pairs K and V to one output; for generator G with jth convolution kerneliAnd at the same time,where ω is the number of detection windows and d is the number of dimensions potentially represented in the transform; at a given time stamp t, the following equation holds:
wherein the f-function is a set of linear projections;for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiA value of self-attention of;for detecting the window DiThe sequence detected at time t; DialatedConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1);
the self-attention block is formed as:
wherein Att (-) is a self-attention block;is length ofAll one-dimensional vectors of (a);for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiThe value of self-attention.
Step S2 is to weight several generators of several scales for improving generalization performance of the generators on data, and propose GAN with multi-scale generators.
The weighting modes of the generators of the plurality of scales specifically include:
A1. building q generators, each GiComposed of a DCT framework and a set of linear projections, from a detection window DiObtaining information, wherein i is more than or equal to 1 and less than or equal to q;
A2. integrating generators together according to importance and different weights to generate pseudo data;
A3. the integrated dummy data is compared with the real data using a single discriminator D.
Step S3 is specifically to use a generated countermeasure network based on the hole convolution and the transform module as a training model, and includes the following steps:
B1. extracting original data and preprocessing the original data;
B2. updating the weight of the generator;
B3. training the generator and the discriminator, and updating the generator and the discriminator.
Step B1 is specifically defined as defining the time sequence X as one containing lXA sequence of bar data, defining a set of detection windows D at the same time; for a particular generator GiUsing a detection window DiE.g., D, the raw data is cut into subsequences and then sent to the model for training.
Step B2 is specifically to assign a dynamic weight to each generator by using an attention mechanism; the method of calculating the weights is as follows:
Wherein the content of the first and second substances,is the loss of true samples in iteration B-1;
Wherein the content of the first and second substances,is the loss of dummy samples in iteration B-1.
Step B3 is specifically that when the confrontation model is generated by training, the discriminator D and the generator G are alternately updated; d is the dimension of the potential representation in the Transformer, and in iteration B, discriminator D is updated based on the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators; x is the number of(B)True data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. h) is a generator before update;
a training generator, updating generator G according to the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
The loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function includes the loss and gradient penalty of GAN itself; the loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated data and authentic data; the gradient penalty is used to implement the Lipschitz constraint; in the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:
wherein the content of the first and second substances,m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); x is the number of(B)True data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
The invention also discloses a power grid data anomaly detection method based on the data anomaly detection method, which comprises the following steps:
C1. extracting power grid data;
C2. generating a final model from the power grid data through steps S1-S4;
C3. and carrying out anomaly detection on the power grid data by using the generated final model.
According to the data anomaly detection method and the power grid data anomaly detection method, the data information of a time sequence is extracted by using the sliding window, and the cavity convolution is used, so that the accuracy of the model is improved, and the generalization capability of the model is improved. The present invention utilizes multiple generators and a single discriminator to alleviate the pattern collapse problem. Each generator contains a convolutional neural network of different sizes to obtain fine-grained and coarse-grained information of the time series. Meanwhile, the generator also comprises a Transformer module which is used for processing time sequence data so as to improve the precision; at the same time, an attention mechanism is also used to balance the producers so that they can better adapt to the data currently in use. Therefore, the method can effectively solve the problems of low precision, poor generalization capability and the like in the flow data anomaly detection.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is a schematic diagram of data extraction based on a sliding window according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of feature extraction based on a sliding window according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of generating a countermeasure network model according to an embodiment of the invention.
Fig. 5 is a schematic diagram of an anomaly detection network model according to an embodiment of the present invention.
Detailed Description
FIG. 1 is a schematic flow chart of the method of the present invention: the data anomaly detection method provided by the invention comprises the following steps:
s1, generating an initial model; using a Transformer module based on hole convolution as a generator in the generation countermeasure network; the hole convolution refers to a convolution neural network with a convolution kernel comprising fixed intervals; the Transformer module refers to a Transformer generator module (comprising a multi-head attention module, a feedforward neural network and a residual connection) defined in an original Transformer model;
s2, using a multi-scale generator; performing feature extraction on data from a plurality of angles including coarse granularity, fine granularity and the like, so that the multi-scale generator has generalization capability of performing feature extraction on information of different scales;
s3, training an initial model; dynamically adjusting weights of a plurality of multi-scale generators in each iteration by using an attention mechanism so that the multi-scale generators have different weights;
s4, generating a loss function, and adding a gradient punishment mechanism into the loss function to ensure that mode collapse is not easy to occur; generating a final model; the gradient penalty mechanism refers to a 1-GP gradient penalty term used in WGAN;
and S5, carrying out anomaly detection by using the generated final model.
The data anomaly detection method is based on the time sequence data anomaly detection of GAN, and the Transformer is a technology for updating the matrix only by using the interrelation of the information in the matrix; FIG. 2 is a schematic diagram of data extraction based on sliding window according to an embodiment of the present invention; fig. 3 is a schematic diagram of feature extraction based on a sliding window according to an embodiment of the present invention.
The model generated in step S1 is a transform model based on the hole convolution, and constitutes the main structure of the generator and the discriminator in the generation countermeasure network; although transformers are powerful tools for processing text sequences, transformers still have difficulty obtaining information directly from the underlying space. Therefore, we propose a hole convolution transform (DCT) architecture to solve this problem. The DCT architecture comprises multi-scale feature extraction and a Transformer-based network; wherein, multi-scale hole CNN is adopted to extract multi-scale features. The hole CNN can be used in semantic segmentation, can expand a receiving domain without reducing resolution, and is used to acquire multi-scale information of a detection window. Since the current data is a single time series, one-dimensional hole CNN is selected to process the data. Meanwhile, in order to ensure that the generators acquire the same amount of information at the same time, the CNN network of multiple holes between each generator maintains a stable reception field, and this method encourages the generators to acquire the same amount of information at the same time. To match the Transformer's network, the detection windows are filled such that the detection window outputs are equal in size.
In particular for the generator GiLet us orderTo point to generator GiThe (2) th convolution of (a),the convolution kernel after the hole. Convolution kernel size ofThe size of the hole convolution kernel isWith a stride ofA void rate ofFilling piThe relationship between them satisfies the formula:and
formally, assuming x refers to the sequence x with padding, the 3 rd element of the result is scattered as:
wherein, DialateConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1); m represents the mth value of the hole convolution result,is the void fraction;is the hole convolution kernel size;is the input of the hole convolution;is the hole convolution kernel size.
Although the CNN of a hole can capture multi-scale features from a single inspection window, it is not good at processing sequence data. Thus, a transform-based network is added. The Transformer based core of the network is a self-attention technology. Self-attention maps a query Q and a set of key-value pairs K and V to one output. For generator G with jth convolution kerneliAt the same timeWhere ω is the number of detection windows and d is the number of dimensions potentially represented in the transform; at a given time stamp t, the following equation holds:
wherein, DialateConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1); the f function is a set of linear projections;for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiA value of self-attention of;for detecting the window DiThe sequence detected at time t;
the self-attention block is formed as:
wherein Att (-) is a self-attention block,is length ofAll one-dimensional vectors of (a);for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiThe value of self-attention.
Step S2 is specifically to weight a plurality of generators of a plurality of scales for improving generalization performance of the generators on data. The DCT architecture has multi-scale feature extraction and time series processing capabilities, and can learn from the detection window, but there is a conflict: if the window is to be detectedIs set to a smaller value, the mode may become disabled due to limited information received while satisfying the long-term context anomaly; if the window is to be detectedIs set to a large value, the model may get too much useless information, resulting in the model suffering from low accuracy or low efficiency. Therefore, GAN with multi-scale generators is proposed.
The weighting modes of the generators of the scales specifically include:
A1. building q generators, each Gi(1 ≦ i ≦ q) consisting of a DCT framework and a set of linear projections, and from the detection window DiTo obtain information;
A2. integrating generators together according to importance and different weights to generate pseudo data;
A3. the integrated dummy data is compared with the real data using a single discriminator D.
Step S3 is specifically to use the generated confrontation network based on the hole convolution and the transform module as a training model, as shown in fig. 4, which is a schematic diagram of the generated confrontation network model according to the embodiment of the present invention. The method comprises the following steps:
B1. extracting original data and preprocessing the original data;
B2. updating the weight of the generator;
B3. training the generator and the discriminator, and updating the generator and the discriminator.
Step B1 is specifically defined as defining the time sequence X as one containing lXA sequence of bar data, defining a set of detection windows D at the same time; for a particular generator GiUsing a detection window DiE.g., D, the raw data is cut into subsequences and then sent to the model for training.
Step B2 is specifically that, since the generators have their own preferences for different kinds of anomalies, weighting the generators with a fixed weight cannot achieve a good effect, and therefore, an attention mechanism is employed to assign a dynamic weight to each generator, that is, a loss value during training is taken as a basis for the importance of a specific generator;
suppose that in one iteration B-1, a generator G is useduAnd use generator GvIn contrast, the discriminator loss (BCE loss) is greater, and the generator G is considered to be either the true sample or the generated data for the sample passed to the generatorvIs more applicable to the current data because in iteration B-1, generator G isvMore accurate judgment is provided for the data. Therefore, generator G is added in the next iteration BvThe weight of (c).
The method of calculating the weights is as follows: defining initial weightsComputing generator G in iteration B from real samplesiWeight of (2)
Wherein the content of the first and second substances,is the loss of true samples in iteration B-1;
Wherein the content of the first and second substances,is the loss of dummy samples in iteration B-1.
Step B3 is embodied as alternately updating discriminator D and generator G as the initial GAN is trained to generate the confrontational model. In iteration B, discriminator D is updated based on the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators; x is the number of(B)True data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. h) is a generator before update;
a training generator, updating generator G according to the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
The loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function mainly comprises two parts: loss of GAN itself and gradient penalty. The loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated and true data, while the gradient penalty is used to implement the Lipschitz constraint and make the model easier to converge. In the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:
wherein the content of the first and second substances,m is the maximum iteration number during DCT-GAN training; q is the number of generators; (ii) aFor generator G in iteration BiThe weight of (c);true data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
In a specific embodiment, the algorithm 1 for training the model is as follows:
inputting: x is a time sequence; d is a group of detection windows; b is the batch size when training DCT-GAN; m is the maximum iteration number; q is the number of generators;
and (3) outputting: g is a group of trained generators; d is a group of trained discriminators;
for B=1,B<M,B++do;
for i=1,i<M,i++do;
At each generator GiPrevious iteration of the above-computed true dataAnd a previous iteration of dummy dataTraining loss of (BCE loss);
end for
calculating LDGAnd by minimizing LDGTo update the discriminator;
for i=1,i<q,i++do 11;
update generator Gi;
end for
end for
One implementation of step S5 is as follows:
and (1) detecting the structure of the model.
Because the number of abnormal samples is small, all data are directly used, and a GAN structure is used for training, so that a pseudo sample close to real normal data can be generated. Thus, the present invention uses one of the multiple generators to reconstruct the actual data; in a specific embodiment, the generator is selected according to the length of the sliding window required, in this embodiment, the generator G is selectedaAnd (6) carrying out abnormity detection. Due to the generator GaDerived from a trained model, is a fixed structure, so that the proper hidden variable z is found in the hidden space by back-propagating the hidden variable z*So as to generate a variable Ga(z) is more similar to the real sample, and fig. 5 is a schematic diagram of the anomaly detection network model according to the embodiment of the present invention.
And (2) loss functions and algorithms in the abnormal detection stage.
The loss function of anomaly detection is mainly composed of two parts: i.e. the loss between real data and dummy data, and the loss between features extracted from real data and features extracted from dummy data with a discriminator. The complete loss function can be expressed by the following formula:
In a specific embodiment, the anomaly detection is shown in the following algorithm 2:
inputting: t is the iteration number;assigning parameters to the weights in the loss function; e is an abnormality detection sequence; swIs the detection window size; gaGenerating G for a trained generator associated with a detection windowaA priori probability of compatible noise; eta is the abnormality rate;
and (3) outputting: a list of anomalies detected in E, A;
for i=1,i<T,i++do
training potential space z using E
if i==T then
Storing a trained latent space za=z
end if
end if
Then calculate LTSADRank the sequence E according to the loss, select the largest lE·ηAnd sampling to form an exception list A.
In a specific embodiment, the faults of the power grid caused by the power transmission and transformation equipment are as follows:
external force damage of the power transmission line comprises crane tower collision and line collision, short circuit caused by lightning stroke, short circuit caused by line loss of branches, kites and the like, pollution flashover caused by severe weather and the like, and line tripping or accidents, equipment damage and the like can be caused.
The power transformation equipment is damaged or trip-out protected due to overhigh temperature, reduced insulating performance, overweight load, over-standard electrical parameters, mechanical reasons and the like, and further power grid accidents can be caused.
In order to detect data abnormity of the power grid, the method comprises the following steps:
C1. extracting power grid data;
C2. generating a final model from the power grid data through steps S1-S4;
C3. and carrying out anomaly detection on the power grid data by using the generated final model.
In the specific implementation:
s0. extracting the power grid data;
s1, generating an initial model; using a Transformer module based on hole convolution as a generator in the generation countermeasure network;
s2, using a multi-scale generator; performing feature extraction on the data from a plurality of angles, so that the multi-scale generator has generalization capability of performing feature extraction on different scale information; for the embodiment of the power grid, the method can extract related parameters such as local load, voltage and frequency in the power transmission and transformation processes of the power grid from multiple angles such as rationality of an operation mode and overload rate of a line, and form multi-scale measurement aiming at the fault occurrence conditions of the power transformation equipment every week, every month, every quarter and every year, so that a multi-scale generator is formed, and the corresponding multi-scale generator has generalization capability of extracting the characteristics of the related parameters such as the local load, the voltage and the frequency in the power transmission and transformation processes of the power grid.
S3, training an initial model; dynamically adjusting weights of a plurality of multi-scale generators in each iteration by using an attention mechanism so that the multi-scale generators have different weights; dynamically adjusting the weights of multi-scale generators formed by weekly, monthly, quarterly and yearly fault occurrence of the power transformation equipment by using an attention mechanism in each iteration so that the multi-scale generators have different weights;
and performing multiple experiments on the extracted data characteristics in the plurality of multi-scale generator pairs, thereby dynamically adjusting different weights of each multi-scale generator to reach an optimal state.
S4, generating a loss function, and adding a gradient punishment mechanism into the loss function to generate a final model;
and S5, carrying out anomaly detection by using the generated final model.
Step S1 is specific to the generator GiLet us orderTo point to generator GiThe (2) th convolution of (a),a convolution kernel after the hole; convolution kernel size ofThe size of the hole convolution kernel isWith a stride ofA void rate ofFilling piThe relationship between them satisfies the formula:
if x refers to the sequence x with padding, then the 3 rd element of the result is scattered as:
wherein, DialateConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1); m represents the mth value of the hole convolution result,is the void fraction;is the hole convolution kernel size;is the input of the hole convolution;is the hole convolution kernel size.
Adding a Transformer-based network; the Transformer-based network core part is a self-attention technology, and self-attention maps a query Q and a group of key value pairs K and V to one output; for generator G with jth convolution kerneliAnd at the same time,where ω is the number of detection windows and d is the number of dimensions potentially represented in the transform; at a given time stamp t, the following equation holds:
wherein the f-function is a set of linear projections;for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiA value of self-attention of;for detecting the window DiThe sequence detected at time t; DialatedConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1);
the self-attention block is formed as:
wherein Att (-) is a self-attention block;is length ofAll one-dimensional vectors of (a);for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiThe value of self-attention.
Step S2 is to weight several generators of several scales for improving generalization performance of the generators on data, and propose GAN with multi-scale generators.
The weighting modes of the generators of the plurality of scales specifically include:
A1. building q generators, each GiComposed of a DCT framework and a set of linear projections, from a detection window DiObtaining information, wherein i is more than or equal to 1 and less than or equal to q;
A2. integrating generators together according to importance and different weights to generate pseudo data;
A3. the integrated dummy data is compared with the real data using a single discriminator D.
Step S3 is specifically to use a generated countermeasure network based on the hole convolution and the transform module as a training model, and includes the following steps:
B1. extracting original data and preprocessing the original data;
B2. updating the weight of the generator;
B3. training the generator and the discriminator, and updating the generator and the discriminator.
Step B1 is specifically defined as defining the time sequence X as one containing lXA sequence of bar data, defining a set of detection windows D at the same time; for a particular generator GiUsing a detection window DiE.g., D, the raw data is cut into subsequences and then sent to the model for training.
Step B2 is specifically to assign a dynamic weight to each generator by using an attention mechanism; the method of calculating the weights is as follows:
Wherein the content of the first and second substances,is the loss of true samples in iteration B-1;
Wherein the content of the first and second substances,to be in iterationLoss of dummy samples in B-1.
Step B3 is specifically that when the confrontation model is generated by training, the discriminator D and the generator G are alternately updated; d is the dimension of the potential representation in the Transformer, and in iteration B, discriminator D is updated based on the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators; x is the number of(B)True data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. h) is a generator before update;
a training generator, updating generator G according to the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); z is a radical of(B) Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
The loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function includes the loss and gradient penalty of GAN itself; the loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated data and authentic data; the gradient penalty is used to implement the Lipschitz constraint; in the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:
wherein the content of the first and second substances,m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); x: (B) True data used in the B-th iteration; z: (B) Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
Claims (10)
1. A data anomaly detection method is characterized by comprising the following steps:
s1, generating an initial model; using a Transformer module based on hole convolution as a generator in the generation countermeasure network;
s2, using a multi-scale generator; performing feature extraction on the data from a plurality of angles, so that the multi-scale generator has generalization capability of performing feature extraction on different scale information;
s3, training an initial model; dynamically adjusting weights of a plurality of multi-scale generators in each iteration by using an attention mechanism so that the multi-scale generators have different weights;
s4, generating a loss function, and adding a gradient punishment mechanism into the loss function to generate a final model;
and S5, carrying out anomaly detection by using the generated final model.
2. Method for detecting data anomalies according to claim 1, characterized in that step S1 is specific to a generator GiLet us orderTo point to generator GiThe (2) th convolution of (a),a convolution kernel after the hole; convolution kernel size ofThe size of the hole convolution kernel isWith a stride ofVoid rate of ri jFilling in piThe relationship between them satisfies the formula:
if x refers to the sequence x with padding, then the 3 rd element of the result is scattered as:
wherein, DialateConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1); m represents the mth value of the result of the hole convolution, ri jIs the void fraction;is the hole convolution kernel size; is the input of the hole convolution;is the hole convolution kernel size.
Adding a Transformer-based network; the Transformer-based network core part is a self-attention technology, and self-attention maps a query Q and a group of key value pairs K and V to one output; for generator G with jth convolution kerneliAnd at the same time,where ω is the number of detection windows and d is the number of dimensions potentially represented in the transform; at a given time stamp t, the following equation holds:
wherein the f-function is a set of linear projections;for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiA value of self-attention of;for detecting the window DiThe sequence detected at time t; DialatedConvi,j(. is) a generator GiThe hole convolution of the jth convolution of (1);
the self-attention block is formed as:
wherein Att (-) is a self-attention block; is length ofAll one-dimensional vectors of (a);for the generator G with the jth convolution kernel at time tiThe self-attentive query output of (1);for the generator G with the jth convolution kernel at time tiThe self-attentive key output of (1);for the generator G with the jth convolution kernel at time tiThe value of self-attention.
3. The method according to claim 2, wherein the step S2 is to weight several generators of several scales for improving generalization performance of the generators on the data and proposing GAN with multi-scale generators.
4. The method of claim 3, wherein the weighting of the plurality of generators of the plurality of scales comprises:
A1. building q generators, each GiComposed of a DCT framework and a set of linear projections, from a detection window DiObtaining information, wherein i is more than or equal to 1 and less than or equal to q;
A2. integrating generators together according to importance and different weights to generate pseudo data;
A3. the integrated dummy data is compared with the real data using a single discriminator D.
5. The method according to claim 4, wherein the step S3 is specifically implemented by using a generation countermeasure network based on a hole convolution and transform module as a training model, and includes the following steps:
B1. extracting original data and preprocessing the original data;
B2. updating the weight of the generator;
B3. training the generator and the discriminator, and updating the generator and the discriminator.
6. The method for detecting data anomaly based on void convolution and deformation model according to claim 5, wherein the step B1 is to define the time sequence X as a sequence including lXA sequence of bar data, defining a set of detection windows D at the same time; for a particular generator GiUsing a detection window DiE.g., D, the raw data is cut into subsequences and then sent to the model for training.
7. The method of claim 6, wherein step B2 is implemented by assigning a dynamic weight to each generator using an attention mechanism; the method of calculating the weights is as follows:
Wherein the content of the first and second substances,is the loss of true samples in iteration B-1;
8. The data anomaly detection method according to claim 7, wherein the step B3 is specifically that the discriminator D and the generator G are alternately updated when the generation confrontation model is trained; d is the dimension of the potential representation in the Transformer, and in iteration B, discriminator D is updated based on the following equation:
wherein,θdIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators; x is the number of(B)True data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. h) is a generator before update;
a training generator, updating generator G according to the following equation:
wherein, thetadIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
9. The method according to claim 8, wherein the loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function includes the loss and gradient penalty of GAN itself; the loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated data and authentic data; the gradient penalty is used to implement the Lipschitz constraint; in the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:
wherein the content of the first and second substances,m is the maximum iteration number during DCT-GAN training; q is the number of generators;for generator G in iteration BiThe weight of (c); x is the number of(B)True data used in the B-th iteration; z is a radical of(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; (ii) a D (-) is the discriminator before updating; gi(. cndot.) is the generator before updating.
10. A power grid data anomaly detection method based on the data anomaly detection method of any one of claims 1 to 9, comprising the steps of:
C1. extracting power grid data;
C2. generating a final model from the power grid data through steps S1-S4;
C3. and carrying out anomaly detection on the power grid data by using the generated final model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110459689.4A CN113157771A (en) | 2021-04-27 | 2021-04-27 | Data anomaly detection method and power grid data anomaly detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110459689.4A CN113157771A (en) | 2021-04-27 | 2021-04-27 | Data anomaly detection method and power grid data anomaly detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113157771A true CN113157771A (en) | 2021-07-23 |
Family
ID=76871386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110459689.4A Pending CN113157771A (en) | 2021-04-27 | 2021-04-27 | Data anomaly detection method and power grid data anomaly detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113157771A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744265A (en) * | 2021-11-02 | 2021-12-03 | 成都东方天呈智能科技有限公司 | Anomaly detection system, method and storage medium based on generation countermeasure network |
CN113869208A (en) * | 2021-09-28 | 2021-12-31 | 江南大学 | Rolling bearing fault diagnosis method based on SA-ACWGAN-GP |
CN114423035A (en) * | 2022-01-12 | 2022-04-29 | 重庆邮电大学 | Service function chain abnormity detection method under network slice scene |
CN114611233A (en) * | 2022-03-08 | 2022-06-10 | 湖南第一师范学院 | Rotating machinery fault unbalance data generation method and computer equipment |
CN115018021A (en) * | 2022-08-08 | 2022-09-06 | 广东电网有限责任公司肇庆供电局 | Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism |
CN115208645A (en) * | 2022-07-01 | 2022-10-18 | 西安电子科技大学 | Intrusion detection data reconstruction method based on improved GAN |
CN115392595A (en) * | 2022-10-31 | 2022-11-25 | 北京科技大学 | Time-space short-term wind speed prediction method and system based on graph convolution neural network and Transformer |
CN115426282A (en) * | 2022-07-29 | 2022-12-02 | 苏州浪潮智能科技有限公司 | Voltage abnormality detection method, system, electronic device, and storage medium |
CN116383757A (en) * | 2023-03-09 | 2023-07-04 | 哈尔滨理工大学 | Bearing fault diagnosis method based on multi-scale feature fusion and migration learning |
-
2021
- 2021-04-27 CN CN202110459689.4A patent/CN113157771A/en active Pending
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113869208A (en) * | 2021-09-28 | 2021-12-31 | 江南大学 | Rolling bearing fault diagnosis method based on SA-ACWGAN-GP |
CN113869208B (en) * | 2021-09-28 | 2024-06-07 | 徐州卓越声振测控科技有限公司 | Rolling bearing fault diagnosis method based on SA-ACWGAN-GP |
CN113744265A (en) * | 2021-11-02 | 2021-12-03 | 成都东方天呈智能科技有限公司 | Anomaly detection system, method and storage medium based on generation countermeasure network |
CN114423035A (en) * | 2022-01-12 | 2022-04-29 | 重庆邮电大学 | Service function chain abnormity detection method under network slice scene |
CN114423035B (en) * | 2022-01-12 | 2023-09-19 | 北京宇卫科技有限公司 | Service function chain abnormality detection method in network slice scene |
CN114611233A (en) * | 2022-03-08 | 2022-06-10 | 湖南第一师范学院 | Rotating machinery fault unbalance data generation method and computer equipment |
CN114611233B (en) * | 2022-03-08 | 2022-11-11 | 湖南第一师范学院 | Rotating machinery fault imbalance data generation method and computer equipment |
CN115208645A (en) * | 2022-07-01 | 2022-10-18 | 西安电子科技大学 | Intrusion detection data reconstruction method based on improved GAN |
CN115208645B (en) * | 2022-07-01 | 2023-10-03 | 西安电子科技大学 | Intrusion detection data reconstruction method based on improved GAN |
CN115426282B (en) * | 2022-07-29 | 2023-08-18 | 苏州浪潮智能科技有限公司 | Voltage abnormality detection method, system, electronic device and storage medium |
CN115426282A (en) * | 2022-07-29 | 2022-12-02 | 苏州浪潮智能科技有限公司 | Voltage abnormality detection method, system, electronic device, and storage medium |
CN115018021B (en) * | 2022-08-08 | 2023-01-20 | 广东电网有限责任公司肇庆供电局 | Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism |
CN115018021A (en) * | 2022-08-08 | 2022-09-06 | 广东电网有限责任公司肇庆供电局 | Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism |
CN115392595B (en) * | 2022-10-31 | 2022-12-27 | 北京科技大学 | Time-space short-term wind speed prediction method and system based on graph convolution neural network and Transformer |
CN115392595A (en) * | 2022-10-31 | 2022-11-25 | 北京科技大学 | Time-space short-term wind speed prediction method and system based on graph convolution neural network and Transformer |
CN116383757A (en) * | 2023-03-09 | 2023-07-04 | 哈尔滨理工大学 | Bearing fault diagnosis method based on multi-scale feature fusion and migration learning |
CN116383757B (en) * | 2023-03-09 | 2023-09-05 | 哈尔滨理工大学 | Bearing fault diagnosis method based on multi-scale feature fusion and migration learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113157771A (en) | Data anomaly detection method and power grid data anomaly detection method | |
Ni et al. | Deep learning for data anomaly detection and data compression of a long‐span suspension bridge | |
CN101614786A (en) | Power electronic circuit on-line intelligence method for diagnosing faults based on FRFT and IFSVC | |
CN111242377A (en) | Short-term wind speed prediction method integrating deep learning and data denoising | |
Liu et al. | Combined forecasting method of dissolved gases concentration and its application in condition-based maintenance | |
CN112414715B (en) | Bearing fault diagnosis method based on mixed feature and improved gray level symbiosis algorithm | |
CN113283155A (en) | Near-surface air temperature estimation method, system, storage medium and equipment | |
Xu et al. | A deep learning approach to predict sea surface temperature based on multiple modes | |
CN116465623A (en) | Gearbox service life prediction method based on sparse converter | |
Ge et al. | A deep condition feature learning approach for rotating machinery based on MMSDE and optimized SAEs | |
Wang et al. | Fully Bayesian analysis of the relevance vector machine classification for imbalanced data problem | |
Liu et al. | Spatio-temporal generative adversarial network based power distribution network state estimation with multiple time-scale measurements | |
Cheng et al. | Adversarial attacks on deep neural network-based power system event classification models | |
CN114021425A (en) | Power system operation data modeling and feature selection method and device, electronic equipment and storage medium | |
Zhuang et al. | An evaluation of big data analytics in feature selection for long-lead extreme floods forecasting | |
CN109061544B (en) | Electric energy metering error estimation method | |
CN116737110A (en) | Quantum random number performance evaluation method based on transducer model | |
CN116738338A (en) | Small sample fault diagnosis method based on multi-scale integrated LightGBM | |
Zhang et al. | A nonstationary and non‐Gaussian moving average model for solar irradiance | |
CN112637104A (en) | Abnormal flow detection method and system | |
CN113466681B (en) | Breaker service life prediction method based on small sample learning | |
CN113537573B (en) | Wind power operation trend prediction method based on double space-time feature extraction | |
Zhong et al. | FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting | |
CN114298413A (en) | Hydroelectric generating set runout trend prediction method | |
Yu et al. | A novel motor fault diagnosis method based on principal component analysis (PCA) with a discrete belief rule base (DBRB) system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |