CN113157771A

CN113157771A - Data anomaly detection method and power grid data anomaly detection method

Info

Publication number: CN113157771A
Application number: CN202110459689.4A
Authority: CN
Inventors: 李一帆; 彭晓燕; 颜志威; 李智勇; 梁汉宇; 马炎南
Original assignee: Guangdong Hailiao Technology Co ltd
Current assignee: Guangdong Hailiao Technology Co ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2021-07-23

Abstract

The invention discloses a data anomaly detection method, which comprises the steps of generating an initial model; using a multi-scale generator; training an initial model; generating a loss function; and (6) carrying out abnormity detection. The method extracts data information of a time sequence through the sliding window, and uses the cavity convolution to improve the accuracy of the model and the generalization capability of the model. The invention also provides a power grid data anomaly detection method. The present invention utilizes multiple generators and a single discriminator to alleviate the pattern collapse problem. Each generator contains a convolutional neural network of different sizes to obtain fine-grained and coarse-grained information of the time series. The generator comprises a Transformer module which is used for processing time sequence data so as to improve the precision; the present invention uses an attention mechanism to balance these generators, being able to better accommodate the data currently in use. Therefore, the method can effectively solve the problems of low precision, poor generalization capability and the like in the flow data anomaly detection.

Description

Data anomaly detection method and power grid data anomaly detection method

Technical Field

The invention belongs to the field of big data processing, and particularly relates to a data anomaly detection method and a power grid data anomaly detection method.

Background

The data anomaly detection method can save manpower and material resources by carrying out anomaly detection on big data, so that the problem is more and more generally solved by adopting the data anomaly detection method. Data anomaly detection methods typically employ time series anomaly detection. A time series anomaly refers to an observation that is particularly different from other observations in a particular time series. The anomaly detection method plays an important role in the fields of extreme weather or climate detection, network intrusion detection, chemical engineering fault diagnosis, power grid fault diagnosis and the like. For example, in extreme weather, quantitative indexes such as wind direction, wind speed and precipitation have different degrees of abnormality, and the abnormality detection utilizes a model to predict the extreme weather which may occur; in chemical engineering faults, the reading indexes of all valves are possibly abnormal; in network intrusion, abnormal detection can timely discover abnormal access and control operation; in grid fault diagnosis, a power fault may be discovered from an abnormality of power-related data. If the abnormality cannot be found in time, economic loss and even casualties can be caused.

However, time-series tags are often too difficult to obtain or too expensive. Over the last several decades, many researchers have been working on detecting anomalies in time series. Some of the earlier methods attempted to build a mathematical model that fits perfectly into the given data and treated outliers as anomalies. These methods distinguish between normal and abnormal samples by measuring the distance between each sample or the density of each point. Therefore, in order to obtain good experimental results, it is necessary to find a model that can perfectly fit real data, but when the situation is complicated and the data is affected by various factors, it is difficult to describe the data in the real world using a single model.

Disclosure of Invention

One objective of the present invention is to provide a data anomaly detection method, which uses a hole convolution and a Transformer to generate counter-streaming data for anomaly detection of data, so as to improve the accuracy and the generalization of the anomaly detection method.

The invention also provides a power grid data anomaly detection method.

The data anomaly detection method provided by the invention comprises the following steps:

s1, generating an initial model; using a Transformer module based on hole convolution as a generator in the generation countermeasure network;

s2, using a multi-scale generator; performing feature extraction on the data from a plurality of angles, so that the multi-scale generator has generalization capability of performing feature extraction on different scale information;

s3, training an initial model; dynamically adjusting weights of a plurality of multi-scale generators in each iteration by using an attention mechanism so that the multi-scale generators have different weights;

s4, generating a loss function, and adding a gradient punishment mechanism into the loss function to generate a final model;

and S5, carrying out anomaly detection by using the generated final model.

Step S1 is specific to the generator G_iLet us order

To point to generator G_iThe (2) th convolution of (a),

a convolution kernel after the hole; convolution kernel size of

The size of the hole convolution kernel is

With a stride of

A void rate of

Filling p_iThe relationship between them satisfies the formula:

if x refers to the sequence x with padding, then the 3 rd element of the result is scattered as:

wherein, DialateConv_i,j(. is) a generator G_iThe hole convolution of the jth convolution of (1); m represents the mth value of the hole convolution result,

is the void fraction;

is the hole convolution kernel size;

is the input of the hole convolution;

is the hole convolution kernel size.

Adding a Transformer-based network; the Transformer-based network core part is a self-attention technology, and self-attention maps a query Q and a group of key value pairs K and V to one output; for generator G with jth convolution kernel_iAnd at the same time,

where ω is the number of detection windows and d is the number of dimensions potentially represented in the transform; at a given time stamp t, the following equation holds:

wherein the f-function is a set of linear projections;

for the generator G with the jth convolution kernel at time t_iThe self-attentive query output of (1);

for the generator G with the jth convolution kernel at time t_iThe self-attentive key output of (1);

for the generator G with the jth convolution kernel at time t_iA value of self-attention of;

for detecting the window D_iThe sequence detected at time t; DialatedConv_i,j(. is) a generator G_iThe hole convolution of the jth convolution of (1);

the self-attention block is formed as:

wherein Att (-) is a self-attention block;

is length of

All one-dimensional vectors of (a);

for the generator G with the jth convolution kernel at time t_iThe value of self-attention.

Step S2 is to weight several generators of several scales for improving generalization performance of the generators on data, and propose GAN with multi-scale generators.

The weighting modes of the generators of the plurality of scales specifically include:

A1. building q generators, each G_iComposed of a DCT framework and a set of linear projections, from a detection window D_iObtaining information, wherein i is more than or equal to 1 and less than or equal to q;

A2. integrating generators together according to importance and different weights to generate pseudo data;

A3. the integrated dummy data is compared with the real data using a single discriminator D.

Step S3 is specifically to use a generated countermeasure network based on the hole convolution and the transform module as a training model, and includes the following steps:

B1. extracting original data and preprocessing the original data;

B2. updating the weight of the generator;

B3. training the generator and the discriminator, and updating the generator and the discriminator.

Step B1 is specifically defined as defining the time sequence X as one containing l_XA sequence of bar data, defining a set of detection windows D at the same time; for a particular generator G_iUsing a detection window D_iE.g., D, the raw data is cut into subsequences and then sent to the model for training.

Step B2 is specifically to assign a dynamic weight to each generator by using an attention mechanism; the method of calculating the weights is as follows:

defining initial weights

Computing generator G in iteration B from real samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of true samples in iteration B-1;

computing generator G in iteration B from pseudo samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of dummy samples in iteration B-1.

Step B3 is specifically that when the confrontation model is generated by training, the discriminator D and the generator G are alternately updated; d is the dimension of the potential representation in the Transformer, and in iteration B, discriminator D is updated based on the following equation:

wherein, theta_dIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators; x is the number of^(B)True data used in the B-th iteration; z is a radical of^(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. h) is a generator before update;

a training generator, updating generator G according to the following equation:

wherein, theta_dIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators;

for generator G in iteration B_iThe weight of (c); z is a radical of^(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. cndot.) is the generator before updating.

The loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function includes the loss and gradient penalty of GAN itself; the loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated data and authentic data; the gradient penalty is used to implement the Lipschitz constraint; in the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:

wherein the content of the first and second substances,

m is the maximum iteration number during DCT-GAN training; q is the number of generators;

for generator G in iteration B_iThe weight of (c); x is the number of^(B)True data used in the B-th iteration; z is a radical of^(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. cndot.) is the generator before updating.

The invention also discloses a power grid data anomaly detection method based on the data anomaly detection method, which comprises the following steps:

C1. extracting power grid data;

C2. generating a final model from the power grid data through steps S1-S4;

C3. and carrying out anomaly detection on the power grid data by using the generated final model.

According to the data anomaly detection method and the power grid data anomaly detection method, the data information of a time sequence is extracted by using the sliding window, and the cavity convolution is used, so that the accuracy of the model is improved, and the generalization capability of the model is improved. The present invention utilizes multiple generators and a single discriminator to alleviate the pattern collapse problem. Each generator contains a convolutional neural network of different sizes to obtain fine-grained and coarse-grained information of the time series. Meanwhile, the generator also comprises a Transformer module which is used for processing time sequence data so as to improve the precision; at the same time, an attention mechanism is also used to balance the producers so that they can better adapt to the data currently in use. Therefore, the method can effectively solve the problems of low precision, poor generalization capability and the like in the flow data anomaly detection.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention.

Fig. 2 is a schematic diagram of data extraction based on a sliding window according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of feature extraction based on a sliding window according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of generating a countermeasure network model according to an embodiment of the invention.

Fig. 5 is a schematic diagram of an anomaly detection network model according to an embodiment of the present invention.

Detailed Description

FIG. 1 is a schematic flow chart of the method of the present invention: the data anomaly detection method provided by the invention comprises the following steps:

s1, generating an initial model; using a Transformer module based on hole convolution as a generator in the generation countermeasure network; the hole convolution refers to a convolution neural network with a convolution kernel comprising fixed intervals; the Transformer module refers to a Transformer generator module (comprising a multi-head attention module, a feedforward neural network and a residual connection) defined in an original Transformer model;

s2, using a multi-scale generator; performing feature extraction on data from a plurality of angles including coarse granularity, fine granularity and the like, so that the multi-scale generator has generalization capability of performing feature extraction on information of different scales;

s4, generating a loss function, and adding a gradient punishment mechanism into the loss function to ensure that mode collapse is not easy to occur; generating a final model; the gradient penalty mechanism refers to a 1-GP gradient penalty term used in WGAN;

and S5, carrying out anomaly detection by using the generated final model.

The data anomaly detection method is based on the time sequence data anomaly detection of GAN, and the Transformer is a technology for updating the matrix only by using the interrelation of the information in the matrix; FIG. 2 is a schematic diagram of data extraction based on sliding window according to an embodiment of the present invention; fig. 3 is a schematic diagram of feature extraction based on a sliding window according to an embodiment of the present invention.

The model generated in step S1 is a transform model based on the hole convolution, and constitutes the main structure of the generator and the discriminator in the generation countermeasure network; although transformers are powerful tools for processing text sequences, transformers still have difficulty obtaining information directly from the underlying space. Therefore, we propose a hole convolution transform (DCT) architecture to solve this problem. The DCT architecture comprises multi-scale feature extraction and a Transformer-based network; wherein, multi-scale hole CNN is adopted to extract multi-scale features. The hole CNN can be used in semantic segmentation, can expand a receiving domain without reducing resolution, and is used to acquire multi-scale information of a detection window. Since the current data is a single time series, one-dimensional hole CNN is selected to process the data. Meanwhile, in order to ensure that the generators acquire the same amount of information at the same time, the CNN network of multiple holes between each generator maintains a stable reception field, and this method encourages the generators to acquire the same amount of information at the same time. To match the Transformer's network, the detection windows are filled such that the detection window outputs are equal in size.

In particular for the generator G_iLet us order

To point to generator G_iThe (2) th convolution of (a),

the convolution kernel after the hole. Convolution kernel size of

The size of the hole convolution kernel is

With a stride of

A void rate of

Filling p_iThe relationship between them satisfies the formula:

and

formally, assuming x refers to the sequence x with padding, the 3 rd element of the result is scattered as:

is the void fraction;

is the hole convolution kernel size;

is the input of the hole convolution;

is the hole convolution kernel size.

Although the CNN of a hole can capture multi-scale features from a single inspection window, it is not good at processing sequence data. Thus, a transform-based network is added. The Transformer based core of the network is a self-attention technology. Self-attention maps a query Q and a set of key-value pairs K and V to one output. For generator G with jth convolution kernel_iAt the same time

wherein, DialateConv_i,j(. is) a generator G_iThe hole convolution of the jth convolution of (1); the f function is a set of linear projections;

for detecting the window D_iThe sequence detected at time t;

the self-attention block is formed as:

wherein Att (-) is a self-attention block,

is length of

All one-dimensional vectors of (a);

Step S2 is specifically to weight a plurality of generators of a plurality of scales for improving generalization performance of the generators on data. The DCT architecture has multi-scale feature extraction and time series processing capabilities, and can learn from the detection window, but there is a conflict: if the window is to be detected

Is set to a smaller value, the mode may become disabled due to limited information received while satisfying the long-term context anomaly; if the window is to be detected

Is set to a large value, the model may get too much useless information, resulting in the model suffering from low accuracy or low efficiency. Therefore, GAN with multi-scale generators is proposed.

The weighting modes of the generators of the scales specifically include:

A1. building q generators, each G_i(1 ≦ i ≦ q) consisting of a DCT framework and a set of linear projections, and from the detection window D_iTo obtain information;

Step S3 is specifically to use the generated confrontation network based on the hole convolution and the transform module as a training model, as shown in fig. 4, which is a schematic diagram of the generated confrontation network model according to the embodiment of the present invention. The method comprises the following steps:

B1. extracting original data and preprocessing the original data;

B2. updating the weight of the generator;

Step B2 is specifically that, since the generators have their own preferences for different kinds of anomalies, weighting the generators with a fixed weight cannot achieve a good effect, and therefore, an attention mechanism is employed to assign a dynamic weight to each generator, that is, a loss value during training is taken as a basis for the importance of a specific generator;

suppose that in one iteration B-1, a generator G is used_uAnd use generator G_vIn contrast, the discriminator loss (BCE loss) is greater, and the generator G is considered to be either the true sample or the generated data for the sample passed to the generator_vIs more applicable to the current data because in iteration B-1, generator G is_vMore accurate judgment is provided for the data. Therefore, generator G is added in the next iteration B_vThe weight of (c).

The method of calculating the weights is as follows: defining initial weights

Computing generator G in iteration B from real samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of true samples in iteration B-1;

computing generator G in iteration B from pseudo samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of dummy samples in iteration B-1.

Step B3 is embodied as alternately updating discriminator D and generator G as the initial GAN is trained to generate the confrontational model. In iteration B, discriminator D is updated based on the following equation:

a training generator, updating generator G according to the following equation:

The loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function mainly comprises two parts: loss of GAN itself and gradient penalty. The loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated and true data, while the gradient penalty is used to implement the Lipschitz constraint and make the model easier to converge. In the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:

wherein the content of the first and second substances,

m is the maximum iteration number during DCT-GAN training; q is the number of generators; (ii) a

For generator G in iteration B_iThe weight of (c);

true data used in the B-th iteration; z is a radical of^(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. cndot.) is the generator before updating.

In a specific embodiment, the algorithm 1 for training the model is as follows:

inputting: x is a time sequence; d is a group of detection windows; b is the batch size when training DCT-GAN; m is the maximum iteration number; q is the number of generators;

and (3) outputting: g is a group of trained generators; d is a group of trained discriminators;

initializing generators G and discriminators D, each generator

Is (0);

for B＝1,B<M,B++do；

for i＝1,i<M,i++do；

from S_iSample b sequence of (1)

From

Of the sample noise z_i；

At each generator G_iPrevious iteration of the above-computed true data

And a previous iteration of dummy data

Training loss of (BCE loss);

respectively calculate

And

end for

calculating L_DGAnd by minimizing L_DGTo update the discriminator;

for i＝1,i<q,i++do 11；

update generator G_i；

end for

One implementation of step S5 is as follows:

and (1) detecting the structure of the model.

Because the number of abnormal samples is small, all data are directly used, and a GAN structure is used for training, so that a pseudo sample close to real normal data can be generated. Thus, the present invention uses one of the multiple generators to reconstruct the actual data; in a specific embodiment, the generator is selected according to the length of the sliding window required, in this embodiment, the generator G is selected_aAnd (6) carrying out abnormity detection. Due to the generator G_aDerived from a trained model, is a fixed structure, so that the proper hidden variable z is found in the hidden space by back-propagating the hidden variable z^*So as to generate a variable G_a(z) is more similar to the real sample, and fig. 5 is a schematic diagram of the anomaly detection network model according to the embodiment of the present invention.

And (2) loss functions and algorithms in the abnormal detection stage.

The loss function of anomaly detection is mainly composed of two parts: i.e. the loss between real data and dummy data, and the loss between features extracted from real data and features extracted from dummy data with a discriminator. The complete loss function can be expressed by the following formula:

wherein

f is the output of the discriminator intermediate layer.

In a specific embodiment, the anomaly detection is shown in the following algorithm 2:

inputting: t is the iteration number;

assigning parameters to the weights in the loss function; e is an abnormality detection sequence; s_wIs the detection window size; g_aGenerating G for a trained generator associated with a detection window_aA priori probability of compatible noise; eta is the abnormality rate;

and (3) outputting: a list of anomalies detected in E, A;

for i＝1,i<T,i++do

training potential space z using E

if i＝＝T then

Storing a trained latent space z_a＝z

end if

Then calculate L_TSADRank the sequence E according to the loss, select the largest l_E·ηAnd sampling to form an exception list A.

In a specific embodiment, the faults of the power grid caused by the power transmission and transformation equipment are as follows:

external force damage of the power transmission line comprises crane tower collision and line collision, short circuit caused by lightning stroke, short circuit caused by line loss of branches, kites and the like, pollution flashover caused by severe weather and the like, and line tripping or accidents, equipment damage and the like can be caused.

The power transformation equipment is damaged or trip-out protected due to overhigh temperature, reduced insulating performance, overweight load, over-standard electrical parameters, mechanical reasons and the like, and further power grid accidents can be caused.

In order to detect data abnormity of the power grid, the method comprises the following steps:

C1. extracting power grid data;

C2. generating a final model from the power grid data through steps S1-S4;

In the specific implementation:

s0. extracting the power grid data;

s2, using a multi-scale generator; performing feature extraction on the data from a plurality of angles, so that the multi-scale generator has generalization capability of performing feature extraction on different scale information; for the embodiment of the power grid, the method can extract related parameters such as local load, voltage and frequency in the power transmission and transformation processes of the power grid from multiple angles such as rationality of an operation mode and overload rate of a line, and form multi-scale measurement aiming at the fault occurrence conditions of the power transformation equipment every week, every month, every quarter and every year, so that a multi-scale generator is formed, and the corresponding multi-scale generator has generalization capability of extracting the characteristics of the related parameters such as the local load, the voltage and the frequency in the power transmission and transformation processes of the power grid.

S3, training an initial model; dynamically adjusting weights of a plurality of multi-scale generators in each iteration by using an attention mechanism so that the multi-scale generators have different weights; dynamically adjusting the weights of multi-scale generators formed by weekly, monthly, quarterly and yearly fault occurrence of the power transformation equipment by using an attention mechanism in each iteration so that the multi-scale generators have different weights;

and performing multiple experiments on the extracted data characteristics in the plurality of multi-scale generator pairs, thereby dynamically adjusting different weights of each multi-scale generator to reach an optimal state.

and S5, carrying out anomaly detection by using the generated final model.

Step S1 is specific to the generator G_iLet us order

To point to generator G_iThe (2) th convolution of (a),

a convolution kernel after the hole; convolution kernel size of

The size of the hole convolution kernel is

With a stride of

A void rate of

Filling p_iThe relationship between them satisfies the formula:

and

is the void fraction;

is the hole convolution kernel size;

is the input of the hole convolution;

is the hole convolution kernel size.

wherein the f-function is a set of linear projections;

the self-attention block is formed as:

wherein Att (-) is a self-attention block;

is length of

All one-dimensional vectors of (a);

B1. extracting original data and preprocessing the original data;

B2. updating the weight of the generator;

defining initial weights

Computing generator G in iteration B from real samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of true samples in iteration B-1;

computing generator G in iteration B from pseudo samples_iWeight of (2)

Wherein the content of the first and second substances,

to be in iterationLoss of dummy samples in B-1.

a training generator, updating generator G according to the following equation:

for generator G in iteration B_iThe weight of (c); z is a radical of^(B) Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. cndot.) is the generator before updating.

wherein the content of the first and second substances,

for generator G in iteration B_iThe weight of (c); x: (^B) True data used in the B-th iteration; z: (^B) Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. cndot.) is the generator before updating.

Claims

1. A data anomaly detection method is characterized by comprising the following steps:

and S5, carrying out anomaly detection by using the generated final model.

2. Method for detecting data anomalies according to claim 1, characterized in that step S1 is specific to a generator G_iLet us order

To point to generator G_iThe (2) th convolution of (a),

a convolution kernel after the hole; convolution kernel size of

The size of the hole convolution kernel is

With a stride of

Void rate of r_i ^jFilling in p_iThe relationship between them satisfies the formula:

and

wherein, DialateConv_i,j(. is) a generator G_iThe hole convolution of the jth convolution of (1); m represents the mth value of the result of the hole convolution, r_i ^jIs the void fraction;

is the hole convolution kernel size;

is the input of the hole convolution;

is the hole convolution kernel size.

wherein the f-function is a set of linear projections;

the self-attention block is formed as:

wherein Att (-) is a self-attention block;

is length of

All one-dimensional vectors of (a);

3. The method according to claim 2, wherein the step S2 is to weight several generators of several scales for improving generalization performance of the generators on the data and proposing GAN with multi-scale generators.

4. The method of claim 3, wherein the weighting of the plurality of generators of the plurality of scales comprises:

5. The method according to claim 4, wherein the step S3 is specifically implemented by using a generation countermeasure network based on a hole convolution and transform module as a training model, and includes the following steps:

B1. extracting original data and preprocessing the original data;

B2. updating the weight of the generator;

6. The method for detecting data anomaly based on void convolution and deformation model according to claim 5, wherein the step B1 is to define the time sequence X as a sequence including l_XA sequence of bar data, defining a set of detection windows D at the same time; for a particular generator G_iUsing a detection window D_iE.g., D, the raw data is cut into subsequences and then sent to the model for training.

7. The method of claim 6, wherein step B2 is implemented by assigning a dynamic weight to each generator using an attention mechanism; the method of calculating the weights is as follows:

defining initial weights

Computing generator G in iteration B from real samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of true samples in iteration B-1;

computing generator G in iteration B from pseudo samples_iWeight of (2)

Wherein the content of the first and second substances,

is the loss of dummy samples in iteration B-1.

8. The data anomaly detection method according to claim 7, wherein the step B3 is specifically that the discriminator D and the generator G are alternately updated when the generation confrontation model is trained; d is the dimension of the potential representation in the Transformer, and in iteration B, discriminator D is updated based on the following equation:

wherein，θ_dIs a discriminator parameter; m is the maximum iteration number during DCT-GAN training; q is the number of generators; x is the number of^(B)True data used in the B-th iteration; z is a radical of^(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; d (-) is the discriminator before updating; g_i(. h) is a generator before update;

a training generator, updating generator G according to the following equation:

9. The method according to claim 8, wherein the loss function of step S4 is: in the generative confrontation model proposed by the present invention, the loss function includes the loss and gradient penalty of GAN itself; the loss of GAN itself represents the accuracy with which the discriminator distinguishes between generated data and authentic data; the gradient penalty is used to implement the Lipschitz constraint; in the B-th iteration of generating the countermeasure model, the entire loss function is defined as follows:

wherein the content of the first and second substances,

for generator G in iteration B_iThe weight of (c); x is the number of^(B)True data used in the B-th iteration; z is a radical of^(B)Pseudo data used in the B-th iteration; log (-) denotes taking the logarithm, used to compute the reduced number; (ii) a D (-) is the discriminator before updating; g_i(. cndot.) is the generator before updating.

10. A power grid data anomaly detection method based on the data anomaly detection method of any one of claims 1 to 9, comprising the steps of:

C1. extracting power grid data;

C2. generating a final model from the power grid data through steps S1-S4;