CN114330486A

CN114330486A - Power system bad data identification method based on improved Wasserstein GAN

Info

Publication number: CN114330486A
Application number: CN202111366030.0A
Authority: CN
Inventors: 臧海祥; 郭镜玮; 赵佳伟; 黄蔓云; 卫志农; 陈胜; 孙国强; 周亦洲; 韩海腾; 朱瑛
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2021-11-18
Filing date: 2021-11-18
Publication date: 2022-04-12

Abstract

The invention discloses a bad data identification method for a power system based on an improved Wasserstein GAN, comprising the following steps: screening historical measurement data containing only Gaussian white noise from a historical database and preprocessing it; training a WGAN-GP model with the preprocessed historical measurement data as target data; establishing loss functions for the generator and the discriminator respectively, carrying out game training, and obtaining the trained WGAN-GP model; preprocessing the collected real-time measurement data of the current section and inputting it into the trained WGAN-GP model to obtain measurement reconstruction data of the current section; and obtaining the reconstruction error of the current section from the measurement reconstruction data and the real-time measurement data of the current section, and inputting the reconstruction error into a trained C4.5 decision tree model to identify bad data in the measurement data of the current section. By reconstructing the real-time measurement information, the method obtains the measurement reconstruction error of the current section and can identify bad data quickly and accurately, maintaining identification performance while also taking identification efficiency into account.

Description

Power system bad data identification method based on improved Wasserstein GAN

Technical Field

The invention relates to a bad data identification method for an electric power system based on an improved Wasserstein GAN, and belongs to the technical field of electric power systems.

Background

With the proposal of the "30-60" dual-carbon target (carbon peaking by 2030 and carbon neutrality by 2060), a large amount of new energy is being connected to the grid. As a result, the amount of data the power system must process is increasing exponentially and its data structure is becoming more and more complex, which places higher requirements on the reliability, safety and stability of system operation. State perception is one of the core functions of the energy management system of a power system. It is important for power system planning and operation, and it provides a reliable data basis for real-time scheduling and for a series of subsequent advanced applications and analyses. In actual grid operation, in addition to normal data noise, the measurement information acquired by the information acquisition units inevitably contains bad data. Bad data not only make it difficult for state estimation to reflect the real state of the system, but also cause trouble for power system scheduling. Bad data identification is therefore important for state estimation and power grid state analysis.

At present, for the problem of large measurement deviations, bad data identification methods can be divided into traditional methods based on power system state estimation and data-driven methods. The traditional state estimation methods use a physical model and iterate through estimation, detection, identification and re-estimation. Although they can obtain relatively accurate detection results, they run slowly and become prone to false detections and missed detections as the system grows. Therefore, in order to improve identification efficiency and identification accuracy at the same time, many scholars have proposed data-driven bad data identification methods, the most common being clustering methods and deep learning methods. Some scholars have proposed an improved FCM clustering method that formulates cluster analysis as a constrained nonlinear optimization problem. Compared with other clustering algorithms, the FCM algorithm is simple in design and widely applicable, but it is sensitive to its initial values and easily falls into a local optimum. In the area of deep learning, scholars have proposed a bad data identification algorithm for power systems based on a deep learning network, which takes the collected power data as input and constructs a multi-layer deep recurrent learning network to identify bad data. The algorithm can handle the case where a single bad datum and parameter errors exist simultaneously, but its identification performance is poor when multiple bad data are present.

Therefore, in order to improve the identification performance and the identification efficiency for bad data at the same time, the invention provides a method for identifying bad data of a power system based on an improved Wasserstein GAN, so as to identify bad data quickly and accurately.

Disclosure of Invention

The invention provides a method for identifying bad data of an electric power system based on an improved Wasserstein GAN, aimed at the problems of low identification accuracy and poor identification efficiency for bad data in large-scale electric power systems.

The invention specifically adopts the following technical scheme to solve the technical problems:

the invention provides an improved Wasserstein GAN-based bad data identification method for a power system, which comprises the following steps:

step (1): screening historical measurement data only containing Gaussian white noise in a historical database, and performing data preprocessing on the screened historical measurement data to obtain preprocessed historical measurement data;

step (2): training a WGAN-GP model by using the preprocessed historical measurement data as target data; in the training process of the WGAN-GP model, the generator takes as input Gaussian noise that has the same dimensionality as the target data and follows the standard normal distribution, and converts the Gaussian noise into pseudo data whose distribution is similar to that of the target data; the discriminator is responsible for distinguishing true data from false data among the target data and the generated pseudo data, and the pseudo data obtained by this distinguishing are used as the reconstruction data output by the whole model;

step (3): respectively establishing loss functions of the generator and the discriminator in the WGAN-GP model, carrying out game training on the generator and the discriminator, and obtaining, after the training is finished, a WGAN-GP model adapted to the distribution of measurement data containing only Gaussian noise;

step (4): carrying out data preprocessing on the collected real-time measurement data of the current section, and inputting the preprocessed real-time measurement data of the current section into the WGAN-GP model trained in step (3) to obtain measurement reconstruction data of the current section;

step (5): obtaining a reconstruction error of the current section based on the measurement reconstruction data of the current section obtained in step (4) and the real-time measurement data of the current section, inputting the obtained reconstruction error of the current section into a trained C4.5 decision tree model, determining a threshold of bad data by the bad data threshold setting method based on the C4.5 decision tree model, and identifying bad data in the real-time measurement data of the current section by combining the threshold of bad data and the reconstruction error of the current section.

Further, as a preferred technical solution of the present invention, the data preprocessing is performed on the screened historical measurement data in the step (1), and specifically, the data preprocessing is performed by:

for the ith measured value of the kth section, carrying out data normalization on the measured value, wherein the calculation formula is as follows:

wherein $x_i^*$ is the i-th normalized measurement value and $x_i$ is the i-th original measurement value.

Further, as a preferred technical solution of the present invention, in the WGAN-GP model in step (3), a Wasserstein optimization objective function is adopted, and a penalty term is introduced, where Wasserstein distance is expressed as:

wherein $W(P_r, P_g)$ is the infimum, over the joint distributions $\gamma(x, y)$, of the expected distance between x and y; $\Pi(P_r, P_g)$ is the set of joint probability distributions $\gamma$ whose marginal distributions are $P_r(\cdot)$ and $P_g(\cdot)$; by Kantorovich-Rubinstein duality, $W(P_r, P_g)$ is expressed as:

wherein $\|D\|_L \le 1$, that is, the discriminator D is 1-Lipschitz continuous;

in order to enable the model to be always kept as a Lipschitz continuous function, a penalty term is introduced on the basis of the original loss function, and the final loss function of the WGAN-GP model is obtained and expressed as follows:

wherein $\|\cdot\|_p$ represents the p-norm; $\lambda$ represents the penalty term coefficient;

wherein $\varepsilon \sim U[0, 1]$, U denoting the uniform distribution; $P_{\hat{x}}$ is the distribution of $\hat{x}$;

and the loss function of the generator in the WGAN-GP model is determined as: $L_G = 1 - D(G(z))$;

And, the loss function for determining the discriminators in the WGAN-GP model is expressed as:

wherein $P_r(\cdot)$ represents the distribution of the target data; x represents the target data set; $E(\cdot)$ is the expectation function; $G(\cdot)$ represents the generator function; $D(\cdot)$ represents the discriminator function; $P_g(\cdot)$ represents the noise data distribution; z represents the input noise data vector;

and, during the WGAN-GP model training process, the Adam optimizer is selected to iterate $L(G, D)$ and $L_G$ so as to optimize the parameters of the discriminator and the generator, respectively.
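The penalty term described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`numeric_grad`, `gradient_penalty`, `critic_loss`) are hypothetical, the 2-norm stands in for the p-norm, `lam=10.0` is an assumed penalty coefficient, the discriminator loss uses the commonly published WGAN-GP form, and central finite differences replace the automatic differentiation a deep learning framework would provide.

```python
import math
import random


def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at point x (list of floats)."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        dn = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    """lam * mean((||grad critic(x_hat)||_2 - 1)^2) over paired real/fake samples."""
    total = 0.0
    for x, x_tilde in zip(real, fake):
        eps = rng.random()  # eps drawn uniformly from [0, 1)
        # x_hat lies on the straight line between a real and a generated sample
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x, x_tilde)]
        norm = math.sqrt(sum(g * g for g in numeric_grad(critic, x_hat)))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)


def critic_loss(critic, real, fake, lam=10.0, rng=random):
    """Assumed discriminator loss: E[D(fake)] - E[D(real)] + gradient penalty."""
    mean_fake = sum(critic(x) for x in fake) / len(fake)
    mean_real = sum(critic(x) for x in real) / len(real)
    return mean_fake - mean_real + gradient_penalty(critic, real, fake, lam, rng)
```

A linear critic with weight vector (0.6, 0.8) has a gradient norm of 1 everywhere, so its penalty is close to zero, whereas weights (3, 4) give a norm of 5 and a penalty of 10 × (5 − 1)² = 160.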

Further, as a preferred technical solution of the present invention, the reconstruction error of the current cross section obtained in the step (5) is calculated by using the following formula:

wherein $\hat{x}_t$ is the t-th measurement reconstruction data in the k-th section, $x_t$ is the t-th real-time measurement data in the k-th section, and $\eta_t$ is the reconstruction error of the section.
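As an illustration only, the reconstruction error can be sketched as a per-measurement comparison between the reconstructed and the real-time values. The formula itself is not legible in this text, so the absolute difference used here is an assumption, and the function name `reconstruction_errors` is hypothetical.

```python
def reconstruction_errors(measured, reconstructed):
    """Per-measurement error between real-time data x_t and reconstructed data x_hat_t.

    Assumption: the error is the absolute difference |x_hat_t - x_t|.
    """
    if len(measured) != len(reconstructed):
        raise ValueError("measured and reconstructed must have the same length")
    return [abs(x_hat - x) for x, x_hat in zip(measured, reconstructed)]
```

For example, `reconstruction_errors([1.0, 0.5], [1.0, 0.75])` gives a zero error for the first measurement and 0.25 for the second.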

Further, as a preferred technical solution of the present invention, the training process of the C4.5 decision tree model in the step (5) specifically includes:

firstly, a calculation formula for determining the sample information entropy in the C4.5 decision tree model is as follows:

where D represents the test set of the C4.5 decision tree model, and $p_i$ represents the proportion of data of class i in the sample;

second, the information gain and the gain ratio of the error feature are calculated. Assume that feature $e_M$ is the division feature and takes v values, arranged from small to large as $e_M^{1}, e_M^{2}, \ldots, e_M^{v}$. The midpoint of each interval $[e_M^{i}, e_M^{i+1})$, namely $q_i = (e_M^{i} + e_M^{i+1})/2$, is taken as a candidate partition point. There are v-1 such partition points, collectively denoted as:

$Q = \{ q_i \mid 1 \le i \le v-1 \}$

wherein Q represents the set of all candidate partition points and $q_i$ represents the i-th candidate partition point; the information gain of each candidate partition point is calculated as:

wherein $|D|$ represents the number of training samples; $|D_{q_i}^{-}|/|D|$ represents the proportion of samples with $e_M \le q_i$; $|D_{q_i}^{+}|/|D|$ represents the proportion of samples with $e_M > q_i$. The largest information gain over all candidate partition points is selected as the information gain of the error feature. Because information gain is biased toward features with many possible values, the gain ratio is calculated as the final division criterion; the gain ratio of the error feature is calculated as:

R(D,e_M,q_i)＝Z(D,e_M,q_i)/I(e_M)

wherein $R(\cdot)$ is the gain ratio of the error feature; $I(e_M)$ is the intrinsic value of feature $e_M$; $\beta$ is a sign parameter that takes the value positive or negative.

Further, as a preferred technical solution of the present invention, in the step (5), the identifying of the bad data is performed on the real-time measurement data of the current cross section, specifically:

firstly, a threshold for stopping splitting is given. When the calculated maximum information gain of the error feature is smaller than this threshold, no feature with better classification capability has been found, and all real-time measurement data of the current section are judged to be normal data. When the calculated maximum information gain of the error feature is larger than the threshold, division is performed as follows: suppose the information gain ratio is maximized at a given division point; the real-time measurement data of the current section that meet the bad data condition at that point are classified as bad data, and the remaining real-time measurement data of the current section are grouped into one class; another intermediate node is then used as the division criterion for the remaining real-time measurement data of the current section, which are judged to be normal data if they meet its condition and bad data otherwise. The recursion is repeated until all data in each division belong to the same category, whereby bad data identification is realized.

By adopting the technical scheme, the invention can produce the following technical effects:

compared with the prior art, the method for identifying the bad data of the power system based on the improved Wasserstein GAN has the following advantages:

(1) the method reconstructs the real-time measurement data of the current section based on the improved Wasserstein GAN model. Through game training of the generator and the discriminator, the reconstructed data are made as consistent as possible with the target data distribution, so that outliers in a group of measurement data can be found easily. Compared with existing algorithms, the WGAN-GP model used by the invention adds a penalty term, thereby avoiding the gradient explosion and convergence difficulty to which the original WGAN model is prone.

(2) The method mainly addresses the poor identification performance and low identification efficiency for bad data in large power grids: it reconstructs the real-time measurement data of the current section and locates bad data with a decision tree model. In addition, to avoid subjectivity in setting the bad data threshold and to prevent missed and false detections, a bad data threshold determination method based on the decision tree model is provided. Finally, the feasibility and accuracy of the method are verified on a large amount of simulated and actual measurement data, and the method is of significance for improving the quality of measurement data.

Therefore, the method can reconstruct the real-time measurement information of a certain section by using the improved Wasserstein GAN, thereby obtaining the measurement reconstruction error of the current section, accurately detecting the position of bad data in a group of real-time measurement information, further quickly and accurately identifying the bad data, ensuring the identification performance and simultaneously considering the identification efficiency.

Drawings

Fig. 1 is a schematic flow chart of a method for identifying bad data of an electric power system based on an improved Wasserstein GAN according to the present invention.

FIG. 2 illustrates the location of bad data in the IEEE 118-node system according to the present invention.

FIG. 3 is a schematic diagram of an improved Wasserstein GAN model in the present invention.

FIG. 4 is a schematic diagram of the C4.5 decision tree model according to the present invention.

FIG. 5 is a graph of the test effect of the WGAN-GP model reconstruction data when single bad data is contained in the invention.

FIG. 6 is a graph of the test effect of the WGAN-GP model reconstruction data when multiple bad data are contained in the invention.

Detailed Description

The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.

As shown in fig. 1, the present invention provides a method for identifying bad data of an electric power system based on an improved Wasserstein GAN, wherein the bad data location is shown in fig. 2. The method comprises the following steps:

step (1): screening historical measurement data containing only Gaussian white noise from the historical database, and preprocessing the screened historical measurement data to obtain the target data required for model training. This realizes the preliminary analysis and processing of the measurement data and constructs the training set of the WGAN-GP model.

The data preprocessing of the screened historical measurement data is as follows:

for the ith measurement value of the kth section, carrying out data normalization on the measurement value, wherein the specific formula is as follows:

With this formula, the measurement values are distributed within [0, 1], which prevents the magnitude of the data from affecting model training. A SCADA measurement configuration generally comprises node voltage amplitudes and the power at the head and tail ends of branches, so amplitude and power are modeled separately, which can improve model accuracy and training efficiency.
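A normalization that maps values into [0, 1] is commonly written as $x_i^* = (x_i - x_{min}) / (x_{max} - x_{min})$. The sketch below assumes this min-max form, which is consistent with the stated range but is not confirmed by the legible text. The function name `min_max_normalize`, the handling of a constant series, and the choice to normalize one measurement type (voltage amplitude or power) at a time are likewise assumptions.

```python
def min_max_normalize(values):
    """Scale a list of measurements into [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map every value to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Voltage amplitudes and branch powers are normalized separately,
# following the separate modeling described in the text.
voltages = min_max_normalize([1.02, 0.98, 1.00])
powers = min_max_normalize([2.0, 4.0, 6.0])
```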

Step (2): training a WGAN-GP model by using the historical measurement data preprocessed in step (1) as target data. In the training process of the WGAN-GP model, the generator takes as input Gaussian noise that has the same dimensionality as the target data and follows the standard normal distribution, and converts the Gaussian noise into pseudo data whose distribution is similar to that of the target data. The discriminator is responsible for distinguishing true data from false data among the target data and the generated pseudo data, and the pseudo data obtained by this distinguishing are taken as the reconstruction data output by the whole model.

The structure of the WGAN-GP model is shown schematically in FIG. 3 and comprises a generator and a discriminator. The generator network comprises one fully connected layer and three convolutional layers, of which the first three layers use the LeakyReLU activation function and the last layer uses the Tanh activation function. The discriminator network comprises three convolutional layers and one fully connected layer, of which the first three layers use the LeakyReLU activation function.

Step (3): establishing loss functions for the generator and the discriminator in the WGAN-GP model respectively, and carrying out game training so that the generator and the discriminator reach Nash equilibrium. After training, a WGAN-GP model adapted to the distribution of measurement data containing only Gaussian noise is obtained, and the trained discriminator has fully learned the essential characteristics of the target data. The specific steps are as follows:

firstly, the principle of the Wasserstein generative adversarial network with a penalty term used in the invention is as follows:

the generative adversarial network (GAN) of the prior art is an unsupervised deep learning model. The model framework mainly comprises two modules: a generator (Generative Model) and a discriminator (Discriminative Model). The input data of the generator are Gaussian noise, and the loss function of the generator can be expressed as:

wherein $E(\cdot)$ is the expectation function; $G(\cdot)$ represents the generator function; $D(\cdot)$ represents the discriminator function; $P_g(\cdot)$ represents the noise data distribution; and z represents the input noise data vector.

The input data of the discriminator is the pseudo data generated by the generator and the target data, the discriminator is mainly responsible for distinguishing the true and false of the target data and the pseudo data, and the loss function can be expressed as:

wherein $P_r(\cdot)$ represents the distribution of the target data and x represents the target data set. From these two loss functions, the objective function of the game between the generator and the discriminator can be established and expressed as:

on this basis, the generator and the discriminator carry out alternating iterative game training; both become increasingly capable during training and can theoretically reach Nash equilibrium.
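The three expressions referred to above are not legible in this text. For orientation, the commonly published form of the original GAN losses and minimax objective, written with the symbols defined here, is sketched below; this is an assumed standard form and may differ in detail from the expressions in the original filing.

```latex
% Assumed standard GAN formulation, using the symbols defined in the text.
% Generator loss:
L_G = \mathbb{E}_{z \sim P_g}\left[\log\left(1 - D(G(z))\right)\right]

% Discriminator loss:
L_D = -\mathbb{E}_{x \sim P_r}\left[\log D(x)\right]
      - \mathbb{E}_{z \sim P_g}\left[\log\left(1 - D(G(z))\right)\right]

% Minimax objective of the game between generator and discriminator:
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim P_r}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim P_g}\left[\log\left(1 - D(G(z))\right)\right]
```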

The WGAN-GP model adopts the Wasserstein distance as the optimization objective and introduces a penalty term, which effectively resolves the vanishing gradients and mode collapse of the original GAN and markedly improves model stability; introducing the penalty term also resolves the non-convergence caused by weight clipping in WGAN. The Wasserstein distance can be expressed as:

wherein $W(P_r, P_g)$ is the infimum, over the joint distributions $\gamma(x, y)$, of the expected distance between x and y; $\Pi(P_r, P_g)$ is the set of joint probability distributions $\gamma$ whose marginal distributions are $P_r(\cdot)$ and $P_g(\cdot)$. By Kantorovich-Rubinstein duality, $W(P_r, P_g)$ can be expressed as:

wherein the constraint $\|D\|_L \le 1$ prevents the output of the discriminator from changing significantly when the data change slightly. In order to keep the discriminator model a Lipschitz continuous function at all times, a penalty term is introduced on the basis of the original loss function, so that the loss function for final WGAN-GP model training can be expressed as:

wherein $\varepsilon \sim U[0, 1]$, U denoting the uniform distribution; $P_{\hat{x}}$ is the distribution of $\hat{x}$.

Also, the loss function of the generator in the WGAN-GP model of the present invention is determined as $L_G = 1 - D(G(z))$;

Compared with the WGAN model in the prior art, the loss function of the discriminator in the WGAN-GP model of the invention has no change, and the loss function is as follows:

finally, during the WGAN-GP model training process of the invention, the Adam optimizer is selected to iterate $L(G, D)$ and $L_G$ so as to optimize the parameters of the discriminator and the generator, respectively.
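The equations of step (3) are not legible in this text. The block below writes out the commonly published WGAN-GP expressions in the symbols defined above, as an aid to reading. The interpolation formula for $\hat{x}$ and the sign convention of $L(G, D)$ are assumptions taken from the standard formulation, not from the filing.

```latex
% Wasserstein distance (primal form):
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)}
    \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]

% Kantorovich-Rubinstein dual form, with a 1-Lipschitz discriminator D:
W(P_r, P_g) = \sup_{\lVert D \rVert_L \le 1}
    \mathbb{E}_{x \sim P_r}\left[D(x)\right]
  - \mathbb{E}_{z \sim P_g}\left[D(G(z))\right]

% Training loss with gradient penalty (assumed standard form):
L(G, D) = \mathbb{E}_{z \sim P_g}\left[D(G(z))\right]
        - \mathbb{E}_{x \sim P_r}\left[D(x)\right]
        + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
          \left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_p - 1\right)^2\right],
\qquad \hat{x} = \varepsilon x + (1 - \varepsilon)\, G(z),
\quad \varepsilon \sim U[0, 1]
```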

Step (4): preprocessing the collected real-time measurement data of the current section, and inputting the preprocessed real-time measurement data of the current section into the WGAN-GP model trained in step (3) to obtain the measurement reconstruction data of the current section.

Step (5): obtaining the reconstruction error of the current section based on the measurement reconstruction data of the current section obtained in step (4) and the real-time measurement data of the current section, inputting the obtained reconstruction error into a trained C4.5 decision tree model, determining the threshold of bad data by the bad data threshold setting method based on the C4.5 decision tree model, and then identifying bad data in the real-time measurement data of the current section by combining the threshold of bad data and the reconstruction error of the current section. The specific steps are as follows:

firstly, according to the measurement reconstruction data of the current section and the real-time measurement data of the current section, calculating to obtain the reconstruction error of the current section, and adopting a calculation formula as follows:

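wherein $\hat{x}_t$ is the t-th measurement reconstruction data in the k-th section, $x_t$ is the t-th real-time measurement data in the k-th section, and $\eta_t$ is the reconstruction error of the section.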

Then, for the C4.5 decision tree model, which is an extension and optimization of the ID3 algorithm, the calculation formula of the sample information entropy is as follows:

where D represents the test set of the C4.5 decision tree model and $p_i$ represents the proportion of data of class i in the sample: i = 1 denotes the proportion of normal data in the sample, and i = 2 denotes the proportion of bad data in the sample.
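With two classes, the standard information entropy takes the form below. The symbol $H(D)$ is an assumed name, since the original expression is not legible here, and the class proportions in the worked line are purely illustrative.

```latex
% Information entropy of a sample set D with two classes (normal, bad):
H(D) = -\sum_{i=1}^{2} p_i \log_2 p_i

% Illustrative case: p_1 = 0.9 (normal), p_2 = 0.1 (bad)
H(D) = -\left(0.9 \log_2 0.9 + 0.1 \log_2 0.1\right)
     \approx 0.137 + 0.332 = 0.469
```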

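Next, the information gain and the gain ratio of the error feature are calculated. Assume that feature $e_M$ is the division feature and takes v values, arranged from small to large as $e_M^{1}, e_M^{2}, \ldots, e_M^{v}$. The midpoint of each interval $[e_M^{i}, e_M^{i+1})$, namely $q_i = (e_M^{i} + e_M^{i+1})/2$, is taken as a candidate partition point, giving v-1 partition points in total, collectively denoted as $Q = \{ q_i \mid 1 \le i \le v-1 \}$,

wherein Q represents the set of all candidate partition points and $q_i$ represents the i-th candidate partition point. The information gain of each candidate partition point is: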

wherein $|D|$ represents the number of training samples; $|D_{q_i}^{-}|/|D|$ represents the proportion of samples with $e_M \le q_i$; $|D_{q_i}^{+}|/|D|$ represents the proportion of samples with $e_M > q_i$. The information gain of the candidate partition point with the largest gain ratio is then selected as the information gain of the error feature. Because information gain is biased toward features with many possible values, the gain ratio is calculated as the final division criterion, using the following formula:

R(D,e_M,q_i)＝Z(D,e_M,q_i)/I(e_M)

wherein $R(\cdot)$ is the gain ratio of the error feature; $I(e_M)$ is the intrinsic value of feature $e_M$, and the greater the number of possible values of the feature, the larger $I(e_M)$ is; $\beta$ is a sign parameter that takes the value positive or negative.
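The split selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the split is binary (error at or below versus above the candidate point), the intrinsic value $I(e_M)$ is taken to be the split information of that binary partition, and the candidate with the largest gain ratio $R = Z / I(e_M)$ is chosen.

```python
import math


def entropy(labels):
    """Information entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for lab in set(labels):
        p = labels.count(lab) / n
        h -= p * math.log2(p)
    return h


def candidate_points(values):
    """Midpoints between consecutive distinct sorted values (v values give v-1 points)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def gain_and_ratio(errors, labels, q):
    """Information gain Z and gain ratio R for splitting the error feature at q."""
    left = [lab for e, lab in zip(errors, labels) if e <= q]
    right = [lab for e, lab in zip(errors, labels) if e > q]
    n = len(labels)
    gain = entropy(labels)
    intrinsic = 0.0  # assumed form of I(e_M): split information of the partition
    for part in (left, right):
        w = len(part) / n
        if w > 0:
            gain -= w * entropy(part)
            intrinsic -= w * math.log2(w)
    ratio = gain / intrinsic if intrinsic > 0 else 0.0
    return gain, ratio


def best_split(errors, labels):
    """Return (q, gain, ratio) for the candidate point with the largest gain ratio."""
    best = None
    for q in candidate_points(errors):
        gain, ratio = gain_and_ratio(errors, labels, q)
        if best is None or ratio > best[2]:
            best = (q, gain, ratio)
    return best
```

With errors `[0.01, 0.02, 0.03, 0.9]` and labels three times `"normal"` followed by `"bad"`, the chosen point is the midpoint between 0.03 and 0.9, which separates the two classes completely.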

Finally, the threshold of bad data is determined by the bad data threshold setting method based on the C4.5 decision tree model, and bad data in the real-time measurement data of the current section are identified by combining this threshold with the reconstruction error of the current section. As shown in FIG. 4, the input reconstruction error D of the current section is taken as the root node, each intermediate node represents the corresponding error feature division criterion obtained from the training set, and each leaf node represents the division result, namely normal data or bad data. The specific identification process is as follows:

firstly, a threshold for stopping splitting is given. When the calculated maximum information gain of the error feature is smaller than this threshold, no feature with better classification capability has been found, and all real-time measurement data of the current section are judged to be normal data. When the calculated maximum information gain is larger than the threshold, division is performed as follows: suppose the information gain ratio is maximized at a given division point; the real-time measurement data of the current section that meet the bad data condition at that point are classified as bad data, and the remaining data are grouped into one class; another intermediate node is then used as the division criterion for the remaining real-time measurement data of the current section, which are judged to be normal data if they meet its condition and bad data otherwise. The recursion is repeated until all data in each division belong to the same category, thereby realizing the identification of bad data. The decision tree is thus equivalent to a binary classification model whose two classes are normal data and bad data, which achieves the purpose of identifying bad data.
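The stopping rule and a single level of the division can be sketched as below. The sketch assumes that a measurement is bad when its reconstruction error exceeds the division point, which is the natural reading but is not legible in this text. The function name and all parameter names are hypothetical, and the recursion over further intermediate nodes is omitted.

```python
def identify_bad_data(errors, split_point, max_gain, stop_threshold):
    """Label each measurement 'normal' or 'bad' from its reconstruction error.

    errors         -- reconstruction errors of the current section
    split_point    -- division point learned by the decision tree (assumed given)
    max_gain       -- maximum information gain of the error feature
    stop_threshold -- threshold below which splitting stops
    """
    if max_gain < stop_threshold:
        # No informative split was found: treat the whole section as normal.
        return ["normal"] * len(errors)
    # Assumption: an error above the division point marks bad data.
    return ["bad" if e > split_point else "normal" for e in errors]
```

For instance, `identify_bad_data([0.004, 0.31, 0.008], 0.05, 0.4, 0.01)` flags only the second measurement.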

Therefore, based on the above steps, the method of the present invention reconstructs the measurement information of a certain cross section by using the improved Wasserstein GAN, so as to accurately detect the position of the bad data in a group of measurement information, and can simultaneously improve the identification performance and the identification efficiency of the bad data.

In order to verify the superiority of the method of the present invention, a simulation test on the IEEE 118-node system is described in detail. Following the measurement configuration of actual power system state estimation, the measurement configuration of the example comprises active and reactive power measurements at the head and tail ends of branches and node voltage amplitude measurements.

In the simulation, the load curve of an actual power system is used to obtain multi-section power flow data as true values. Gaussian white noise is added to the power flow results to simulate normal measurement data. Bad data occurring during actual power system operation are simulated by increasing or decreasing normal power measurements by 50% to 150% and normal voltage amplitude measurements by 15% to 25%, and the data are divided in a 6:4 ratio. The invention is implemented in Python; the test hardware platform is a PC with an Intel Core i7-8700K CPU @ 3.70 GHz and 16 GB of memory. The identification performance of the proposed power system bad data identification method based on the improved Wasserstein GAN is tested on this basis.
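The way test data of this kind might be synthesized can be sketched as follows. The deviation ranges come from the text; everything else is an assumption made for illustration, including the function names, the uniform sampling of the deviation, the random choice of its sign, and the noise standard deviation in the usage lines.

```python
import random


def add_gaussian_noise(true_values, sigma, rng):
    """Simulate normal measurements by adding zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in true_values]


def inject_bad_data(measurements, index, kind, rng):
    """Return a copy of `measurements` with the entry at `index` corrupted.

    kind='power'   -> value increased or reduced by 50 % to 150 %
    kind='voltage' -> value increased or reduced by 15 % to 25 %
    """
    low, high = {"power": (0.50, 1.50), "voltage": (0.15, 0.25)}[kind]
    # Assumption: deviation size is uniform in the range, sign chosen at random.
    deviation = rng.uniform(low, high) * rng.choice((-1.0, 1.0))
    corrupted = list(measurements)
    corrupted[index] = measurements[index] * (1.0 + deviation)
    return corrupted


rng = random.Random(42)          # fixed seed so the sketch is repeatable
truth = [1.0, 0.8, 1.02]         # illustrative true values
normal = add_gaussian_noise(truth, sigma=0.001, rng=rng)
with_bad = inject_bad_data(normal, index=1, kind="power", rng=rng)
```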

TABLE 1 IEEE 118-node system bad data configuration

The example simulation test is carried out on the IEEE 118-node system. The offline training sample is a historical measurement data set formed from 3000 sections of data, on which the WGAN-GP model is trained offline. Bad data are added at random locations to 10% of the samples in the test set. The bad data configuration of the real-time measurement data is shown in Table 1.

Bad data configurations No. 1 to 9 are correlated bad data cases caused by measurement residual contamination in an actual power grid, and configurations No. 10 to 14 are single bad data cases collected during actual power system operation. These bad data are added to the test set for model verification, and the results of reconstructing the real-time measurement data with the WGAN-GP model are shown in Table 2.

TABLE 2 WGAN-GP model test effect

As can be seen from Table 2, the WGAN-GP model trained on the data set of normal measurements can reconstruct the real-time measurement data; the reconstructed data are closer to the true values of the system power flow than the original bad data, and the difference lies within [-0.01, 0.01], which is within the range of normal measurement data. The real-time measurement data are input into the trained WGAN-GP model to obtain the measurement reconstruction data of the current section and, from it, the reconstruction error of the current section. The reconstruction error of the current section is calculated as follows:

wherein $\hat{x}_t$ is the t-th measurement reconstruction data in the k-th section, $x_t$ is the t-th real-time measurement data in the k-th section, and $\eta_t$ is the reconstruction error of the section.

Then the reconstruction error of the current section is input into the trained C4.5 decision tree model to identify bad data in this group of measurement data. The invention is tested on measurement data of the IEEE 118-node system, where the measurement configuration comprises node voltage amplitudes and the active and reactive power at the head and tail ends of branches. The identification results are shown in Tables 3 and 4. The missed detection rate and the false detection rate are calculated as follows:

wherein $x_{bad}$ is the number of undetected bad data; $x_m$ is the number of measurements that are actually normal but are falsely detected as bad data; $x_{total}$ is the total number of measurements; $\eta_1$ is the missed detection rate of the section; and $\eta_2$ is the false detection rate of the section.
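The two rates can be sketched from the quantities defined above. The sketch assumes that each rate is the corresponding count divided by the total number of measurements and expressed as a percentage, that is $\eta_1 = x_{bad} / x_{total} \times 100\%$ and $\eta_2 = x_m / x_{total} \times 100\%$. The function name `detection_rates` is hypothetical.

```python
def detection_rates(is_bad_true, is_bad_pred):
    """Return (missed detection rate, false detection rate) in percent.

    is_bad_true -- True where a measurement really is bad data
    is_bad_pred -- True where the method flagged the measurement as bad
    """
    total = len(is_bad_true)
    pairs = list(zip(is_bad_true, is_bad_pred))
    missed = sum(1 for t, p in pairs if t and not p)        # x_bad
    false_alarms = sum(1 for t, p in pairs if p and not t)  # x_m
    return 100.0 * missed / total, 100.0 * false_alarms / total
```

With ten measurements, one missed bad datum and one normal measurement wrongly flagged, both rates come out at 10 %.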

TABLE 3 Identification results of different methods under various bad data configurations

TABLE 4 Missed detection rate and false detection rate of different methods

As can be seen from Table 3, as the number of bad data increases, the traditional bad data identification method shows missed detections and false detections to varying degrees. Although the FCM clustering method and the SVM neural network algorithm improve on the traditional method, they still suffer from fairly serious missed detections as the number of bad data grows. In contrast, the method provided by the invention has no missed detections as the bad data gradually increase; only a certain proportion of false detections occurs, and the overall identification performance is greatly improved.

The data-driven bad data identification method for large-scale power systems constructed by the invention aims to solve the problem of excessively long computation time in bad data identification. Table 5 shows the computation time of the different bad data identification methods on various test systems. To show the computational efficiency of the invention clearly, the number of bad data in each test system is set to 10, of which 2 are voltage magnitude measurements and 8 are branch power measurements. As can be seen from Table 5, as the system scale increases, the computation time of the traditional bad data identification method increases significantly; in particular, when the system grows to 13659 nodes, the traditional method can no longer meet the requirement of online identification. Compared with the FCM clustering method and the SVM neural network-based method, the method of the invention also improves computational efficiency.

TABLE 5 Calculation times of different algorithms

The main innovation and key point of the invention is the data reconstruction performance of the WGAN-GP model. To better reflect this performance, Figs. 5 and 6 show measurement data plots for single bad data reconstruction and multiple bad data reconstruction, respectively. The test system in both figures is the IEEE 118-node system, where the measurement configuration comprises node voltage magnitudes and the active and reactive power at the head and tail ends of each branch.

Fig. 5 shows the real-time measurements, true power flow values and reconstructed data for one section of the IEEE 118-node system, where the shaded area is the normal measurement range. As can be seen from Fig. 5, when the measurement data contain a single bad data point, the reconstructed data generated by the WGAN-GP model proposed by the invention lie within the normal measurement range, so the bad data can be treated as "outlier data", which makes it easier for the decision tree to identify them.

Fig. 6 shows the test results of the WGAN-GP model when the real-time measurement data contain multiple bad data, where the shaded area is the normal measurement range. As can be seen from Fig. 6, when multiple bad data arise from residual contamination, the reconstructed data generated by the WGAN-GP model proposed by the invention still lie within the normal measurement range, and the positions of the bad data can be located by inputting the reconstruction error into the decision tree model.
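The idea behind these two figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reconstruction error is taken to be the element-wise difference between the real-time measurement and the reconstructed value, and a fixed threshold of 0.01 (borrowed from the [-0.01, 0.01] range quoted in the discussion of Table 2) stands in for the trained C4.5 decision tree. The function names and the sample values are made up for the example.

```python
def reconstruction_error(measured, reconstructed):
    """Element-wise difference between measurements and their reconstruction."""
    if len(measured) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return [m - r for m, r in zip(measured, reconstructed)]


def locate_outliers(errors, threshold=0.01):
    """Indices whose absolute error exceeds the threshold.

    Stand-in for the decision tree: a plain threshold, not the C4.5 model.
    """
    return [i for i, e in enumerate(errors) if abs(e) > threshold]


# Hypothetical per-unit values for five measurements of one section;
# the entries at index 2 and index 4 are deliberately corrupted.
measured = [1.02, 0.98, 1.35, 0.51, -0.20]
reconstructed = [1.021, 0.979, 1.004, 0.512, 0.297]

errors = reconstruction_error(measured, reconstructed)
print(locate_outliers(errors))
```

Because the reconstruction stays inside the normal range even when the input is corrupted, the corrupted entries stand out as large errors while the healthy entries have errors near zero.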

In conclusion, the invention addresses the low efficiency of bad data identification and the false and missed detections in large-scale power systems. It uses the adversarial game of the WGAN-GP model to generate pseudo data closest to the real data distribution, and thereby to identify outlier data. The real-time measurement data of a section are reconstructed to obtain the measurement reconstruction data of the current section, and the reconstruction error is calculated to identify bad data in the real-time measurement data of that section. Bad data can thus be identified quickly and accurately, ensuring identification performance while maintaining identification efficiency.

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims

1. The method for identifying the bad data of the power system based on the improved Wasserstein GAN is characterized by comprising the following steps of:

2. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the step (1) of performing data preprocessing on the screened historical measurement data specifically comprises:

3. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the WGAN-GP model in step (3) adopts a Wasserstein optimization objective function and introduces a penalty term, and the Wasserstein distance is expressed as:

and introducing a penalty term to obtain a final loss function of the WGAN-GP model, wherein the final loss function is expressed as:

wherein ‖·‖_p represents the p-norm; x represents the target data set; λ represents the penalty term coefficient;

wherein ε ∈ U[0,1], and U is the uniform distribution;

[symbol not reproduced] is the distribution of [symbol not reproduced];

wherein P_r(·) represents the distribution of the target data; E(·) is the expectation function; G(·) represents the generator function; D(·) represents the discriminator function; P_g(·) represents the noise data distribution; z represents the input noise data vector;
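For orientation, the standard WGAN-GP formulation from the literature, written with the symbols defined above, reads as follows. The notation \hat{x} for the interpolated sample and P_{\hat{x}} for its distribution is an assumption of this sketch, as is the sign convention; the claim's own formulas may differ in notation.

```latex
% Wasserstein distance in its dual form (D ranges over 1-Lipschitz functions)
W(P_r, P_g) = \sup_{\|D\|_L \le 1}
  \mathbb{E}_{x \sim P_r}\bigl[D(x)\bigr] - \mathbb{E}_{z \sim P_g}\bigl[D(G(z))\bigr]

% Discriminator loss with gradient penalty
L = \mathbb{E}_{z \sim P_g}\bigl[D(G(z))\bigr] - \mathbb{E}_{x \sim P_r}\bigl[D(x)\bigr]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \Bigl[\bigl(\|\nabla_{\hat{x}} D(\hat{x})\|_p - 1\bigr)^2\Bigr]

% Interpolated sample on which the penalty is evaluated
\hat{x} = \varepsilon x + (1 - \varepsilon)\, G(z), \qquad \varepsilon \sim U[0,1]
```

The penalty term pushes the gradient norm of the discriminator toward 1 at points interpolated between target and generated samples, which enforces the Lipschitz constraint softly rather than by weight clipping.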

4. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the reconstruction error of the current section obtained in the step (5) is calculated as follows:

wherein [formula and symbol definitions not reproduced in the source text]

5. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the training process of the C4.5 decision tree model in the step (5) comprises:

Each interval

Middle point of

wherein |D| represents the number of training samples;

to represent

The number is the proportion;

to represent

R(D, e_M, q_i) = Z(D, e_M, q_i) / I(e_M)
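To make the gain-ratio criterion concrete, here is a small Python sketch of how a C4.5-style tree could pick a split point for one continuous attribute such as a reconstruction error. It follows the textbook C4.5 definitions (candidate split points at the midpoints of adjacent sorted values, gain ratio equal to information gain divided by split information). Whether these match the Z(·) and I(·) of this claim exactly is an assumption, and all function names and sample values are illustrative.

```python
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))


def gain_ratio(values, labels, q):
    """Gain ratio of splitting a continuous attribute at threshold q."""
    left = [lab for v, lab in zip(values, labels) if v <= q]
    right = [lab for v, lab in zip(values, labels) if v > q]
    n = len(labels)
    if not left or not right:
        return 0.0  # a one-sided split carries no information
    weights = (len(left) / n, len(right) / n)
    # Information gain: entropy before the split minus weighted entropy after.
    gain = entropy(labels) - (weights[0] * entropy(left)
                              + weights[1] * entropy(right))
    # Split information (intrinsic value) of the two-way partition.
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info


def best_split(values, labels):
    """Try the midpoint of every adjacent pair of sorted distinct values."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda q: gain_ratio(values, labels, q))


# Hypothetical absolute reconstruction errors with labels (1 = bad data).
errors = [0.001, 0.002, 0.004, 0.006, 0.30, 0.45]
labels = [0, 0, 0, 0, 1, 1]
q = best_split(errors, labels)
print(q, gain_ratio(errors, labels, q))
```

With these made-up samples the chosen threshold falls in the gap between the small errors of normal data and the large errors of bad data, since that midpoint separates the two classes perfectly.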

6. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 5, wherein the step (5) of identifying the bad data from the real-time measurement data of the current section specifically comprises:

As a division point, the information gain rate is maximized if