CN114330486A - Power system bad data identification method based on improved Wasserstein GAN - Google Patents


Info

Publication number
CN114330486A
Authority
CN
China
Prior art keywords
data
model
current section
wgan
real
Legal status
Pending
Application number
CN202111366030.0A
Other languages
Chinese (zh)
Inventor
臧海祥
郭镜玮
赵佳伟
黄蔓云
卫志农
陈胜
孙国强
周亦洲
韩海腾
朱瑛
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Application filed by Hohai University HHU
Priority to CN202111366030.0A
Publication of CN114330486A

Abstract

The invention discloses a bad data identification method for power systems based on an improved Wasserstein GAN, comprising the following steps: screening historical measurement data that contain only Gaussian white noise from a historical database and preprocessing them; training a WGAN-GP model with the preprocessed historical measurement data as target data, in which loss functions are established for the generator and the discriminator and the two are trained adversarially until training is complete; preprocessing the collected real-time measurement data of the current section and inputting them into the trained WGAN-GP model to obtain the measurement reconstruction data of the current section; and computing the reconstruction error of the current section from the reconstruction data and the real-time measurement data, then inputting this error into a trained C4.5 decision tree model to identify bad data in the measurements of the current section. By reconstructing the real-time measurements, the method obtains the measurement reconstruction error of the current section and identifies bad data quickly and accurately, maintaining identification performance while also accounting for identification efficiency.

Description

Power system bad data identification method based on improved Wasserstein GAN
Technical Field
The invention relates to a bad data identification method for an electric power system based on an improved Wasserstein GAN, and belongs to the technical field of electric power systems.
Background
With the proposal of China's "30-60" dual-carbon targets (peak carbon emissions by 2030 and carbon neutrality by 2060), large amounts of renewable energy are being connected to the grid. As a result, the volume of data the power system must process is growing exponentially and its data structure is becoming increasingly complex, which places higher demands on the reliability, security, and stability of system operation. State perception is one of the core functions of the power system energy management system (EMS); it is important for power system planning and operation and provides reliable data for real-time dispatch and for a series of subsequent advanced applications and analyses. During actual grid operation, in addition to normal measurement noise, the measurements collected by the data acquisition units inevitably contain bad data. Bad data not only prevent power system state estimation from reflecting the true system state but also disrupt dispatch. Bad data identification is therefore important for state estimation and grid state analysis.
At present, methods for identifying measurements with large deviations fall into two groups: traditional methods based on power system state estimation, and data-driven bad data identification methods. Traditional state-estimation methods use a physical model and iterate through estimation, detection, identification, and re-estimation. Although they can give relatively accurate detection results, they are slow, and as the system grows they become prone to false and missed detections. To improve identification efficiency and accuracy at the same time, many researchers have proposed data-driven methods, the most common being clustering methods and deep learning methods. One study proposed an improved fuzzy C-means (FCM) clustering method that formulates cluster analysis as a constrained nonlinear optimization problem; compared with other clustering algorithms, FCM is simple to design and widely applicable, but it is sensitive to its initial values and easily falls into local optima. On the deep learning side, another study proposed a bad data identification algorithm for power systems based on a deep learning network: it takes the collected power data as input and builds a multi-layer deep recurrent learning network to identify bad data. This algorithm can handle the case where a single bad datum and a parameter error occur simultaneously, but it performs poorly when multiple bad data are present.
Therefore, to improve both the identification performance and the identification efficiency of bad data, the invention provides a bad data identification method for power systems based on an improved Wasserstein GAN that identifies bad data quickly and accurately.
Disclosure of Invention
Aiming at the low identification accuracy and poor identification efficiency of bad data in large-scale power systems, the invention provides a bad data identification method for power systems based on an improved Wasserstein GAN.
The invention specifically adopts the following technical scheme to solve the technical problems:
the invention provides an improved Wasserstein GAN-based bad data identification method for a power system, which comprises the following steps:
Step (1): screening historical measurement data that contain only Gaussian white noise from a historical database, and preprocessing the screened historical measurement data to obtain preprocessed historical measurement data;
Step (2): training a WGAN-GP model with the preprocessed historical measurement data as target data. During training, the generator takes as input Gaussian noise that has the same dimension as the target data and follows the standard normal distribution, and converts this noise into pseudo data whose distribution resembles that of the target data; the discriminator is responsible for distinguishing real data (the target data) from fake data (the generated pseudo data), and the generated pseudo data serve as the reconstruction data output by the whole model;
Step (3): establishing loss functions for the generator and the discriminator of the WGAN-GP model and training the two adversarially; after training, a WGAN-GP model fitted to the distribution of measurement data containing only Gaussian noise is obtained;
Step (4): preprocessing the collected real-time measurement data of the current section, and inputting the preprocessed data into the WGAN-GP model trained in step (3) to obtain the measurement reconstruction data of the current section;
Step (5): computing the reconstruction error of the current section from the measurement reconstruction data obtained in step (4) and the real-time measurement data of the current section; inputting this reconstruction error into a trained C4.5 decision tree model; determining the bad data threshold with the C4.5-based bad data threshold setting method; and identifying bad data in the real-time measurement data of the current section by combining the threshold with the reconstruction error.
Further, as a preferred technical solution of the present invention, the data preprocessing of the screened historical measurement data in step (1) is specifically as follows:
for the ith measurement value of the kth section, the value is normalized by the following formula:
Figure BDA0003361396870000021
where $x_i^*$ is the ith normalized measurement value and $x_i$ is the ith original measurement value.
Further, as a preferred technical solution of the present invention, the WGAN-GP model in step (3) uses the Wasserstein distance as its optimization objective and introduces a penalty term. The Wasserstein distance is expressed as:
$$W(P_r,P_g)=\inf_{\gamma\in\Pi(P_r,P_g)}\mathbb{E}_{(x,y)\sim\gamma}\big[\lVert x-y\rVert\big]$$
where $W(P_r,P_g)$ is the infimum of the expected distance $\lVert x-y\rVert$ under the joint distribution $\gamma(x,y)$, and $\Pi(P_r,P_g)$ is the set of all joint probability distributions $\gamma$ whose marginal distributions are $P_r(\cdot)$ and $P_g(\cdot)$. By the Kantorovich-Rubinstein duality, $W(P_r,P_g)$ can be expressed as:
$$W(P_r,P_g)=\sup_{\lVert D\rVert_L\le 1}\mathbb{E}_{x\sim P_r}\big[D(x)\big]-\mathbb{E}_{z\sim P_g}\big[D(G(z))\big]$$
where the constraint $\lVert D\rVert_L\le 1$ requires the discriminator D to be 1-Lipschitz.
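To make the primal definition of the Wasserstein distance concrete, the sketch below computes the Wasserstein-1 distance between two one-dimensional empirical samples of equal size. It relies on the known one-dimensional result that the optimal coupling pairs the sorted samples, so the infimum reduces to the mean absolute difference of the sorted values. This only illustrates the definition; the WGAN-GP model itself estimates the distance for multi-dimensional measurement vectors through the dual form with the discriminator. The function name and the equal-size restriction are assumptions of this sketch.

```python
import numpy as np


def wasserstein1_1d(a, b) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In 1-D the optimal transport plan matches the i-th smallest value of `a`
    with the i-th smallest value of `b`, so the infimum over couplings reduces
    to the mean absolute difference of the sorted samples.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(a - b)))


# Shifting a sample by a constant c moves it a Wasserstein distance of |c|.
real = [2.0, 0.0, 1.0]
shifted = [3.0, 1.0, 2.0]
w = wasserstein1_1d(real, shifted)

# Dual check: the 1-Lipschitz function D(x) = -x attains the supremum here,
# since E_real[D] - E_shifted[D] = mean(shifted) - mean(real) = 1.
dual_value = np.mean(-np.asarray(real)) - np.mean(-np.asarray(shifted))
```

The dual check mirrors the Kantorovich-Rubinstein form: any 1-Lipschitz function gives a lower bound on the distance, and in this toy case a linear function reaches it.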
In order to keep the model a Lipschitz-continuous function at all times, a penalty term is added to the original loss function, giving the final loss function of the WGAN-GP model:
$$L(G,D)=\mathbb{E}_{z\sim P_g}\big[D(G(z))\big]-\mathbb{E}_{x\sim P_r}\big[D(x)\big]+\lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_p-1\big)^2\Big]$$
where $\lVert\cdot\rVert_p$ denotes the p-norm and $\lambda$ is the penalty coefficient;
$$\hat{x}=\varepsilon x+(1-\varepsilon)G(z)$$
where $\varepsilon\sim U[0,1]$, U denoting the uniform distribution, and $P_{\hat{x}}$ is the distribution of $\hat{x}$;
the loss function of the generator in the WGAN-GP model is determined as $L_G=1-D(G(z))$;
and the loss function of the discriminator in the WGAN-GP model is determined as:
Figure BDA0003361396870000037
where $P_r(\cdot)$ is the distribution of the target data; x is the target data set; $\mathbb{E}(\cdot)$ is the expectation; $G(\cdot)$ is the generator function; $D(\cdot)$ is the discriminator function; $P_g(\cdot)$ is the noise data distribution; and z is the input noise vector;
during training of the WGAN-GP model, the Adam optimizer is used to iterate on $L(G,D)$ and $L_G$ so as to optimize the parameters of the discriminator and the generator, respectively.
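The discriminator loss with gradient penalty can be sketched in NumPy as below. Practical implementations obtain the gradient of D at the interpolated points by automatic differentiation inside a deep learning framework; to stay self-contained, this sketch substitutes a central-difference numerical gradient. It also assumes p = 2 and λ = 10 as defaults (the text leaves both open) and averages $L_G=1-D(G(z))$ over the batch, which is likewise an assumption. `D` is any function that maps a batch of vectors (one per row) to one score per row.

```python
import numpy as np


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a batched scalar function f at each row of x."""
    grads = np.zeros_like(x, dtype=float)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[j] = h
        grads[:, j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grads


def wgan_gp_discriminator_loss(D, real, fake, lam=10.0, p=2, rng=None):
    """L(G, D) = E[D(G(z))] - E[D(x)] + lam * E[(||grad D(x_hat)||_p - 1)^2]."""
    rng = np.random.default_rng() if rng is None else rng
    # One epsilon ~ U[0, 1] per sample, broadcast across the measurement dimension.
    eps = rng.uniform(0.0, 1.0, size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(numerical_grad(D, x_hat), ord=p, axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    return float(np.mean(D(fake)) - np.mean(D(real)) + lam * penalty)


def generator_loss(D, fake):
    """L_G = 1 - D(G(z)) as stated in the text, averaged over the batch."""
    return float(np.mean(1.0 - D(fake)))


# Linear critics D(x) = x @ w have gradient w everywhere, so the penalty is easy to check.
def unit_critic(x):
    return x @ np.array([0.6, 0.8])  # ||w||_2 = 1 -> penalty 0


def steep_critic(x):
    return x @ np.array([1.2, 1.6])  # ||w||_2 = 2 -> penalty 1


real = np.ones((4, 2))
fake = np.zeros((4, 2))
rng = np.random.default_rng(0)
loss_unit = wgan_gp_discriminator_loss(unit_critic, real, fake, rng=rng)
loss_steep = wgan_gp_discriminator_loss(steep_critic, real, fake, rng=rng)
```

With the unit-norm critic only the Wasserstein term remains (about -1.4), while the steeper critic pays λ·(2 - 1)² = 10 on top of its Wasserstein term, which is how the penalty pushes the discriminator toward the 1-Lipschitz constraint.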
Further, as a preferred technical solution of the present invention, the reconstruction error of the current cross section obtained in the step (5) is calculated by using the following formula:
Figure BDA0003361396870000038
where $\hat{x}_t$ is the tth measurement reconstruction value in the kth section, $x_t$ is the tth real-time measurement in the kth section, and $\eta_t$ is the resulting reconstruction error of the section.
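For one section, computing the reconstruction error is an element-wise operation on the reconstructed and measured vectors. The formula image is not reproduced in this text, so the sketch assumes the signed difference $\eta_t=\hat{x}_t-x_t$; a signed error fits the positive/negative sign parameter used later in the decision tree, but the exact form is an assumption. The numbers are hypothetical normalized values.

```python
import numpy as np


def reconstruction_error(x_rec, x_meas) -> np.ndarray:
    """Per-measurement reconstruction error eta_t of one section (assumed: x_rec - x_meas)."""
    x_rec = np.asarray(x_rec, dtype=float)
    x_meas = np.asarray(x_meas, dtype=float)
    if x_rec.shape != x_meas.shape:
        raise ValueError("reconstruction and measurement vectors must have the same shape")
    return x_rec - x_meas


# Hypothetical normalized section: the second measurement was corrupted (0.40 -> 0.72),
# while the model reconstructs it close to its normal value.
x_meas = np.array([0.50, 0.72, 0.31])
x_rec = np.array([0.505, 0.40, 0.309])
eta = reconstruction_error(x_rec, x_meas)
suspect = int(np.argmax(np.abs(eta)))  # index of the largest |eta_t|
```

Normal measurements leave only small residual errors, so the corrupted entry stands out; turning that contrast into a threshold is the job of the decision tree.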
Further, as a preferred technical solution of the present invention, the training process of the C4.5 decision tree model in step (5) specifically includes:
first, the sample information entropy in the C4.5 decision tree model is calculated as:
$$\mathrm{Ent}(D)=-\sum_{i=1}^{2}p_i\log_2 p_i$$
where D denotes the test set of the C4.5 decision tree model and $p_i$ denotes the proportion of class i data in the sample;
second, the information gain and gain ratio of the error feature are calculated. Suppose the error feature $e_M$ is used as the division feature and takes v values, sorted in ascending order as $\{e_M^1,e_M^2,\dots,e_M^v\}$. The midpoint $\frac{e_M^i+e_M^{i+1}}{2}$ of each interval $[e_M^i,e_M^{i+1})$ is taken as a candidate division point, giving v-1 candidate division points in total:
$$Q=\left\{q_i=\frac{e_M^i+e_M^{i+1}}{2}\;\middle|\;1\le i\le v-1\right\}$$
where Q is the set of all candidate division points and $q_i$ is the ith candidate division point. The information gain of each candidate division point is calculated as:
$$Z(D,e_M,q_i)=\mathrm{Ent}(D)-\sum_{\beta\in\{-,+\}}\frac{\lvert D_{q_i}^{\beta}\rvert}{\lvert D\rvert}\,\mathrm{Ent}\big(D_{q_i}^{\beta}\big)$$
where $\lvert D\rvert$ is the number of training samples; $\lvert D_{q_i}^{-}\rvert/\lvert D\rvert$ is the proportion of samples whose error feature is not greater than $q_i$ (the subset $D_{q_i}^{-}$); and $\lvert D_{q_i}^{+}\rvert/\lvert D\rvert$ is the proportion of samples whose error feature is greater than $q_i$ (the subset $D_{q_i}^{+}$). The largest information gain among all candidate division points is taken as the information gain of the error feature. Because information gain is biased toward features with many possible values, the gain ratio is computed as the final division criterion; the gain ratio of the error feature is:
$$R(D,e_M,q_i)=\frac{Z(D,e_M,q_i)}{I(e_M)}$$
$$I(e_M)=-\sum_{\beta\in\{-,+\}}\frac{\lvert D_{q_i}^{\beta}\rvert}{\lvert D\rvert}\log_2\frac{\lvert D_{q_i}^{\beta}\rvert}{\lvert D\rvert}$$
where $R(\cdot)$ is the gain ratio of the error feature; $I(e_M)$ is the intrinsic value of feature $e_M$; and β is a sign parameter indicating the negative (-) or positive (+) side of the division point.
Further, as a preferred technical solution of the present invention, the bad data identification of the real-time measurement data of the current section in step (5) is specifically as follows:
first, a threshold for stopping splitting is given. When the computed maximum information gain of the error feature is smaller than this threshold, no feature with better classification ability has been found, and all real-time measurement data of the current section are judged to be normal data; when the computed maximum information gain is larger than the threshold, division is performed. The division process is as follows: suppose the candidate point with the maximum information gain ratio (denoted here $q_1$) is taken as the division point. Real-time measurement data of the current section whose error feature satisfies the division condition at $q_1$ are classified as bad data, and the remaining data form the other class. Another intermediate node (denoted here $q_2$) is then used as the division criterion for the remaining real-time measurement data of the current section: data satisfying the division condition at $q_2$ are judged to be normal data, and the others are judged to be bad data. This recursion is repeated until the data in each division belong to the same class, which realizes bad data identification.
By adopting the technical scheme, the invention can produce the following technical effects:
compared with the prior art, the method for identifying the bad data of the power system based on the improved Wasserstein GAN has the following advantages:
(1) The method reconstructs the real-time measurement data of the current section with the improved Wasserstein GAN model: through adversarial training of the generator and the discriminator, the reconstructed data match the target data distribution as closely as possible, which makes it easy to find outliers within a group of measurements. Compared with existing algorithms, the WGAN-GP model used in the invention adds a penalty term, avoiding the gradient explosion and convergence difficulties to which the original WGAN model is prone.
(2) The method mainly addresses the poor identification performance and low identification efficiency of bad data in large power grids: it reconstructs the real-time measurement data of the current section and locates bad data with a decision tree model. In addition, to avoid subjectivity in setting the bad data threshold and to prevent missed and false detections, a decision-tree-based method for determining the bad data threshold is provided. Finally, the feasibility and accuracy of the method are verified with extensive simulation and field measurement data, which is important for improving measurement data quality.
Therefore, the method reconstructs the real-time measurement information of a given section with the improved Wasserstein GAN to obtain the measurement reconstruction error of the current section, accurately locates bad data within a group of real-time measurements, and thus identifies bad data quickly and accurately, maintaining identification performance while accounting for identification efficiency.
Drawings
Fig. 1 is a schematic flow chart of a method for identifying bad data of an electric power system based on an improved Wasserstein GAN according to the present invention.
Fig. 2 shows the locations of the bad data in the IEEE 118-node system used in the invention.
FIG. 3 is a schematic diagram of an improved Wasserstein GAN model in the present invention.
FIG. 4 is a schematic diagram of the C4.5 decision tree model according to the present invention.
FIG. 5 shows the test results of data reconstruction by the WGAN-GP model when a single bad datum is present.
FIG. 6 shows the test results of data reconstruction by the WGAN-GP model when multiple bad data are present.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
As shown in fig. 1, the present invention provides a method for identifying bad data of an electric power system based on an improved Wasserstein GAN, wherein the bad data location is shown in fig. 2. The method comprises the following steps:
Step (1): screening historical measurement data that contain only Gaussian white noise from the historical database and preprocessing them to obtain preprocessed historical measurement data. This yields the target data required for model training, performs a preliminary analysis of the measurement data, and constructs the training set of the WGAN-GP model.
The data preprocessing of the screened historical measurement data is as follows:
for the ith measurement value of the kth section, the value is normalized by the following formula:
Figure BDA0003361396870000061
where $x_i^*$ is the ith normalized measurement value and $x_i$ is the ith original measurement value.
With this formula, the measurement values are mapped into [0, 1], which prevents differences in data magnitude from affecting model training. Because the SCADA measurement configuration generally consists of node voltage magnitudes and the power at the sending and receiving ends of each branch, voltage magnitudes and powers are modeled separately, which can improve both model accuracy and training efficiency.
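A minimal Python sketch of this preprocessing follows. The normalization formula image is not reproduced here, so the sketch assumes the common min-max form $x_i^*=(x_i-x_{\min})/(x_{\max}-x_{\min})$, which maps values into [0, 1] as the text states; taking the minimum and maximum per measurement over the historical sections, the row/column layout, and the function names are all assumptions. Voltage magnitudes and branch powers are scaled separately, mirroring the separate modeling of magnitude and power.

```python
import numpy as np


def minmax_normalize(block: np.ndarray):
    """Scale each column of `block` (rows = sections, columns = measurements) into [0, 1].

    Assumed min-max form of the normalization; constant columns map to 0.
    Returns the scaled block plus the min/max needed to undo the scaling.
    """
    x_min = block.min(axis=0)
    x_max = block.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero
    return (block - x_min) / span, x_min, x_max


def denormalize(block_norm: np.ndarray, x_min: np.ndarray, x_max: np.ndarray):
    """Map normalized values (e.g. model reconstructions) back to physical units."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return block_norm * span + x_min


# Hypothetical historical data: 3 sections, 2 voltage magnitudes (p.u.) and 2 branch powers (MW).
voltage = np.array([[1.02, 0.98], [1.00, 0.96], [1.04, 1.00]])
power = np.array([[50.0, -20.0], [60.0, -25.0], [55.0, -15.0]])

# Voltage and power are normalized separately, each with its own statistics.
v_norm, v_min, v_max = minmax_normalize(voltage)
p_norm, p_min, p_max = minmax_normalize(power)
```

Keeping the per-group minimum and maximum allows the reconstructed data to be mapped back to physical units before errors are reported.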
Step (2): training a WGAN-GP model with the historical measurement data preprocessed in step (1) as target data. During training, the generator takes as input Gaussian noise that has the same dimension as the target data and follows the standard normal distribution, and converts this noise into pseudo data whose distribution resembles that of the target data; the discriminator is responsible for distinguishing real data (the target data) from fake data (the generated pseudo data), and the generated pseudo data are taken as the reconstruction data output by the whole model.
The structure of the WGAN-GP model, shown in Fig. 3, comprises a generator and a discriminator. The generator network consists of one fully connected layer and three convolutional layers; the first three layers use the LeakyReLU activation function and the last layer uses the Tanh activation function. The discriminator network consists of three convolutional layers and one fully connected layer, and its first three layers use the LeakyReLU activation function.
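The layer arrangement above can be sketched as a NumPy forward pass. Everything not stated in the text is an assumption made for illustration: one-dimensional convolutions over the measurement vector with kernel size 3 and zero "same" padding, 8 channels, a LeakyReLU slope of 0.2, a linear (activation-free) final layer in the discriminator, and random untrained weights. Training with Adam in step (3) is omitted.

```python
import numpy as np


def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)


def conv1d_same(x, w, b):
    """1-D convolution (cross-correlation, as in deep learning libraries), stride 1.

    x: (C_in, L); w: (C_out, C_in, K) with odd K; b: (C_out,). Zero 'same' padding.
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows[c, t, j] = xp[c, t + j]
    windows = np.stack([xp[:, t:t + k] for t in range(x.shape[1])], axis=1)
    return np.einsum("oik,itk->ot", w, windows) + b[:, None]


def init_params(n, channels=8, k=3, seed=0):
    """Random (untrained) weights for a measurement vector of length n."""
    rng = np.random.default_rng(seed)

    def rnd(*shape):
        return rng.normal(0.0, 0.1, shape)

    gen = {
        "fc_w": rnd(n, n), "fc_b": np.zeros(n),
        "c1_w": rnd(channels, 1, k), "c1_b": np.zeros(channels),
        "c2_w": rnd(channels, channels, k), "c2_b": np.zeros(channels),
        "c3_w": rnd(1, channels, k), "c3_b": np.zeros(1),
    }
    disc = {
        "c1_w": rnd(channels, 1, k), "c1_b": np.zeros(channels),
        "c2_w": rnd(channels, channels, k), "c2_b": np.zeros(channels),
        "c3_w": rnd(channels, channels, k), "c3_b": np.zeros(channels),
        "fc_w": rnd(channels * n), "fc_b": 0.0,
    }
    return gen, disc


def generator(z, g):
    """Fully connected layer + three conv layers: LeakyReLU on the first three, Tanh last."""
    h = leaky_relu(g["fc_w"] @ z + g["fc_b"])[None, :]       # (1, n)
    h = leaky_relu(conv1d_same(h, g["c1_w"], g["c1_b"]))     # (channels, n)
    h = leaky_relu(conv1d_same(h, g["c2_w"], g["c2_b"]))     # (channels, n)
    return np.tanh(conv1d_same(h, g["c3_w"], g["c3_b"]))[0]  # (n,)


def discriminator(x, d):
    """Three conv layers with LeakyReLU, then a fully connected layer to one score."""
    h = np.asarray(x, dtype=float)[None, :]
    h = leaky_relu(conv1d_same(h, d["c1_w"], d["c1_b"]))
    h = leaky_relu(conv1d_same(h, d["c2_w"], d["c2_b"]))
    h = leaky_relu(conv1d_same(h, d["c3_w"], d["c3_b"]))
    return float(h.ravel() @ d["fc_w"] + d["fc_b"])


n_meas = 12  # hypothetical number of measurements in one group
gen_params, disc_params = init_params(n_meas)
z = np.random.default_rng(1).standard_normal(n_meas)  # noise with the target's dimension
fake = generator(z, gen_params)
score = discriminator(fake, disc_params)
```

Leaving the discriminator output unbounded follows the usual WGAN convention that the critic returns a score rather than a probability; the text does not state this explicitly.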
Step (3): establishing loss functions for the generator and the discriminator of the WGAN-GP model and training them adversarially until they reach Nash equilibrium. After training, a WGAN-GP model fitted to the distribution of measurement data containing only Gaussian noise is obtained, and the discriminator has fully learned the essential features of the target data. The specific steps are as follows:
first, the principle of the Wasserstein generative adversarial network with a penalty term used in the invention is as follows:
The generative adversarial network (GAN) of the prior art is an unsupervised deep learning model whose framework consists of two modules: a generator (Generative Model) and a discriminator (Discriminative Model). The input data of the generator are Gaussian noise, and the loss function of the generator can be expressed as:
Figure BDA0003361396870000071
where $\mathbb{E}(\cdot)$ is the expectation; $G(\cdot)$ is the generator function; $D(\cdot)$ is the discriminator function; $P_g(\cdot)$ is the noise data distribution; and z is the input noise vector.
The inputs of the discriminator are the pseudo data generated by the generator and the target data; the discriminator is mainly responsible for telling the target data (real) from the pseudo data (fake), and its loss function can be expressed as:
Figure BDA0003361396870000072
where $P_r(\cdot)$ is the distribution of the target data and x is the target data set. From these two loss functions, the game objective of the generator and the discriminator can be established as:
$$\min_G\max_D V(D,G)=\mathbb{E}_{x\sim P_r}\big[\log D(x)\big]+\mathbb{E}_{z\sim P_g}\big[\log\big(1-D(G(z))\big)\big]$$
On this basis, the generator and the discriminator are trained alternately and iteratively in a game; both become progressively stronger during training and can, in theory, reach Nash equilibrium.
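As a small numerical illustration of the game objective, the sketch below evaluates the standard GAN value $V(D,G)=\mathbb{E}_{x\sim P_r}[\log D(x)]+\mathbb{E}_{z\sim P_g}[\log(1-D(G(z)))]$ from discriminator outputs. At the theoretical equilibrium of the original GAN the optimal discriminator outputs 1/2 everywhere and V equals $-\log 4$; a discriminator that separates real from fake data scores higher, which is what its maximization step pursues. The function name and the clipping constant are assumptions.

```python
import numpy as np


def gan_value(d_real, d_fake, eps=1e-12) -> float:
    """V(D, G) estimated from discriminator outputs on real and generated samples."""
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)  # keep log finite
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log1p(-d_fake)))


# Equilibrium: D cannot tell real from fake and outputs 1/2 for every sample.
v_equilibrium = gan_value([0.5] * 4, [0.5] * 4)
# A discriminator that separates the two sets achieves a larger value.
v_separating = gan_value([0.9] * 4, [0.1] * 4)
```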
The WGAN-GP model optimizes its objective with the Wasserstein distance and introduces a penalty term, which effectively resolves the vanishing gradients and mode collapse of the original GAN and markedly improves model stability; the penalty term also resolves the non-convergence caused by weight clipping in WGAN. Its Wasserstein distance can be expressed as:
$$W(P_r,P_g)=\inf_{\gamma\in\Pi(P_r,P_g)}\mathbb{E}_{(x,y)\sim\gamma}\big[\lVert x-y\rVert\big]$$
where $W(P_r,P_g)$ is the infimum of the expected distance $\lVert x-y\rVert$ under the joint distribution $\gamma(x,y)$, and $\Pi(P_r,P_g)$ is the set of all joint probability distributions $\gamma$ whose marginal distributions are $P_r(\cdot)$ and $P_g(\cdot)$. By the Kantorovich-Rubinstein duality, $W(P_r,P_g)$ can be expressed as:
$$W(P_r,P_g)=\sup_{\lVert D\rVert_L\le 1}\mathbb{E}_{x\sim P_r}\big[D(x)\big]-\mathbb{E}_{z\sim P_g}\big[D(G(z))\big]$$
where the 1-Lipschitz constraint $\lVert D\rVert_L\le 1$ prevents the output of the discriminator from changing significantly when the data change slightly. To keep the discriminator a Lipschitz-continuous function at all times, a penalty term is added to the original loss function, so that the loss function for the final WGAN-GP training can be expressed as:
$$L(G,D)=\mathbb{E}_{z\sim P_g}\big[D(G(z))\big]-\mathbb{E}_{x\sim P_r}\big[D(x)\big]+\lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_p-1\big)^2\Big]$$
where $\lVert\cdot\rVert_p$ denotes the p-norm and $\lambda$ is the penalty coefficient;
$$\hat{x}=\varepsilon x+(1-\varepsilon)G(z)$$
where $\varepsilon\sim U[0,1]$, U denoting the uniform distribution, and $P_{\hat{x}}$ is the distribution of $\hat{x}$.
Also, the loss function of the generator in the WGAN-GP model of the invention is $L_G=1-D(G(z))$.
Compared with the prior-art WGAN model, the loss function of the discriminator in the WGAN-GP model of the invention is unchanged:
Figure BDA0003361396870000086
Finally, during training of the WGAN-GP model of the invention, the Adam optimizer is used to iterate on $L(G,D)$ and $L_G$ so as to optimize the parameters of the discriminator and the generator, respectively.
Step (4): preprocessing the collected real-time measurement data of the current section and inputting the preprocessed data into the WGAN-GP model trained in step (3) to obtain the measurement reconstruction data of the current section.
Step (5): obtaining the reconstruction error of the current section from the measurement reconstruction data obtained in step (4) and the real-time measurement data of the current section, inputting this reconstruction error into a trained C4.5 decision tree model, determining the bad data threshold with the C4.5-based bad data threshold setting method, and then identifying bad data in the real-time measurement data of the current section by combining the threshold with the reconstruction error. Specifically:
first, the reconstruction error of the current section is calculated from the measurement reconstruction data and the real-time measurement data of the current section as follows:
Figure BDA0003361396870000091
where $\hat{x}_t$ is the tth measurement reconstruction value in the kth section, $x_t$ is the tth real-time measurement in the kth section, and $\eta_t$ is the resulting reconstruction error of the section.
Then, for the C4.5 decision tree model, which extends and optimizes the ID3 algorithm, the sample information entropy is calculated as:
$$\mathrm{Ent}(D)=-\sum_{i=1}^{2}p_i\log_2 p_i$$
where D denotes the test set of the C4.5 decision tree model and $p_i$ is the proportion of class i in the sample: $p_1$ is the proportion of normal data and $p_2$ the proportion of bad data.
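The entropy formula can be checked with a few lines of Python. Labels follow the convention of the text (1 = normal data, 2 = bad data); the function name is an assumption.

```python
import numpy as np


def sample_entropy(labels) -> float:
    """Ent(D) = -sum_i p_i * log2(p_i), where p_i is the share of class i in D."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(-np.sum(p * np.log2(p)))


balanced = sample_entropy([1, 1, 2, 2])  # equal shares: 1 bit
pure = sample_entropy([1, 1, 1])         # a single class: 0 bits
skewed = sample_entropy([1, 1, 1, 2])    # 3:1 split
```

A set containing only normal data has zero entropy, which is why a node whose samples all share one class becomes a leaf.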
Second, the information gain and gain ratio of the error feature are calculated. Suppose the error feature $e_M$ is used as the division feature and takes v values, sorted in ascending order as $\{e_M^1,e_M^2,\dots,e_M^v\}$. The midpoint $\frac{e_M^i+e_M^{i+1}}{2}$ of each interval $[e_M^i,e_M^{i+1})$ is taken as a candidate division point, giving v-1 candidate division points in total:
$$Q=\left\{q_i=\frac{e_M^i+e_M^{i+1}}{2}\;\middle|\;1\le i\le v-1\right\}$$
where Q is the set of all candidate division points and $q_i$ is the ith candidate division point. The information gain of each candidate division point is:
$$Z(D,e_M,q_i)=\mathrm{Ent}(D)-\sum_{\beta\in\{-,+\}}\frac{\lvert D_{q_i}^{\beta}\rvert}{\lvert D\rvert}\,\mathrm{Ent}\big(D_{q_i}^{\beta}\big)$$
where $\lvert D\rvert$ is the number of training samples; $\lvert D_{q_i}^{-}\rvert/\lvert D\rvert$ is the proportion of samples whose error feature is not greater than $q_i$ (the subset $D_{q_i}^{-}$); and $\lvert D_{q_i}^{+}\rvert/\lvert D\rvert$ is the proportion of samples whose error feature is greater than $q_i$ (the subset $D_{q_i}^{+}$). The information gain of the candidate division point with the largest information gain ratio among all candidate division points is then taken as the information gain of the error feature. Because information gain is biased toward features with more possible values, the gain ratio is computed as the final division criterion:
$$R(D,e_M,q_i)=\frac{Z(D,e_M,q_i)}{I(e_M)}$$
$$I(e_M)=-\sum_{\beta\in\{-,+\}}\frac{\lvert D_{q_i}^{\beta}\rvert}{\lvert D\rvert}\log_2\frac{\lvert D_{q_i}^{\beta}\rvert}{\lvert D\rvert}$$
where $R(\cdot)$ is the gain ratio of the error feature; $I(e_M)$ is the intrinsic value of feature $e_M$, and the more possible values $e_M$ has, the larger $I(e_M)$ is; and β is a sign parameter indicating the negative (-) or positive (+) side of the division point.
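The candidate division points, information gain, and gain ratio for a single continuous error feature can be sketched as follows. The sketch follows the definitions in this section, with $D_q^-$ taken as the samples whose error is not greater than q; choosing the point with the largest gain ratio mirrors the description here. The example errors and labels (1 = normal, 2 = bad) are hypothetical.

```python
import numpy as np


def entropy(labels) -> float:
    """Ent(D) over the class labels in D (D must be non-empty)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def candidate_points(values) -> np.ndarray:
    """Midpoints q_i between consecutive sorted distinct values (v - 1 points)."""
    v = np.unique(values)  # np.unique also sorts ascending
    return (v[:-1] + v[1:]) / 2.0


def split_scores(values, labels, q):
    """Information gain Z(D, e_M, q) and gain ratio R = Z / I(e_M) for threshold q."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    minus = values <= q  # D^-_q; its complement is D^+_q
    gain, intrinsic = entropy(labels), 0.0
    for mask in (minus, ~minus):  # beta in {-, +}
        w = mask.mean()           # |D^beta_q| / |D|
        if w > 0:
            gain -= w * entropy(labels[mask])
            intrinsic -= w * np.log2(w)
    return gain, (gain / intrinsic if intrinsic > 0 else 0.0)


def best_split(values, labels):
    """Return (q, gain, ratio) for the candidate point with the largest gain ratio."""
    scored = [(q, *split_scores(values, labels, q)) for q in candidate_points(values)]
    return max(scored, key=lambda s: s[2])


errors = [0.001, -0.002, 0.003, 0.30, 0.45]  # hypothetical reconstruction errors
labels = [1, 1, 1, 2, 2]                     # 1 = normal, 2 = bad
q_best, gain_best, ratio_best = best_split(errors, labels)
```

Here the midpoint between the largest normal error (0.003) and the smallest bad error (0.30) separates the classes perfectly, so its gain equals Ent(D) and its gain ratio reaches 1.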
Finally, the bad data threshold is determined with the C4.5-based bad data threshold setting method, and bad data in the real-time measurement data of the current section are identified by combining this threshold with the reconstruction error of the current section. As shown in Fig. 4, the input reconstruction error D of the current section is the root node; the intermediate nodes represent the corresponding error-feature division criteria learned from the training set; and the leaf nodes represent the division results, i.e., normal data or bad data. The identification process is as follows:
first, a threshold for stopping splitting is given. If the computed maximum information gain of the error feature is smaller than this threshold, no feature with better classification ability has been found, and all real-time measurement data of the current section are judged to be normal data; if it is larger than the threshold, division is performed; otherwise division ends. The division process is as follows: suppose the candidate point with the maximum information gain ratio (denoted here $q_1$) is taken as the division point. Real-time measurement data of the current section whose error feature satisfies the division condition at $q_1$ are classified as bad data, and the remaining data form the other class. Another intermediate node (denoted here $q_2$) is then used as the division criterion for the remaining real-time measurement data of the current section: data satisfying the division condition at $q_2$ are judged to be normal data, and the others are judged to be bad data. This recursion is repeated until the data in each division belong to the same class, which realizes bad data identification. The decision tree is therefore equivalent to a binary classification model whose two classes are normal data and bad data, which achieves the purpose of bad data identification.
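The recursive procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: labels follow the i = 1 (normal) / i = 2 (bad) convention of the entropy definition; the stop threshold `gain_threshold` is a hypothetical value; a node whose best information gain falls below it becomes a normal-data leaf at every depth (the text states this rule only for the whole section); and the side of each division point on which data are called bad is learned from the training labels rather than fixed in advance.

```python
import numpy as np

NORMAL, BAD = 1, 2  # class labels, following i = 1 (normal) and i = 2 (bad)


def entropy(labels) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def split_scores(values, labels, q):
    """(information gain, gain ratio) of splitting at q into D^- (<= q) and D^+ (> q)."""
    minus = values <= q
    gain, intrinsic = entropy(labels), 0.0
    for mask in (minus, ~minus):
        w = mask.mean()
        if w > 0:
            gain -= w * entropy(labels[mask])
            intrinsic -= w * np.log2(w)
    return gain, (gain / intrinsic if intrinsic > 0 else 0.0)


def build_tree(values, labels, gain_threshold=0.05):
    """Recursively split until nodes are pure or no split reaches the stop threshold.

    A leaf is a class label; an internal node is (q, subtree_minus, subtree_plus).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    if np.unique(labels).size == 1:
        return int(labels[0])                   # pure node: leaf of that class
    distinct = np.unique(values)
    candidates = (distinct[:-1] + distinct[1:]) / 2.0
    scored = [(q, *split_scores(values, labels, q)) for q in candidates]
    if not scored or max(s[1] for s in scored) < gain_threshold:
        return NORMAL                           # stop rule: judge as normal data
    q = max(scored, key=lambda s: s[2])[0]      # division point with max gain ratio
    minus = values <= q
    return (q,
            build_tree(values[minus], labels[minus], gain_threshold),
            build_tree(values[~minus], labels[~minus], gain_threshold))


def classify(tree, error) -> int:
    """Walk from the root to a leaf for one reconstruction error."""
    while isinstance(tree, tuple):
        q, minus_branch, plus_branch = tree
        tree = minus_branch if error <= q else plus_branch
    return tree


# Hypothetical labelled training errors: bad data appear with both signs.
train_errors = [-0.40, -0.35, -0.004, -0.001, 0.0, 0.002, 0.005, 0.30, 0.42]
train_labels = [BAD, BAD, NORMAL, NORMAL, NORMAL, NORMAL, NORMAL, BAD, BAD]
tree = build_tree(train_errors, train_labels)
predictions = [classify(tree, e) for e in (-0.38, 0.001, 0.35)]
```

Because signed reconstruction errors can be bad on either side of zero, a single threshold is not enough; the recursion isolates large negative and large positive errors in separate branches, which matches the repeated division described in the text.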
Therefore, based on the above steps, the method of the present invention reconstructs the measurement information of a given section with the improved Wasserstein GAN, accurately locates bad data within a group of measurements, and improves both the identification performance and the identification efficiency of bad data.
To verify the advantages of the method of the present invention, a simulation test is performed on the IEEE 118-node system and described in detail. Following the measurement configuration used for state estimation in actual power systems, the test configuration consists of active and reactive power measurements at both ends of each branch and node voltage magnitude measurements.
In the simulation, load curves of an actual power system are used to compute multi-section power flow data as true values; Gaussian white noise is added to the power flow to simulate normal measurement data; and bad data occurring in actual operation are simulated by increasing or decreasing normal power measurements by 50%-150% and normal voltage magnitude measurements by 15%-25%. The data are divided into training and test sets at a ratio of 6:4. The method is implemented in Python, and the test hardware is a PC with an Intel® Core™ i7-8700K CPU @ 3.70 GHz and 16 GB of memory. On this basis, the identification performance of the proposed bad data identification method based on the improved Wasserstein GAN is tested.
TABLE 1 Bad data configuration of the IEEE 118-node system
The case study was simulated on the IEEE 118-node system. The offline training samples form a historical measurement data set of 3000 sections, on which the WGAN-GP model was trained offline. Bad data were added at random positions to 10% of the samples in the test set. The bad data configuration of the real-time measurement data is shown in Table 1.
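The data preparation described above can be sketched as follows. This is a hedged illustration, not the authors' script: the noise standard deviation `sigma=0.01`, the three-column measurement layout, the choice of one bad datum per corrupted section (Table 1's multi-bad-data configurations are not reproduced) and all function names are assumptions; "6:4" is read as a training/test split.

```python
import random

def add_noise(true_rows, sigma, rng):
    """Normal measurements: true power-flow values plus Gaussian white noise."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in true_rows]

def corrupt(row, is_voltage, n_bad, rng):
    """Copy one section and turn n_bad randomly chosen measurements into bad data."""
    row = list(row)
    bad_idx = rng.sample(range(len(row)), n_bad)
    for i in bad_idx:
        lo, hi = (0.15, 0.25) if is_voltage[i] else (0.50, 1.50)
        sign = rng.choice((-1.0, 1.0))  # increase or decrease
        # Read literally, a decrease of more than 100% flips the sign of a power measurement.
        row[i] *= 1.0 + sign * rng.uniform(lo, hi)
    return row, sorted(bad_idx)

def make_datasets(true_rows, is_voltage, sigma=0.01, bad_fraction=0.10, n_bad=1, seed=0):
    """6:4 train/test split; bad data injected into bad_fraction of the test sections."""
    rng = random.Random(seed)
    rows = add_noise(true_rows, sigma, rng)
    split = len(rows) * 6 // 10
    train, test = rows[:split], rows[split:]
    bad_positions = [[] for _ in test]  # injected bad-data indices per test section
    for k in rng.sample(range(len(test)), round(bad_fraction * len(test))):
        test[k], bad_positions[k] = corrupt(test[k], is_voltage, n_bad, rng)
    return train, test, bad_positions

# Illustrative data: 50 sections of [P_from, Q_from, V] (hypothetical layout)
true_rows = [[1.0 + 0.01 * k, 0.5, 1.02] for k in range(50)]
train, test, bad_positions = make_datasets(true_rows, is_voltage=[False, False, True])
print(len(train), len(test), sum(1 for p in bad_positions if p))  # 30 20 2
```

Keeping the injected indices alongside the test sections gives the ground truth needed later for the missed and false detection rates.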
Bad data configurations No. 1 to 9 represent correlated bad data caused by residual contamination in an actual power grid, and configurations No. 10 to 14 represent single bad data collected during actual power system operation. These bad data were added to the test set for model verification, and the reconstruction of the real-time measurement data by the WGAN-GP model is shown in Table 2.
TABLE 2 Test performance of the WGAN-GP model
As Table 2 shows, the WGAN-GP model trained on the data set of normal measurements can reconstruct the real-time measurement data: the reconstructed data are closer to the true power flow values than the original bad data, with differences in the range [-0.01, 0.01], which is consistent with normal measurement data. The real-time measurement data are input into the trained WGAN-GP model to obtain the measurement reconstruction data of the current section, from which the reconstruction error of the current section is obtained. The reconstruction error of the current section is calculated as:
Figure BDA0003361396870000122
where $\hat{x}_t^k$ is the reconstructed value of the t-th measurement in the k-th section, $x_t$ is the t-th real-time measurement in the k-th section, and $\eta_t$ is the reconstruction error of the section.
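Because the reconstruction-error equation survives only as an image, the following minimal sketch assumes the simplest form consistent with the definitions, the signed difference $\eta_t = \hat{x}_t^k - x_t$; if the original uses an absolute or relative error, only the list comprehension changes. The function name and values are illustrative.

```python
def reconstruction_errors(reconstructed, measured):
    """eta_t = x_hat_t - x_t for each measurement t of one section (assumed signed form)."""
    if len(reconstructed) != len(measured):
        raise ValueError("reconstructed and measured sections must have the same length")
    return [x_hat - x for x_hat, x in zip(reconstructed, measured)]

measured = [1.000, 0.520, 1.800]       # real-time measurements x_t (illustrative)
reconstructed = [1.002, 0.518, 0.950]  # WGAN-GP output x_hat_t (illustrative)
errors = reconstruction_errors(reconstructed, measured)
print([round(e, 3) for e in errors])   # [0.002, -0.002, -0.85]
```

Here the third measurement stands out with a large error, which is the kind of value the decision tree is meant to separate from normal errors.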
The reconstruction error of the current section is then input into the trained C4.5 decision tree model to identify bad data in this set of measurement data. The tests use measurement data from the IEEE 118-node system, with a measurement configuration consisting of node voltage magnitudes and active and reactive power at both ends of each branch. The identification results are shown in Tables 3 and 4. The missed detection rate and the false detection rate are calculated as:
Figure BDA0003361396870000131
Figure BDA0003361396870000132
where $x_{bad}$ is the number of bad data that were not detected; $x_m$ is the number of measurements that are actually normal but were falsely detected as bad data; $x_{total}$ is the total number of measurements; $\eta_1$ is the missed detection rate of the section; and $\eta_2$ is the false detection rate of the section.
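The two rates can be computed as in the sketch below. Since the equations are images, it assumes $\eta_1 = x_{bad}/x_{total}$ and $\eta_2 = x_m/x_{total}$ as fractions (multiply by 100 for percentages), with measurements identified by index; the function name and example indices are hypothetical.

```python
def detection_rates(true_bad, flagged, n_total):
    """Return (eta_1, eta_2): missed- and false-detection rates over all n_total measurements."""
    true_bad, flagged = set(true_bad), set(flagged)
    x_bad = len(true_bad - flagged)  # bad data that were not detected
    x_m = len(flagged - true_bad)    # normal data falsely detected as bad
    return x_bad / n_total, x_m / n_total

eta_1, eta_2 = detection_rates(true_bad={3, 7, 12}, flagged={3, 12, 40}, n_total=100)
print(eta_1, eta_2)  # 0.01 0.01
```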
TABLE 3 Identification results of different methods under various bad data
TABLE 4 Missed detection and false detection rates of different methods
As Table 3 shows, as the number of bad data increases, the traditional bad data identification method exhibits missed detections and false detections to varying degrees. Although the FCM clustering method and the SVM neural network method improve on the traditional method, they still show considerable missed detection as the number of bad data grows. In contrast, the method of the present invention produces no missed detections as the number of bad data increases and shows only a certain proportion of false detections, so its identification performance is substantially better.
To address the excessively long computation time of bad data identification in large-scale power systems, the present invention constructs a data-driven bad data identification method. Table 5 shows the computation times of the different bad data identification methods on various test systems. To show the computational efficiency clearly, the number of bad data in each test system is set to 10, of which 2 are on voltage magnitudes and 8 are on branch powers. As Table 5 shows, the computation time of the traditional bad data identification method increases markedly with system size; in particular, when the system grows to 13659 nodes, the traditional method can no longer meet the requirement of online identification. Compared with the FCM clustering method and the SVM neural network method, the method of the present invention also achieves higher computational efficiency.
TABLE 5 Computation times of different algorithms
The main innovation and key point of the invention is the data reconstruction performance of the WGAN-GP model. To better illustrate this performance, Figs. 5 and 6 show the measurement data for reconstruction with a single bad datum and with multiple bad data, respectively. The test system in both figures is the IEEE 118-node system, with a measurement configuration of node voltage magnitudes and active and reactive power at both ends of each branch.
Fig. 5 shows the real-time measurements, the true power flow values and the reconstructed data for one section of the IEEE 118-node system; the shaded area is the normal measurement range. As Fig. 5 shows, when the measurement data contain a single bad datum, the reconstructed data generated by the proposed WGAN-GP model lie within the normal measurement range, so the bad datum stands out as outlier data, which makes it easier for the decision tree to identify.
Fig. 6 shows the test results of the WGAN-GP model when the real-time measurement data contain multiple bad data; the shaded area is again the normal measurement range. As Fig. 6 shows, when multiple bad data are present because of residual contamination, the reconstructed data generated by the proposed WGAN-GP model still lie within the normal measurement range, and the positions of the bad data can be located by inputting the reconstruction error into the decision tree model.
In conclusion, the method addresses the low identification efficiency, false detections and missed detections of bad data identification in large-scale power systems. Using the adversarial (game) principle of the WGAN-GP model, it generates pseudo data whose distribution is closest to that of the real data, so that outlier data can be identified against the generated data. The real-time measurement data of a given section are reconstructed to obtain the measurement reconstruction data of the current section, and the reconstruction error is calculated to identify bad data in that section's real-time measurements. Bad data can thus be identified quickly and accurately, maintaining identification performance while also taking identification efficiency into account.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (6)

1. A power system bad data identification method based on improved Wasserstein GAN, characterized by comprising the following steps:
Step (1): screening historical measurement data containing only Gaussian white noise from a historical database, and preprocessing the screened historical measurement data to obtain preprocessed historical measurement data;
Step (2): training a WGAN-GP model with the preprocessed historical measurement data as target data; during training of the WGAN-GP model, the generator takes as input Gaussian noise that has the same dimension as the target data and follows the standard normal distribution, and converts this noise into pseudo data whose distribution is similar to that of the target data; the discriminator is responsible for distinguishing real from fake data among the target data and the generated pseudo data, and the pseudo data obtained after discrimination are used as the reconstruction data output by the whole model;
Step (3): establishing loss functions for the generator and the discriminator of the WGAN-GP model respectively, training the generator and the discriminator adversarially, and obtaining, after training, a WGAN-GP model fitted to the distribution of measurement data containing only Gaussian noise;
Step (4): preprocessing the collected real-time measurement data of the current section, and inputting the preprocessed real-time measurement data of the current section into the WGAN-GP model trained in step (3) to obtain the measurement reconstruction data of the current section;
Step (5): obtaining the reconstruction error of the current section from the measurement reconstruction data of the current section obtained in step (4) and the real-time measurement data of the current section; inputting the obtained reconstruction error into a trained C4.5 decision tree model; determining a bad data threshold with the bad data threshold setting method of the C4.5 decision tree model; and identifying bad data in the real-time measurement data of the current section by combining this threshold with the reconstruction error of the current section.
2. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the step (1) of performing data preprocessing on the screened historical measurement data specifically comprises:
the i-th measured value of the k-th section is normalized according to:
Figure FDA0003361396860000011
where $x_i^*$ is the i-th normalized measurement value and $x_i$ is the i-th original measurement value.
3. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the WGAN-GP model in step (3) adopts the Wasserstein distance as its optimization objective and introduces a penalty term; the Wasserstein distance is expressed as:
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$$
where $W(P_r, P_g)$ is the infimum of the expectation under $\gamma(x, y)$, and $\Pi(P_r, P_g)$ is the set of joint probability distributions γ whose marginal distributions are $P_g(\cdot)$ and $P_r(\cdot)$; by the Kantorovich-Rubinstein duality, $W(P_r, P_g)$ can be expressed as:
Figure FDA0003361396860000021
introducing the penalty term gives the final loss function of the WGAN-GP model, expressed as:
Figure FDA0003361396860000022
where $\lVert \cdot \rVert_p$ denotes the p-norm, X denotes the target data set, and λ is the penalty coefficient;
$$\hat{x} = \varepsilon x + (1 - \varepsilon)\, G(z)$$
where ε ∈ U[0, 1], U denoting the uniform distribution; $P_{\hat{x}}$ is the distribution of $\hat{x}$;
the loss function of the generator in the WGAN-GP model is $L_G = 1 - D(G(z))$;
and the loss function of the discriminator in the WGAN-GP model is:
Figure FDA0003361396860000025
where $P_r(\cdot)$ is the distribution of the target data; $E(\cdot)$ is the expectation function; $G(\cdot)$ is the generator function; $D(\cdot)$ is the discriminator function; $P_g(\cdot)$ is the noise data distribution; and z is the input noise vector;
during training of the WGAN-GP model, the Adam optimizer is used to iterate on L(G, D) and $L_G$ so as to optimize the parameters of the discriminator and the generator, respectively.
4. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the reconstruction error of the current section obtained in the step (5) is calculated as follows:
Figure FDA0003361396860000026
where $\hat{x}_t^k$ is the reconstructed value of the t-th measurement in the k-th section, $x_t$ is the t-th real-time measurement in the k-th section, and $\eta_t$ is the reconstruction error of the section.
5. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 1, wherein the training process of the C4.5 decision tree model in the step (5) comprises:
first, the sample information entropy in the C4.5 decision tree model is calculated as:
Figure FDA0003361396860000028
where D denotes the test set of the C4.5 decision tree model and $p_i$ denotes the proportion of a given class of data in the sample;
second, the information gain and gain ratio of the error feature are calculated. Taking feature $e_M$ as the division feature, suppose it takes v values, sorted in ascending order as $\{e_M^1, e_M^2, \ldots, e_M^v\}$. The midpoint $(e_M^i + e_M^{i+1})/2$ of each interval $[e_M^i, e_M^{i+1})$ is taken as a candidate division point, giving $v-1$ candidate division points, collectively denoted as:
$$Q = \left\{ q_i = \frac{e_M^i + e_M^{i+1}}{2} \;\middle|\; 1 \le i \le v-1 \right\}$$
where Q is the set of all candidate division points and $q_i$ is the i-th candidate division point; the information gain of each candidate division point is calculated as:
Figure FDA0003361396860000035
where |D| is the number of training samples, and the two weighting terms are the proportions of samples whose error feature $e_M$ lies on either side of the candidate division point $q_i$;
the largest information gain over all candidate division points is taken as the information gain of the error feature. Because information gain is biased toward features with many possible values, the gain ratio is used as the final division criterion; the gain ratio of the error feature is calculated as:
$$R(D, e_M, q_i) = \frac{Z(D, e_M, q_i)}{I(e_M)}$$
Figure FDA00033613968600000310
where R(·) is the gain ratio of the error feature; $I(e_M)$ is the intrinsic value (split information) of feature $e_M$; and β is a sign parameter indicating whether a value is positive or negative.
6. The improved Wasserstein GAN-based power system bad data identification method as claimed in claim 5, wherein the step (5) of identifying the bad data from the real-time measurement data of the current section specifically comprises:
firstly, a threshold for stopping the splitting is given; when the calculated maximum information gain of the error feature is smaller than the threshold, no feature with better classification ability has been found, and all real-time measurement data of the current section are judged to be normal data; when the calculated maximum information gain of the error feature is larger than the threshold, division is performed; the division process comprises: taking the candidate division point that maximizes the information gain ratio as the division point; classifying the real-time measurement data of the current section whose error feature satisfies the division condition at this point as bad data, and grouping the remaining real-time measurement data of the current section into one class; then using another intermediate node as the division criterion for the remaining real-time measurement data of the current section, judging the data that satisfy its condition to be normal data and the others to be bad data; and repeating this recursion until all divided data belong to the same category, thereby realizing bad data identification.
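The loss terms of claim 3 can be illustrated with the sketch below, but only under explicit assumptions, since the discriminator objective appears as an image: it assumes the standard WGAN-GP critic objective $E[D(G(z))] - E[D(x)] + \lambda E[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2]$ with p = 2 and λ = 10, and takes the generator loss $L_G = 1 - D(G(z))$ exactly as written in the claim. To keep it dependency-free with an exact input gradient, the critic is a hypothetical quadratic function rather than a neural network, and the code only evaluates loss values; it does not perform the Adam parameter updates.

```python
import math
import random

class QuadraticCritic:
    """Toy critic D(x) = sum(a_j x_j^2) + sum(w_j x_j) + b with an analytic input gradient."""
    def __init__(self, a, w, b=0.0):
        self.a, self.w, self.b = list(a), list(w), b

    def __call__(self, x):
        return sum(aj * xj * xj + wj * xj for aj, wj, xj in zip(self.a, self.w, x)) + self.b

    def grad(self, x):
        return [2.0 * aj * xj + wj for aj, wj, xj in zip(self.a, self.w, x)]

def critic_loss(D, real, fake, lam=10.0, rng=random):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2] over paired samples."""
    n = len(real)
    wasserstein_term = sum(D(f) for f in fake) / n - sum(D(r) for r in real) / n
    penalty = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # eps ~ U[0, 1)
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]  # interpolated sample
        g_norm = math.sqrt(sum(g * g for g in D.grad(x_hat)))
        penalty += (g_norm - 1.0) ** 2
    return wasserstein_term + lam * penalty / n

def generator_loss(D, fake):
    """Generator loss as written in the claim, L_G = 1 - D(G(z)), averaged over the batch."""
    return sum(1.0 - D(f) for f in fake) / len(fake)

rng = random.Random(0)
linear = QuadraticCritic(a=[0.0, 0.0], w=[0.6, 0.8])  # unit-norm gradient, so penalty ~ 0
print(round(critic_loss(linear, [[1.0, 1.0]], [[0.0, 0.0]], rng=rng), 6))  # -1.4
curved = QuadraticCritic(a=[1.0], w=[0.0])             # gradient 2x, norm 2 at x = 1
print(round(critic_loss(curved, [[1.0]], [[1.0]], rng=rng), 6))            # 10.0
print(generator_loss(linear, [[0.0, 0.0]]))                                 # 1.0
```

The second example shows the penalty at work: a critic whose gradient norm is 2 on the interpolated point is charged λ(2 - 1)² = 10 even though its Wasserstein term is zero.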
CN202111366030.0A 2021-11-18 2021-11-18 Power system bad data identification method based on improved Wasserstein GAN Pending CN114330486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111366030.0A CN114330486A (en) 2021-11-18 2021-11-18 Power system bad data identification method based on improved Wasserstein GAN

Publications (1)

Publication Number Publication Date
CN114330486A (en) 2022-04-12

Family

ID=81047592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111366030.0A Pending CN114330486A (en) 2021-11-18 2021-11-18 Power system bad data identification method based on improved Wasserstein GAN

Country Status (1)

Country Link
CN (1) CN114330486A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146538A (en) * 2022-07-11 2022-10-04 Hohai University Power system state estimation method based on message passing graph neural network
CN117056714A (en) * 2023-02-15 2023-11-14 Shanghai Jiao Tong University Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network

Citations (9)

Publication number Priority date Publication date Assignee Title
CN108491404A (en) * 2018-01-22 2018-09-04 NARI Technology Co., Ltd. State estimation bad data identification method based on BP neural network
CN109144987A (en) * 2018-08-03 2019-01-04 Tianjin Xianghe Electric Technology Co., Ltd. Deep learning-based method for reconstructing missing power system measurement values and its application
CN109165504A (en) * 2018-08-27 2019-01-08 Guangxi University Power system false data attack identification method based on generative adversarial network
CN111723851A (en) * 2020-05-30 2020-09-29 Tongji University Production line fault detection method
CN112465798A (en) * 2020-12-11 2021-03-09 Shanghai Jiao Tong University Anomaly detection method based on generative adversarial network and memory module
CN112836830A (en) * 2021-02-01 2021-05-25 Guangxi Normal University Method for voting and training in parallel by using federated gradient boosting decision tree
CN112989710A (en) * 2021-04-22 2021-06-18 Suzhou Liandian Energy Development Co., Ltd. Method and device for detecting abnormal values of industrial control sensors
CN113127705A (en) * 2021-04-02 2021-07-16 Xihua University Heterogeneous bidirectional generative adversarial network model and time-series anomaly detection method
CN113591944A (en) * 2021-07-14 2021-11-02 Ocean University of China Parameter selection optimization method, system and equipment in random forest model training

Non-Patent Citations (2)

Title
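He Qiang et al.: "Bearing fault diagnosis with small-sample data under unbalanced load", China Mechanical Engineering *
Yang Zhiwei et al.: "PMU bad data detection method based on long short-term memory network", Power System Protection and Control *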

Similar Documents

Publication Publication Date Title
CN110596492B (en) Transformer fault diagnosis method based on particle swarm optimization random forest model
CN106055918B (en) Method for identifying and correcting load data of power system
CN105512799B (en) Power system transient stability evaluation method based on mass online historical data
Yin et al. Wasserstein generative adversarial network and convolutional neural network (WG-CNN) for bearing fault diagnosis
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
CN110542819B (en) Transformer fault type diagnosis method based on semi-supervised DBNC
CN110929847A (en) Converter transformer fault diagnosis method based on deep convolutional neural network
CN113702895B (en) Online quantitative evaluation method for error state of voltage transformer
CN114330486A (en) Power system bad data identification method based on improved Wasserstein GAN
CN110020712B (en) Optimized particle swarm BP network prediction method and system based on clustering
CN111722046A (en) Transformer fault diagnosis method based on deep forest model
CN105572572A (en) WKNN-LSSVM-based analog circuit fault diagnosis method
CN110672905A (en) CNN-based self-supervision voltage sag source identification method
CN111680875A (en) Unmanned aerial vehicle state risk fuzzy comprehensive evaluation method based on probability baseline model
CN111343147A (en) Network attack detection device and method based on deep learning
CN112200038B (en) CNN-based quick identification method for oscillation type of power system
CN112465124A (en) Twin depth space-time neural network model acquisition/fault diagnosis method and device
CN115021679A (en) Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
CN114266289A (en) Complex equipment health state assessment method
CN115526258A (en) Power system transient stability evaluation method based on Spearman correlation coefficient feature extraction
CN116842459B (en) Electric energy metering fault diagnosis method and diagnosis terminal based on small sample learning
CN113627674A (en) Distributed photovoltaic power station output prediction method and device and storage medium
CN113514743A (en) Construction method of GIS partial discharge pattern recognition system based on multi-dimensional features
CN116644348A (en) Cross-mechanical part fault diagnosis method and device based on transfer type countermeasure migration
CN116400168A (en) Power grid fault diagnosis method and system based on depth feature clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220412