CN117056714A - Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network - Google Patents

Publication number
CN117056714A
Authority
CN
China
Prior art keywords
pmu
data
sequence
identification
measurement sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310114842.9A
Other languages
Chinese (zh)
Inventor
严正
谢伟
徐潇源
方陈
朱彦名
刘舒
柳劲松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202310114842.9A
Publication of CN117056714A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention relates to the technical field of data processing, and in particular to a method, system, equipment and storage medium for identifying bad PMU data in an intelligent power distribution network. The method comprises the following steps: acquiring a PMU measurement sequence, and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence; forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data; processing the two-dimensional image data using hybrid clustering to obtain a preliminary identification result for each point of the PMU measurement sequence; and performing ensemble learning identification and result correction on the preliminary identification results to obtain the final identification result for each point of the PMU measurement sequence. The graph-based method identifies the characteristics of normal and bad data, exploits the potential of two-dimensional clustering and classifiers, and improves the efficiency of spatiotemporal correlation analysis and the sensitivity of identification.

Description

Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network
Technical Field
The invention relates to the technical field of data processing, in particular to a method, a system, equipment and a storage medium for identifying bad PMU data of an intelligent power distribution network.
Background
Hybrid clustering: clustering the same data set with several clustering methods.
Intelligent power distribution network: the intelligent power distribution network is one of the key links of the smart grid. The distribution network typically comprises the parts of the grid at 10 kV and below (20 kV in some areas) and is the part of the power system directly connected to the dispersed users. Using modern electronic, communication, computer and network technology, the intelligent distribution network system integrates online and offline distribution network data, distribution network data and user data, and the grid structure and geographic maps, and so makes monitoring, protection, control, electricity consumption and distribution management intelligent under both normal operation and fault conditions.
PMU: a phasor measurement unit (PMU), also called a synchrophasor measurement device, is a phasor measurement unit that uses the pulse-per-second signal of the Global Positioning System (GPS) as its synchronizing clock. It can be used for dynamic monitoring, system protection, system analysis and prediction in a power system, and is an important device for guaranteeing the safe operation of the grid. A GPS-clocked PMU can measure the voltage phase, current phase and other data at the nodes of the power system and transmit them to the monitoring master station over a communication network.
Reconstructed data: simulated measurement data that is generated by algorithms such as a generative adversarial network and has the same distribution as the real measurement data.
Bad data: in addition to normal data, PMU measurements contain abnormal data, namely missing data, outliers and event data. Outliers and missing data are bad data caused by poor measurement quality. The purpose of bad data identification is to classify outliers and missing data as bad data, while event data, which is accurately measured, is classified as normal data.
Generative adversarial network: abbreviated GAN. A GAN is a deep generative model composed of a discriminator and a generator. During training, the generator G takes Gaussian noise with the same dimensionality as the target data as input, and the discriminator D takes normal measurement data and the pseudo data output by the generator as input. The two are trained alternately and iteratively, forming an adversarial game, until the generator and the discriminator reach a Nash equilibrium and the generator outputs the reconstructed measurement data.
PMU data is the basis for power system monitoring, control and analysis, so its quality is critical: it significantly affects analysis results and even the operational safety of the power system. However, PMU devices are complex in construction and susceptible to internal and external factors, so the time series of PMU measurements contain bad or abnormal data. Bad data in PMU measurements therefore needs to be identified in order to improve PMU measurement quality.
Outliers of PMU data are data points that deviate significantly from the expected measurements. In the absence of a system event, outliers are typically shaped like spikes that rise and fall suddenly. Event data is generated when system events such as switching events and sudden load changes occur. Typically, event data appears as a step as the PMU measurements change from the pre-event stage to the post-event stage. If the PMU measurements differ very little before and after the transient period, however, the event data appears as a spike, and the PMU measurement curve shows abrupt changes similar to those of outliers. Bad data identification methods based on checking for abrupt changes can easily identify outliers such as data spikes under static operating conditions, but the similar shapes of outliers and spike event data can make bad data identification fail or become inaccurate.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a method, system, equipment and storage medium for identifying bad PMU data in an intelligent power distribution network.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for identifying bad PMU data in an intelligent power distribution network, including the following steps:
acquiring a PMU measurement sequence, and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence;
forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data;
processing the two-dimensional image data using hybrid clustering to obtain a preliminary identification result for each point of the PMU measurement sequence;
and performing ensemble learning identification and result correction on the preliminary identification results to obtain the final identification result for each point of the PMU measurement sequence.
As a further aspect of the present invention, before the step of obtaining the PMU measurement sequence and obtaining the corresponding measurement reconstruction sequence based on the PMU measurement sequence, the method further includes the steps of:
acquiring a sample data set, and training a GAN model by using the sample data set to obtain a trained GAN model; wherein the sample data set includes a sample PMU measurement sequence combination and corresponding sample two-dimensional image data.
As a further aspect of the present invention, the obtaining the PMU measurement sequence and obtaining the corresponding measurement reconstruction sequence based on the PMU measurement sequence includes:
and acquiring a PMU measurement sequence, and generating a corresponding measurement reconstruction sequence by utilizing a trained GAN model based on the PMU measurement sequence.
As a further aspect of the invention, the PMU measurement sequence is $X_i$, the measurement reconstruction sequence is $X_j$, and the PMU measurement sequence combination is $X_{ij}$, where $X_{ij}$ is calculated by the following formula:
As a further aspect of the invention, processing the two-dimensional image data using hybrid clustering to obtain the preliminary identification result for each point of the PMU measurement sequence includes:
sequentially processing the two-dimensional image data with a linear regression identifier, a DBSCAN identifier and a Gaussian mixture model (GMM) identifier to obtain the preliminary identification result for each point of the PMU measurement sequence.
As a further aspect of the invention, performing ensemble learning identification and result correction on the preliminary identification results to obtain the final identification result for each point of the PMU measurement sequence includes:
performing ensemble learning identification on the preliminary identification results to obtain a voting result;
and correcting the voting result to obtain the final identification result for each point of the PMU measurement sequence.
As a further aspect of the present invention, the ensemble learning identification is an ensemble method that applies majority voting to the outputs of the base identifiers.
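The majority-voting step can be illustrated with a short sketch. It assumes three base identifiers that each emit one label per point; the label names "normal" and "bad" and the function name are hypothetical, and the subsequent result correction is not modeled because its rule is not given at this point in the text.

```python
from collections import Counter


def majority_vote(labels_per_identifier):
    """Combine per-point labels from several base identifiers by majority vote.

    labels_per_identifier: list of equal-length label lists, one per identifier.
    The label names used below ("normal", "bad") are illustrative assumptions.
    """
    n_points = len(labels_per_identifier[0])
    if any(len(labels) != n_points for labels in labels_per_identifier):
        raise ValueError("all identifiers must label the same number of points")
    voted = []
    for point_labels in zip(*labels_per_identifier):
        # most_common(1) returns the label with the highest count for this point
        voted.append(Counter(point_labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of the three base identifiers for four points
lr_labels = ["normal", "bad", "normal", "bad"]
dbscan_labels = ["normal", "bad", "bad", "normal"]
gmm_labels = ["normal", "normal", "bad", "normal"]

result = majority_vote([lr_labels, dbscan_labels, gmm_labels])
print(result)
```

With an odd number of identifiers and two labels, no tie can occur; any other configuration would need an explicit tie-breaking rule.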
In a second aspect, in yet another embodiment of the present invention, a smart power distribution network PMU bad data identification system is provided, the system including: the system comprises a measurement reconstruction sequence acquisition module, a construction sequence combination module, a preliminary identification module and a final identification module;
The measurement reconstruction sequence acquisition module is used for acquiring a PMU measurement sequence and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence;
the construction sequence combination module is used for forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence and generating corresponding two-dimensional image data.
The preliminary identification module is used for processing the two-dimensional image data using hybrid clustering to obtain the preliminary identification result for each point of the PMU measurement sequence;
and the final identification module is used for performing ensemble learning identification and result correction on the preliminary identification results to obtain the final identification result for each point of the PMU measurement sequence.
In a third aspect, in yet another embodiment provided by the present invention, an apparatus is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements steps of a smart distribution network PMU bad data identification method when the computer program is loaded and executed.
In a fourth aspect, in yet another embodiment of the present invention, a storage medium is provided, storing a computer program, where the computer program when loaded and executed by a processor implements the steps of the smart power distribution network PMU bad data identification method.
The technical scheme provided by the invention has the following beneficial effects:
The invention provides a method, system, equipment and storage medium for identifying bad PMU data in an intelligent power distribution network, which acquire a PMU measurement sequence and acquire a corresponding measurement reconstruction sequence based on the PMU measurement sequence; form a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generate corresponding two-dimensional image data; process the two-dimensional image data using hybrid clustering to obtain a preliminary identification result for each point of the PMU measurement sequence; and perform ensemble learning identification and result correction on the preliminary identification results to obtain the final identification result for each point of the PMU measurement sequence. In the method provided by the invention, the PMU measurement sequence and the reconstruction sequence form coordinates from which a two-dimensional scatter diagram is drawn. This two-dimensional image-based approach improves the efficiency of spatiotemporal correlation analysis between the PMU measurement time series and the reconstruction sequence, and distinguishes how normal data and bad data differ from the reconstructed data. Hybrid clustering separates normal data from bad data, and ensemble learning further corrects the identification result. The graph-based method identifies the characteristics of normal and bad data and exploits the potential of two-dimensional clustering and classifiers, so that the efficiency of spatiotemporal correlation analysis and the sensitivity of identification are improved.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for identifying PMU bad data of an intelligent power distribution network according to an embodiment of the invention.
Fig. 2 is a flowchart of a method for identifying PMU bad data of an intelligent power distribution network according to an embodiment of the invention.
Fig. 3 is a diagram of different types of PMU anomalies.
Fig. 4 is a network structure diagram of the GAN model.
FIG. 5 shows a PMU measurement sequence and a measurement reconstruction sequence.
FIG. 6 is a scattergram of PMU measurement sequences and measurement reconstruction sequences.
Fig. 7 is a diagram of the safe-zone correction.
FIG. 8 is a plot of normal data (correlation coefficient: 0.9669) PMU measurements under typical conditions.
FIG. 9 is a plot of step event data and outlier (correlation coefficient: 0.9665) PMU measurements under typical conditions.
FIG. 10 is a plot of spike event data and outlier (correlation coefficient: 0.9435) PMU measurements under typical conditions.
FIG. 11 is a plot of an outlier (correlation coefficient: 0.6527) PMU measurement under typical conditions.
FIG. 12 is a flow chart of bad data identification, results and effects of normal data under typical conditions.
FIG. 13 is a flow, results and effects of bad data identification containing step event data and outliers under typical conditions.
FIG. 14 is a flow, results and effects of bad data identification containing spike event data and outliers under typical conditions.
FIG. 15 shows the identification process, result and effect of bad data containing outliers under typical conditions.
Fig. 16 is a block diagram of a smart power distribution network PMU bad data identification system according to an embodiment of the present invention.
In the figure: the system comprises a measurement reconstruction sequence acquisition module-100, a construction sequence combination module-200, a preliminary identification module-300 and a final identification module-400.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The PMU sequentially takes measurements from a single bus at a sampling interval of Δt. During an observation window of length T, a time series of PMU measurements of length n (= T/Δt) is captured as the basic analysis unit for bad PMU data identification. The sampled time series consists of several electrical quantities, such as the amplitude and angle of voltage and current, and active and reactive power injection. The further analysis of bad data identification is performed on the PMU voltage amplitude time series, expressed as
$x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]$ (1)
where i represents the number of the bus on which the PMU is deployed.
In addition to normal data, PMU measurements contain abnormal data, namely missing data, outliers and event data. Outliers and missing data are bad data caused by poor measurement quality. The purpose of bad data identification is to classify outliers and missing data as bad data, while event data, which is accurately measured, is classified as normal data.
Missing data is typically caused by faults in the data measurement, transmission and storage processes, which leave values of zero, null or NaN (not a number) at the corresponding locations in the historical data set. Because the value drops sharply, missing data is easily found in a plot of the PMU time series and is easily eliminated by exact flag matching. Therefore, this section does not focus on the identification of missing data.
Fig. 3 shows an illustrative example of a single-bus PMU measurement time series with a 5-second window, which reveals the different types of abnormal PMU data: missing data, outliers and event data. The PMU device samples 50 data points per second, i.e. at 20-millisecond intervals. The sharp data drop at 4 seconds is clearly identified as missing data by its flag "0". Based on its distinctive shape, the step change in the PMU measurements at about 4.60 seconds can be identified as step event data. However, the data spikes at about 0.76, 1.44, 2.26 and 3.24 seconds have similar shapes and cannot be distinguished as outliers or spike event data. Since the system events are not known, more information is needed as a criterion for identifying spike-like data.
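The window length and the flag matching for missing data can be sketched as follows. The sampling rate and window length come from the example above; the function names, the use of non-overlapping windows and the synthetic series are assumptions made only for illustration.

```python
import math

SAMPLES_PER_SECOND = 50  # from the example: one sample every 20 ms
WINDOW_SECONDS = 5       # from the example: 5 s observation window


def window_length(samples_per_second, window_seconds):
    """Number of points n in one analysis window (n = T / delta_t)."""
    return samples_per_second * window_seconds


def is_missing(value):
    """Treat zero, None (null) and NaN as missing-data flags."""
    if value is None:
        return True
    return value == 0 or (isinstance(value, float) and math.isnan(value))


def split_windows(series, n):
    """Split a series into consecutive windows of n points.

    Non-overlapping windows are an assumption; a trailing partial
    window is dropped.
    """
    return [series[k:k + n] for k in range(0, len(series) - n + 1, n)]


def missing_indices(window):
    """Positions inside a window whose values carry a missing-data flag."""
    return [t for t, v in enumerate(window) if is_missing(v)]


n = window_length(SAMPLES_PER_SECOND, WINDOW_SECONDS)

# Synthetic per-unit voltage series covering two windows (illustrative only)
series = [1.0] * 500
series[200] = 0.0            # dropped sample flagged as 0, at t = 4.00 s
series[300] = float("nan")   # dropped sample stored as NaN

windows = split_windows(series, n)
print(n, [missing_indices(w) for w in windows])
```

Index 200 at 50 samples per second corresponds to the 4-second position of the data drop described in the example.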
In particular, embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, fig. 1 is a flowchart of a method for identifying PMU bad data of an intelligent power distribution network according to an embodiment of the invention, and as shown in fig. 1, the method for identifying PMU bad data of an intelligent power distribution network includes steps S10 to S40.
S10, acquiring a PMU measurement sequence, and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence. The PMU measurement sequence is $V_i$ and the measurement reconstruction sequence is $V_j$. The PMU measurement sequence is the time series of PMU measurement data of a node voltage.
In an embodiment of the present invention, before the step of obtaining the PMU measurement sequence and obtaining the corresponding measurement reconstruction sequence based on the PMU measurement sequence, the method further includes the steps of:
acquiring a sample data set, and training a GAN model by using the sample data set to obtain a trained GAN model; wherein the sample data set includes a sample PMU measurement sequence combination and corresponding sample two-dimensional image data.
In an embodiment of the present invention, the sample PMU measurement sequence combination includes a PMU measurement sequence and a sample measurement reconstruction sequence.
In the embodiment of the invention, the sample PMU measurement sequence set is the voltage measurement data, taken at the same time, of all nodes of the smart grid system that have PMU measurements.
In an embodiment of the present invention, the sample measurement reconstruction sequence set is historical time measurement data.
The obtaining the PMU measurement sequence and obtaining a corresponding measurement reconstruction sequence based on the PMU measurement sequence includes:
and acquiring a PMU measurement sequence, and generating a corresponding measurement reconstruction sequence by utilizing a trained GAN model based on the PMU measurement sequence.
The GAN model includes a generation model (G) and a discrimination model (D), and the network structure thereof is shown in fig. 4.
The generative model (G) attempts to generate samples with the same probability distribution as the real data samples. The discrimination model (D) is a binary classifier that judges whether an input sample is a real sample. During training, both the generating capacity of the generative model (G) and the discriminating power of the discrimination model (D) improve. Finally, the discrimination model (D) can no longer distinguish the generated data samples from the real data samples, and the game between the discrimination model (D) and the generative model (G) reaches a dynamic Nash equilibrium. Training the GAN amounts to solving the following two-player minimax game:
where $x$ represents a real data sample; $z$ represents the random noise input to the generative model; $G(z)$ represents the sample generated from the random noise $z$; $D(x)$ and $D(G(z))$ represent the probabilities that the discriminator assigns to a real sample and to a generated sample, respectively, of being real; $p_d(x)$ and $p_z(z)$ are the probability distributions of the real samples and of the random noise, respectively; and $V(D, G)$ is a cost function that measures the difference between the probability distributions of the generated samples and the real samples. The parameters of D and G, denoted $\theta_D$ and $\theta_G$, are updated by gradient descent, and the corresponding stochastic gradients $g_{\theta_D}$ and $g_{\theta_G}$ are obtained according to (1-3) and (1-4), respectively.
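The objective and the gradient expressions referred to above are not reproduced in this text. Assuming the standard GAN formulation, which uses exactly the symbols defined above, they would read as follows. The mini-batch size $m$ and the sample index $k$ are assumptions introduced here, and the last two lines are only presumed to correspond to (1-3) and (1-4).

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_d(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

g_{\theta_D} = \nabla_{\theta_D}\, \frac{1}{m} \sum_{k=1}^{m}
  \Bigl[\log D\bigl(x^{(k)}\bigr) + \log\bigl(1 - D\bigl(G(z^{(k)})\bigr)\bigr)\Bigr]

g_{\theta_G} = \nabla_{\theta_G}\, \frac{1}{m} \sum_{k=1}^{m}
  \log\bigl(1 - D\bigl(G(z^{(k)})\bigr)\bigr)
```

Under this reading the discriminator parameters move along $g_{\theta_D}$ (ascent on $V$) while the generator parameters move against $g_{\theta_G}$ (descent on $V$).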
It can be seen that comparing the voltage PMU measurement sequence of a single node with the dynamic voltage profile of its measurement reconstruction sequence makes it possible to mine the spatiotemporal correlation of local buses in order to identify bad data. As shown in Fig. 5, after the data points are filled in by linear interpolation, the voltage amplitude curve is represented as the PMU measurement sequence $V_i$. The voltage amplitude curve of the measurement reconstruction sequence over the same period is denoted $V_j$, and its correlation coefficient with $V_i$ is 0.9408. Based on a difference analysis, the data spikes of $V_i$ at 1.44 s, 2.26 s and 3.24 s are likely outliers, because the voltage amplitude of $V_i$ changes abruptly there while $V_j$ shows a smooth trend. The same holds for the data spike of $V_j$ at 3 s. However, the spikes of $V_i$ and $V_j$ at 0.76 s are better identified as event data, because both curves show a similar abrupt change. In practice, such combined analysis of curves and data trends is not standardized, which leads to errors of manual judgment. Therefore, further research is required for qualitative analysis of bad data identification.
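How closely a measurement sequence follows its reconstruction can be summarized by a correlation coefficient, as the figure captions do. The sketch below assumes the Pearson coefficient is meant, which the text does not state; the function name and the sample values are illustrative only.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


# Illustrative per-unit voltage amplitudes: v_i carries one spike, v_j is smooth
v_i = [1.00, 1.01, 1.02, 1.30, 1.03, 1.04]
v_j = [1.00, 1.01, 1.02, 1.03, 1.03, 1.04]

r_with_spike = pearson(v_i, v_j)
r_clean = pearson(v_j, v_j)
print(r_with_spike, r_clean)
```

A single spike that appears in only one of the two sequences lowers the coefficient noticeably, which is the effect the difference analysis relies on.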
S20, forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data.
In an embodiment of the present invention, the forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data includes:
taking the PMU measurement sequence of a single node and its measurement reconstruction sequence to form coordinates, using the coordinates as the input of the bad data identification program, and generating two-dimensional image data from the coordinates.
The PMU measurement sequence is $X_i$, and the measurement reconstruction sequence is $X_j$.
The PMU measurement sequence combination is $X_{ij}$, which is calculated by the following formula:
The PMU measurement sequence combination $X_{ij}$ generates two-dimensional coordinates by the following formula:
where each of the coordinates corresponds to a pair of PMU measurements of buses i and j in the same time period. In addition, a scatter plot of the coordinates is drawn to show the spatiotemporal distribution and correlation.
Fig. 6 shows the scatter diagram of the illustrative example, in which the middle data cluster outlined by the dotted line is normal data, the data clusters on the two sides are event data, and the scattered points outside the dotted line are outliers.
As shown in Fig. 6, most of the data points are densely distributed along the diagonal because of the strong correlation between the two selected PMU measurement sequences. To tolerate normal systematic measurement errors and variations in the system operating conditions, the "safe zone" used for rejecting bad data is designed as a diagonal stripe covering the region near the diagonal. Normal data and event data are expected to lie in the safe zone, because the paired PMU measurements during normal and event periods fit the strong-correlation feature. Because of the severe deviation of the measured data, outliers with a larger $V_i$ and a smaller $V_j$ are expected to be sparsely located in the lower triangle region, and outliers with a smaller $V_i$ and a larger $V_j$ are expected to be sparsely located in the upper triangle region.
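Pairing the two sequences into scatter-plot coordinates, and locating a point relative to the diagonal, can be sketched as follows. The sketch assumes that the measurement sequence gives the horizontal coordinate and the reconstruction sequence the vertical one, which is consistent with a larger $V_i$ and smaller $V_j$ falling in the lower triangle; the function names and sample values are hypothetical.

```python
def to_coordinates(measured, reconstructed):
    """Pair same-time samples of the two sequences into (V_i, V_j) points."""
    if len(measured) != len(reconstructed):
        raise ValueError("both sequences must cover the same time steps")
    return list(zip(measured, reconstructed))


def triangle_side(point):
    """Tell whether a point lies below, above or exactly on the diagonal."""
    v_i, v_j = point
    if v_j < v_i:
        return "lower"
    if v_j > v_i:
        return "upper"
    return "diagonal"


# Illustrative per-unit voltages: the third sample of v_i carries a spike
v_i = [1.00, 1.01, 1.25, 1.02]
v_j = [1.00, 1.01, 1.01, 1.02]

points = to_coordinates(v_i, v_j)
sides = [triangle_side(p) for p in points]
print(points, sides)
```

A real safe zone is a stripe of finite width around the diagonal rather than the diagonal itself, so this side test only indicates in which triangle a deviating point would appear.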
The high resolution of PMUs causes measurement data of the same type to be distributed in clusters, which the proposed method fully exploits. It should also be noted that if both PMU measurement sequences encounter outliers at the same time, the corresponding data points may fall in the safe zone. However, since bad data on a single PMU is already a low-probability event, the probability of this is very small, and the case is ignored in this section.
In practice, clustering data of the same type and delimiting the safe zone are not suitable for manual work, so clustering methods and classifiers serve as the base bad PMU data identifiers in this approach. Performing bad data identification through two-dimensional spatiotemporal correlation analysis of the scatter plot of PMU measurements realizes the potential of currently used clustering methods and classifiers, because two-dimensional images are what they were originally applied to. Unlike conventional analysis, this method is efficient at spatiotemporal analysis.
S30, processing the two-dimensional image data by using hybrid clustering to obtain a preliminary identification result for each point of the PMU measurement sequence.
In the embodiment of the present invention, processing the two-dimensional image data by using hybrid clustering to obtain the preliminary identification result for each point of the PMU measurement sequence includes:
sequentially processing the two-dimensional image data by using a linear regression identifier, a DBSCAN identifier and a GMM identifier to obtain the preliminary identification result for each point of the PMU measurement sequence.
The invention uses the proposed graph-based method to distinguish the characteristics of normal data and bad data and to exploit the potential of two-dimensional clustering algorithms and classifiers. The efficiency of the spatiotemporal correlation analysis and the identification sensitivity are thereby improved.
The linear regression (LR) identifier minimizes the sum of squared residuals between the observed targets in the data set and the targets predicted by the linear approximation. In a two-dimensional model, the regression line is
$$y = ax + b$$
where $y$ and $x$ represent the approximated target vector and the input vector, respectively, and $a$ and $b$ represent the slope and intercept of the regression line, respectively. To solve the classification problem, $x_i$ and $x_j$ are taken as the target and the input of the linear model, respectively, and the regression line is rewritten as
$$\hat{x}_i = a x_j + b$$
To measure the error between the approximated target $\hat{x}_i$ and the observed target $x_i$, the standard deviation $\sigma$ of the residuals $x_i - \hat{x}_i$ is calculated.
The 3σ principle is applied to tolerate measurement errors and avoid erroneous decisions on normal data. The safe zone of FIG. 6 can then be modeled as:
$$a x_j + b - 3\sigma \le x_i \le a x_j + b + 3\sigma$$
The regression line and σ are calculated by the linear regression identifier within each preset time window of the PMU measurements. The purpose of the linear regression identifier is to classify the diagonally distributed normal data, which includes the strongly correlated event data; data points outside the safe zone are then identified as outliers. Because linear regression is a regression-based identifier, the uncertain deviation of the outliers changes the optimal linear partition. The safe zone delimited by the 3σ principle is therefore likely to contain some bad data with relatively small deviations and to exclude a few normal data points that deviate from the regression line under extreme system operating conditions. Such misjudgments reduce the accuracy of the linear regression identifier.
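The linear regression identifier described above can be sketched, under stated assumptions, as an ordinary least-squares fit followed by a 3σ band test. The function name `linear_regression_identifier`, the parameter `k_sigma`, and the use of the population standard deviation (division by n) are assumptions of this sketch; the embodiment may normalize σ differently.

```python
import math

def linear_regression_identifier(points, k_sigma=3.0):
    """Flag points outside the k_sigma band around the least-squares line.

    points: list of (x_i, x_j) pairs, where x_i is the target and x_j the
    input of the linear model. Returns (a, b, sigma, is_bad).
    """
    n = len(points)
    mean_target = sum(p[0] for p in points) / n
    mean_input = sum(p[1] for p in points) / n
    sxx = sum((p[1] - mean_input) ** 2 for p in points)
    sxy = sum((p[1] - mean_input) * (p[0] - mean_target) for p in points)
    a = sxy / sxx                      # slope of the regression line
    b = mean_target - a * mean_input   # intercept of the regression line
    residuals = [p[0] - (a * p[1] + b) for p in points]
    # Assumption: population standard deviation of the residuals.
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    is_bad = [abs(r) > k_sigma * sigma for r in residuals]
    return a, b, sigma, is_bad
```

Because the outliers themselves take part in the fit, a large deviation widens the band, which is the sensitivity to outlier deviation noted in the text.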
In the present invention, the DBSCAN identifier views clusters as high-density regions separated by low-density regions, so the clusters found by DBSCAN can be of any shape. The DBSCAN identifier finds core samples in high-density regions and expands clusters from them. A cluster is thus a set of core samples together with a set of non-core samples that are close to the core samples.
The DBSCAN algorithm has two parameters that define the data density, min_samples and eps. Formally, a core sample is a sample in the data set for which at least min_samples other samples, called its neighbors, lie within a distance of eps. A cluster is a set of core samples that can be built by recursively taking a core sample, finding all of its neighbors that are core samples, finding all of their neighbors that are core samples, and so on. A cluster also has a set of non-core samples, which are neighbors of a core sample in the cluster but are not core samples themselves; these lie at the edge of the cluster. Any sample that is not a core sample and is at least eps away from every core sample is regarded as an outlier by the algorithm. While the parameter min_samples mainly controls the tolerance of the algorithm to noise, the parameter eps must be chosen appropriately for the data set and the distance function.
The DBSCAN identifier clusters the PMU measurement sequence combination $X_{ij}$ and identifies densely distributed data points as normal data. Clusters with a small data volume and data points that lie far away are then picked out and classified as outliers.
The DBSCAN identifier is a density-based identifier, so uncertainty in the system operating state changes the data distribution density. Both parameters of the DBSCAN identifier are set manually, which may lead to erroneous decisions when bad PMU data is identified. Clusters formed by the DBSCAN algorithm may therefore exclude small amounts of sparsely distributed normal data, such as event data measured during the transients of sudden system changes.
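As a minimal sketch of the density-based identifier described above, the following plain-Python DBSCAN labels noise points and members of small clusters as bad data. It is an illustrative O(n²) implementation, not the one used in the embodiment; the names `dbscan` and `dbscan_identifier` and the parameter `min_cluster_size` are assumptions. The neighbor count here includes the sample itself, as in scikit-learn's `DBSCAN`.

```python
import math
from collections import Counter

def dbscan(points, eps, min_samples):
    """Return a cluster index per point (0, 1, ...) or -1 for noise."""
    n = len(points)
    neighbors = [
        [q for q in range(n)
         if math.hypot(points[p][0] - points[q][0],
                       points[p][1] - points[q][1]) <= eps]
        for p in range(n)
    ]
    # Assumption: the neighbor count includes the sample itself.
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for p in range(n):
        if labels[p] != -1 or not is_core[p]:
            continue
        labels[p] = cluster
        stack = [p]                      # holds core samples only
        while stack:
            current = stack.pop()
            for q in neighbors[current]:
                if labels[q] == -1:
                    labels[q] = cluster  # core or border sample joins
                    if is_core[q]:
                        stack.append(q)  # only core samples expand further
        cluster += 1
    return labels

def dbscan_identifier(points, eps, min_samples, min_cluster_size):
    """Flag noise points and members of small clusters as bad data."""
    labels = dbscan(points, eps, min_samples)
    sizes = Counter(label for label in labels if label != -1)
    return [label == -1 or sizes[label] < min_cluster_size for label in labels]
```

The manual choice of `eps` and `min_samples` in such a sketch illustrates why sparsely distributed transient data can be excluded from the main cluster.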
The Gaussian mixture model (GMM) algorithm is a probabilistic variant of the k-Means method. The GMM-based identifier assumes that the data follow a Gaussian mixture distribution; in other words, the data can be regarded as generated from $k$ Gaussian distributions. Each Gaussian distribution is called a component, and the linear superposition of these components forms the probability density function $p(z)$ of the GMM:
$$p(z) = \sum_{l=1}^{k} \omega_l \, g(z \mid \mu_l, \sigma_l)$$
where $z$ is a data sample of data set $D$, $\omega_l$ represents the probability that a sample is produced by the $l$-th component, and $g(z \mid \mu_l, \sigma_l)$ is the probability density function of the $l$-th component, where $\mu_l$ and $\sigma_l$ are the center and the covariance matrix of the $l$-th Gaussian distribution.
The $k$ Gaussian distributions correspond to $k$ clusters, so GMM-based clustering essentially amounts to estimating $\omega_l$, $\mu_l$ and $\sigma_l$. The likelihood function serves as the evaluation function $E$ in GMM parameter estimation:
$$E = \prod_{m=1}^{n_s} p(z_m), \qquad F = \ln E = \sum_{m=1}^{n_s} \ln \sum_{l=1}^{k} \omega_l \, g(z_m \mid \mu_l, \sigma_l)$$
where $n_s$ is the sample size of the data set and $F$ is the logarithmic form of $E$, used for ease of computation. When $E$ reaches its maximum, called the maximum likelihood, a set of GMM parameters is obtained that determines the probability distribution most likely to have produced the data samples in $D$. To maximize the evaluation function $E$, after $\omega_l$, $\mu_l$ and $\sigma_l$ are initialized, an iterative approach similar to k-Means is employed as follows:
1) Estimate the probability that $z_m$ belongs to cluster $C_l$, i.e., that $z_m$ is generated by the $l$-th Gaussian distribution:
$$p(z_m \in C_l) = \frac{\omega_l \, g(z_m \mid \mu_l, \sigma_l)}{\sum_{r=1}^{k} \omega_r \, g(z_m \mid \mu_r, \sigma_r)}$$
2) From the conclusion in 1), the $l$-th Gaussian distribution is regarded as generating $p(z_1 \in C_l) z_1, p(z_2 \in C_l) z_2, \ldots, p(z_{n_s} \in C_l) z_{n_s}$, from which the parameters of the $l$-th Gaussian distribution are estimated:
$$\omega_l = \frac{1}{n_s} \sum_{m=1}^{n_s} p(z_m \in C_l), \qquad \mu_l = \frac{\sum_{m=1}^{n_s} p(z_m \in C_l) \, z_m}{\sum_{m=1}^{n_s} p(z_m \in C_l)}, \qquad \sigma_l = \frac{\sum_{m=1}^{n_s} p(z_m \in C_l) (z_m - \mu_l)(z_m - \mu_l)^{T}}{\sum_{m=1}^{n_s} p(z_m \in C_l)}$$
Convergence is considered to be achieved when the calculated parameters no longer change between iterations.
In an embodiment of the invention, the purpose of the GMM identifier is to identify the main cluster of normal data with a relatively loose error margin. To solve the bad data identification problem with the GMM, the total number of clusters is set to 2 so that normal data and bad data are divided into two clusters. The bad data ratio in a PMU measurement time series is assumed to be relatively low. Thus, if the data volumes of the two clusters differ significantly, data points in the cluster with the larger data volume are classified as normal data, and data points in the cluster with the smaller data volume are classified as bad data. However, if the smaller cluster still accounts for a significant portion of the total data volume, the data points in both clusters are classified as normal data, meaning that the GMM identifier cannot separate out a bad data class.
In the embodiment of the invention, the identification method of the GMM identifier is model-based, and the total number of clusters of the GMM identifier is a hyperparameter that must be preset. Because of its relatively loose tolerance, the GMM identifier may find outliers that are difficult for the other identifiers to identify, but it may also admit some outliers into the normal data cluster, thereby causing erroneous decisions.
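The cluster-size decision rule of the GMM identifier can be sketched as follows. The two-component cluster assignment is taken as an input here (it could come, for example, from scikit-learn's `GaussianMixture(n_components=2).fit_predict`); the function name `gmm_decision` and the threshold `min_ratio` are assumptions, since the text does not state what share counts as "a significant portion".

```python
from collections import Counter

def gmm_decision(cluster_labels, min_ratio=0.2):
    """Turn a two-cluster assignment (labels 0/1) into bad-data flags.

    If the smaller cluster holds at least `min_ratio` of all points
    (hypothetical threshold), both clusters are treated as normal data.
    Otherwise the smaller cluster is flagged as bad data.
    """
    counts = Counter(cluster_labels)
    smaller = min((0, 1), key=lambda c: counts[c])
    if counts[smaller] / len(cluster_labels) >= min_ratio:
        return [False] * len(cluster_labels)   # no bad-data class found
    return [label == smaller for label in cluster_labels]
```

With 95 points in one cluster and 5 in the other, the 5 are flagged; with a 60/40 split, all points are kept as normal data, matching the merged-cluster behavior reported in the case studies.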
S40, performing ensemble learning identification and result correction on the preliminary identification result for each point of the PMU measurement sequence to obtain the final identification result for each point of the PMU measurement sequence.
In the embodiment of the present invention, performing ensemble learning identification and result correction on the preliminary identification result for each point of the PMU measurement sequence to obtain the final identification result for each point of the PMU measurement sequence includes:
performing ensemble learning identification on the preliminary identification result for each point of the PMU measurement sequence to obtain a voting result;
correcting the voting result to obtain the final identification result for each point of the PMU measurement sequence.
In an embodiment of the invention, the ensemble learning identification is an ensemble method that applies majority voting to the base identifiers.
The invention uses majority voting as the aggregation method for the base identifiers and verifies normal data through unanimous votes. The voting conditions are analyzed statistically to verify the identifiable normal data. The disputed identification results in the majority vote are then corrected according to a corrected boundary between the normal and bad data distributions, which is determined from the previously verified normal data. Because it relies on graphical analysis, the proposed method is unsupervised and is therefore suitable for online bad PMU data identification. Simulation results show that, compared with a single identifier, the method performs better and obtains accurate results under various conditions within a short computation time.
The invention uses the PMU measurement sequence and the measurement reconstruction sequence as coordinates to draw a two-dimensional scatter diagram. The two-dimensional image method improves the efficiency of the spatiotemporal correlation analysis of the two PMU measurement sequences and, in particular, distinguishes the characteristics of normal data and bad data. Several clustering algorithms and classifiers based on a spatial model are then applied to the two-dimensional scatter diagram as a hybrid clustering to separate normal and bad data, and ensemble learning is used to further correct the identification result. The method effectively solves the problem of identifying bad data in PMU measurements of the power grid and improves the accuracy of bad data identification.
Specifically, each identifier has limitations in some cases, so a single identifier cannot identify all types of bad PMU data. A comprehensive judgment of bad data using several identifiers is therefore of great significance. The hybrid bad data identification method is expected to classify the whole PMU measurement and improve the accuracy of bad data identification. The method is completely unsupervised and can be applied to online bad PMU data identification for two measurement time series simultaneously.
In an embodiment of the invention, to deal with the binary classification problem, the main idea of the voting classifier is to combine conceptually different classifiers and assign class labels by majority vote (MV). Such a classifier is useful for a set of equally well-performing models because it balances out their individual weaknesses. In majority voting, the class label of a particular sample is the label assigned by the majority of the individual classifiers.
For bad PMU data identification, the class of a data point is determined by the majority of the identification results of the base bad PMU data identifiers. If a data point is identified as normal or bad data by all identifiers, this unanimous result is taken as the final output label. However, if the base identifiers give different results for a data point, the data point is likely to be affected by the limitations of some identifiers. Further analysis using DBSCAN is then required, because using majority voting alone may introduce misjudgments. Note that if the base identifiers agree on all PMU data points within the sampling time window, no further action is taken, in order to reduce the computational burden and duration.
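A minimal sketch of the first-stage majority vote over the three base identifiers is given below. It returns, for each data point, the majority label and whether the vote was unanimous, so that disputed points can be passed to the second stage. The function name `majority_vote` and the tuple layout are assumptions of this sketch.

```python
def majority_vote(lr_bad, dbscan_bad, gmm_bad):
    """Combine three per-point bad-data flags by majority vote.

    Returns a list of (is_bad, unanimous) tuples: `is_bad` is the label
    supported by at least two identifiers, and `unanimous` is True when
    all three identifiers agree.
    """
    results = []
    for flags in zip(lr_bad, dbscan_bad, gmm_bad):
        bad_votes = sum(flags)
        results.append((bad_votes >= 2, bad_votes in (0, 3)))
    return results

def needs_second_stage(results):
    """True if any point in the window received a disputed vote."""
    return any(not unanimous for _, unanimous in results)
```

When `needs_second_stage` returns False for a window, the sketch mirrors the text: the unanimous labels are final and no further processing is performed.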
In the embodiment of the invention, the majority vote is taken as the first stage of a two-stage verification structure for the bad PMU data identification result, and the second stage aims to correct the result for data points that are disputed in the majority vote. Based on graphical analysis and laboratory verification data, statistics of all majority vote conditions (sample size 12000) are listed in Table 1, with the number of occurrences of each condition listed in columns 4 and 5. Based on these statistics, the possible results for each condition are listed in column 6. As shown in conditions 6 and 7, the final result may differ from the majority vote result and requires further correction.
Table 1 Voting conditions
Note: N: normal data; B: bad data; S: seldom occurs
Based on the statistics in Table 1, the main problem in mitigating majority vote misjudgments is the correct classification of conditions 5, 6 and 7. Furthermore, it can be assumed with high confidence that data points meeting conditions 1, 2, 3, 4, 5, and 8 are correctly classified. Then, with a large amount of verified normal data available, the safe zone determined by linear regression is modified and enhanced into a corrected safe zone.
First, the normal data identified under conditions 1, 2, 3, and 4 in the previous process are sorted out. DBSCAN is then performed with relaxed parameters to find several clusters that fully contain the previously identified normal data. These data serve as the basis for modifying the edges of the safe zone and describe the layout of the normal data in more detail. As shown in FIG. 7, the previously identified normal data are grouped into two clusters.
Second, the safe zone around each cluster is updated according to the data distribution. For example, the lines $\overline{l}_1$ and $\underline{l}_1$, whose slope $a_1$ is obtained by linear regression, determine the upper and lower limits of the safe zone around cluster 1. The analytical expressions of the lines $\overline{l}_1$ and $\underline{l}_1$ are determined by the following formula
where tol is the error margin coefficient, a relatively small positive value, and $C'_1$ is the original index set of $x_i$ or $x_j$ of the data points in cluster 1. The same applies to $\overline{l}_2$ and $\underline{l}_2$. The extents of $\overline{l}_1$, $\underline{l}_1$ and $\overline{l}_2$, $\underline{l}_2$ are also determined by the normal data distribution of each cluster and are measured by the perpendicular lines $l'_1$ and $l'_2$ with slope $-1/a_1$.
Third, the perpendicular lines $l'_1$, $l'_2$ and the lines $\overline{l}_1$, $\underline{l}_1$, $\overline{l}_2$, $\underline{l}_2$ generate four intersection points, which serve as the turning points of the upper and lower limits of the modified safe zone, as shown in FIG. 7. Two straight lines $\overline{l}_{12}$ and $\underline{l}_{12}$ connect the intersection points and complete the upper and lower bounds. The lines $\overline{l}_{12}$ and $\underline{l}_{12}$ are determined by the coordinates of the four intersection points.
Finally, to correct the bad data identification of the disputed data points in the majority vote, the second-stage verification relies on the modified variable-width safe zone. Data points between the upper limit ($\overline{l}_1$-$\overline{l}_{12}$-$\overline{l}_2$) and the lower limit ($\underline{l}_1$-$\underline{l}_{12}$-$\underline{l}_2$) are classified as normal data, while those outside this range are classified as bad data. Furthermore, the data points matching condition 6 are identified as bad data by the linear regression and GMM identifiers, meaning that these data points are far from the regression line and do not belong to the main normal data cluster. The DBSCAN identifier may wrongly judge them to be normal data when such bad data points happen to lie close together in the scatter plot and are therefore identified by DBSCAN as a data cluster. Based on these characteristics, the data points matching condition 6 should be classified as bad data, which agrees with the result of the majority vote.
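The second-stage check against a variable-width safe zone can be sketched as a test of each disputed point against two piecewise-linear bounds. The construction of the bounds (including the tol margin) is not reproduced here; the polyline vertices are supplied by the caller, and the names `interp` and `second_stage_correction` are assumptions of this sketch. Points are given in plot coordinates (horizontal, vertical).

```python
def interp(polyline, x):
    """Piecewise-linear interpolation on vertices sorted by x.

    Outside the vertex range, the first or last segment is extended.
    """
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        if x <= x1:
            break
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def second_stage_correction(points, first_stage_bad, disputed, lower, upper):
    """Re-label disputed points: inside the band is normal, outside is bad.

    `lower` and `upper` are polylines (lists of (x, y) vertices) standing
    in for the lower and upper limits of the modified safe zone.
    Undisputed points keep their first-stage label.
    """
    result = list(first_stage_bad)
    for idx, (x, y) in enumerate(points):
        if disputed[idx]:
            inside = interp(lower, x) <= y <= interp(upper, x)
            result[idx] = not inside
    return result
```

In this sketch the band may widen or narrow between turning points, which is the variable-width property described in the text.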
In the case study of the invention, an open-access PMU data set from the EPFL campus (Switzerland) was used to validate the proposed bad data identification method. The time window of each PMU measurement sample is 60 seconds and contains 3000 data points. The test platform is Python 3.7 on an Intel i7-9700 @ 3.00 GHz CPU with 32 GB RAM.
By way of example, four typical conditions were selected for the case study of the method of the invention: "normal data", "step event data and outliers", "spike event data and outliers" and "outliers". FIGS. 8-11 show the PMU measurement curves of the voltage magnitudes $V_2$ and $V_3$ under the typical conditions. FIGS. 12-15 show the raw clustering and bad data identification results for the typical conditions in the form of two-dimensional scatter plots, in which the x-axis represents $V_2$ and the y-axis represents $V_3$.
The raw clustering result of each method is shown in the first row of each figure, with the computation time of each step listed at the lower right. In FIGS. 12-15 {1,1} (row 1, column 1), the scatter points represent raw samples of PMU data and the polylines represent the upper and lower limits calculated by linear regression. The polylines in FIGS. 12-15 {1,5} represent the upper and lower limits of the modified safe zone.
The second row of FIGS. 12-15 shows the bad data identification results of the respective methods. Circular dots represent correct identification results (normal data identified as normal data, or bad data identified as bad data), while triangular dots (normal data identified as bad data) and square dots (bad data identified as normal data) represent wrong results. The identification accuracy of each method, defined as follows, is shown at the lower right.
$$\mathrm{Acc} = \frac{n_{all} - n_{fn} - n_{fb}}{n_{all}} \times 100\% \qquad (22)$$
where $n_{all}$ is the sample size of the data points in the test case, $n_{fn}$ is the number of normal data points that are misidentified, and $n_{fb}$ is the number of bad data points that are misidentified. The index Acc (accuracy) reflects the ratio of the number of correctly identified samples to the total number of samples.
(1) Normal data
The PMU measurement sequence and its reconstruction sequence are strongly correlated, with a correlation coefficient of 0.9669. According to FIG. 8 and FIG. 12{1,1}, the features of correlated normal data include similar curve profiles and a diagonally distributed two-dimensional scatter plot. The linear regression identifier misjudges some normal data with relatively large deviations, while the DBSCAN identifier misjudges some sparsely distributed normal data. Both clusters of the GMM identifier contain a large amount of data and are merged into one cluster of normal data. The MV method corrects the identification errors of FIG. 12{2,1} and {2,2}, and 100% accuracy is obtained after result correction.
(2) Step event data and outliers
According to FIG. 9 and FIG. 13{1,1}, the normal PMU measurement data points before and after the step event remain strongly correlated and fall into two clusters. In FIG. 13{2,2}, the DBSCAN identifier misjudges some scattered normal data in the transient process. Both clusters of the GMM identifier contain a large amount of data and are merged into one cluster of normal data, so the outliers in FIG. 13{2,3} are misjudged. In FIG. 13{2,4}, these misjudgments are corrected by MV. After result correction, the accuracy remains at 100%.
(3) Spike event data and outliers
According to FIG. 10 and FIG. 14{1,1}, the normal PMU measurement data points within a spike event are strongly correlated. In FIG. 14{2,2}, the DBSCAN identifier misjudges some scattered normal data in the transient, and in FIG. 14{2,3}, a small amount of normal data in the transient is misjudged by the GMM identifier. As shown in FIG. 14{2,4}, MV corrects some identification errors, but some normal data remain misjudged because the majority of the identifiers misjudge them. As shown in FIG. 14{2,5}, however, the data points misjudged by MV lie inside the corrected safe zone during the second-stage correction, so their identification results are corrected and 100% accuracy is obtained after result correction.
(4) Outliers
The presence of outliers in the PMU measurement data from spatially adjacent buses reduces the correlation coefficient to 0.6527. According to FIG. 11 and FIG. 15{1,1}, the outliers among the PMU measurement data points deviate significantly from the diagonal. The DBSCAN and GMM identifiers correctly distinguish normal data from bad data. The linear regression identifier misjudges some normal data with relatively large deviations in FIG. 15{2,1}, and these misjudgments are corrected by MV as shown in FIG. 15{2,4}. The accuracy remains at 100% after result correction.
Based on the above analysis, each single identifier has limitations in some cases, which lead to misjudgments and low accuracy. Introducing MV enhances the generalization capability of the hybrid clustering and thereby improves the accuracy of bad data identification. With the result correction procedure, the method overcomes the limitations of MV and compensates for its errors at a computation cost of less than one second, thereby obtaining the most accurate result.
To verify the superiority and performance of the proposed method in online bad data identification, a comprehensive study was performed and its results were evaluated by several indices. The index Fal (false identification rate) reflects the ratio of the number of misidentified normal data samples to the total number of samples, the index Mis (missed identification rate) reflects the ratio of the number of misidentified bad data samples to the total number of samples, and the index Pre (precision) reflects the ratio of the number of correctly identified bad data samples to the number of bad data samples. Performance tests were carried out on six sets of one-hour data from the open PMU data set, each comprising 18000 PMU data pairs (36000 PMU data), with different bad data ratios and deviation ranges. The proposed method is executed over a moving time window of one minute of PMU data (3000 pairs, 6000 data). Note that the bad data ratio specifies the proportion of bad data in a single PMU time series. When two-dimensional analysis is involved, the actual bad data ratio is therefore the total bad data count in both PMU time series divided by twice the length of a single time series.
$$\mathrm{Fal} = \frac{n_{fn}}{n_{all}} \times 100\% \qquad (23)$$
$$\mathrm{Mis} = \frac{n_{fb}}{n_{all}} \times 100\% \qquad (24)$$
$$\mathrm{Pre} = \frac{n_{tb}}{n_{tb} + n_{fb}} \times 100\% \qquad (25)$$
where $n_{tn}$ is the number of correctly identified normal data points and $n_{tb}$ is the number of correctly identified bad data points.
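The indices of equations (22) to (25) and the two-dimensional bad data ratio can be computed as in the following sketch. The function names `identification_metrics` and `two_dim_bad_ratio` and the boolean encoding (True for bad data) are assumptions of this sketch; the formulas follow the definitions given in the text.

```python
def identification_metrics(truth_bad, predicted_bad):
    """Return (Acc, Fal, Mis, Pre) in percent for per-point boolean flags."""
    pairs = list(zip(truth_bad, predicted_bad))
    n_all = len(pairs)
    n_fn = sum(1 for t, p in pairs if not t and p)   # normal data misidentified
    n_fb = sum(1 for t, p in pairs if t and not p)   # bad data misidentified
    n_tb = sum(1 for t, p in pairs if t and p)       # bad data correctly identified
    acc = 100.0 * (n_all - n_fn - n_fb) / n_all      # equation (22)
    fal = 100.0 * n_fn / n_all                       # equation (23)
    mis = 100.0 * n_fb / n_all                       # equation (24)
    # Equation (25); undefined when the window contains no bad data.
    pre = 100.0 * n_tb / (n_tb + n_fb) if (n_tb + n_fb) else float("nan")
    return acc, fal, mis, pre

def two_dim_bad_ratio(n_bad_i, n_bad_j, series_length):
    """Bad data count of both series divided by twice the series length."""
    return (n_bad_i + n_bad_j) / (2 * series_length)
```

For example, with 3000 points per series and 30 bad points in each, `two_dim_bad_ratio(30, 30, 3000)` gives the same 1% ratio as for a single series.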
TABLE 2 comprehensive results of bad data identification (EPFL data)
Note: the methods in bold are based on the two-dimensional analysis proposed in this section, while the methods in regular font are based on one-dimensional analysis. The column "Proposed" gives the test results of the proposed method.
Table 2 lists the numerical test results of online bad data identification. Because linear regression and GMM are not applicable to one-dimensional analysis, only the two-dimensional and the traditional one-dimensional DBSCAN methods are compared. As shown by the higher bad data identification accuracy and precision and the lower missed and false identification rates in Table 2, the two-dimensional DBSCAN method is superior to the one-dimensional method because of the density-based features in the scatter diagram. The two-dimensional method thus improves bad data identification performance by analyzing the spatiotemporal correlation of different measurements.
Furthermore, the performance of the single-model methods (LR/DBSCAN/GMM), the ensemble-based method (MV) and the proposed method, all of which are based on two-dimensional analysis, was compared. Although the linear regression identifier and the GMM identifier perform worse than the DBSCAN identifier, their role in finding the false and missed identification points of the DBSCAN identifier is confirmed by the performance of the proposed hybrid method. The ensemble-based MV method performs moderately, but better than most single base identifiers. In the tests on the six data sets, the proposed two-stage hybrid method has the highest bad data identification accuracy, the lowest missed identification rate and a low false identification rate. This performance confirms that the proposed verification and correction of the bad data identification results is effective in reducing false and missed identifications. The improved identification accuracy means that the proposed method can identify certain situations that other methods cannot and can adapt to more complex bad data conditions. As the bad data ratio increases and the maximum deviation range decreases, the bad data identification accuracy of the proposed method also decreases because identification becomes more difficult. Nevertheless, owing to its high generalization ability and low sensitivity, the proposed method remains stable, with a bad data identification accuracy above 99.9% as measured by the Pre index.
The average computation time for online bad data identification of the PMU time series in a single time window is 0.1612 s, which is far shorter than the time window length and thus meets the requirement of online identification.
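The online use of a moving time window, together with a per-window timing check against the window length, can be sketched as follows. The step between successive windows is not stated in the text, so the `step` parameter, like the names `moving_windows` and `run_online`, is an assumption of this sketch; `identify` stands for any per-window identification routine.

```python
import time

def moving_windows(pairs, window=3000, step=3000):
    """Yield (start_index, window_slice) over a sequence of data pairs."""
    for start in range(0, len(pairs) - window + 1, step):
        yield start, pairs[start:start + window]

def run_online(pairs, identify, window=3000, step=3000):
    """Apply `identify` to every window and record its computation time."""
    results = []
    timings = []
    for start, chunk in moving_windows(pairs, window, step):
        t0 = time.perf_counter()
        results.append((start, identify(chunk)))
        timings.append(time.perf_counter() - t0)   # seconds per window
    return results, timings
```

Comparing `max(timings)` with the duration of one window (60 s in the case study) gives a simple check of whether the routine keeps up with the incoming data.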
The bad data identification performance of the method was also tested on actual PMU measurement data from a domestic power distribution network, and the results are shown in Table 3.
Table 3 Comprehensive results of bad data identification (Lingang demonstration area data)
The test results show that, for the actual PMU measurement data of the Lingang demonstration area, the bad data identification accuracy is above 99.9% as measured by the Pre index.
It should be understood that although the steps are described in a certain order, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of the steps is not strictly limited to the described order, and the steps may be executed in other orders. Moreover, some steps of the present embodiment may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
In one embodiment, referring to fig. 3, a system for identifying PMU bad data of a smart distribution network is further provided in an embodiment of the present invention, where the system includes a measurement reconstruction sequence acquisition module 100, a construction sequence combination module 200, a preliminary identification module 300, and a final identification module 400.
The measurement reconstruction sequence obtaining module 100 is configured to obtain a PMU measurement sequence, and obtain a corresponding measurement reconstruction sequence based on the PMU measurement sequence.
The building sequence combination module 200 is configured to form a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generate corresponding two-dimensional image data.
The preliminary identification module 300 is configured to process the two-dimensional image data by using hybrid clustering to obtain a preliminary PMU measurement sequence identification result of each point.
The final identification module 400 is configured to perform ensemble learning identification and result correction on the preliminary identification result for each point of the PMU measurement sequence, so as to obtain the final identification result for each point of the PMU measurement sequence.
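How the four modules fit together can be sketched as a simple composition of callables. This is only an illustration of the data flow; the function name `identify_bad_data` and its parameters are assumptions, and the reconstruction model, the base identifiers and the correction routine are supplied by the caller rather than defined here.

```python
def identify_bad_data(measurement, reconstruct, base_identifiers, correct):
    """Chain the four modules of the system on one measurement sequence.

    reconstruct:      callable returning the measurement reconstruction
                      sequence (role of module 100)
    base_identifiers: callables mapping 2-D points to per-point bad flags
                      (role of module 300)
    correct:          callable combining the preliminary flags into the
                      final per-point result (role of module 400)
    """
    reconstruction = reconstruct(measurement)                  # module 100
    points = list(zip(measurement, reconstruction))            # module 200
    preliminary = [identifier(points) for identifier in base_identifiers]
    return correct(points, preliminary)                        # module 400
```

Keeping the modules as interchangeable callables in this sketch makes it straightforward to swap an identifier or the correction stage without touching the rest of the chain.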
In one embodiment, referring to fig. 5, there is also provided in an embodiment of the present invention an apparatus comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus.
A memory for storing a computer program;
a processor for executing the method for identifying bad data of the PMU of the intelligent power distribution network when executing the computer program stored in the memory; the steps in the above method embodiments are implemented when the processor executes the instructions.
The communication bus of the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is represented by only one bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The device includes user equipment and network equipment. The user equipment includes, but is not limited to, a computer, a smart phone, a PDA, etc.; the network equipment includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing (Cloud Computing) consisting of a large number of computers or network servers, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms a virtual supercomputer. The device can operate alone to implement the invention, or can access a network and implement the invention through interaction with other devices in the network. The network where the device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In one embodiment of the invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the above described embodiment methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. A method for identifying bad data of a PMU of an intelligent power distribution network is characterized by comprising the following steps:
acquiring a PMU measurement sequence, and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence;
forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data;
processing the two-dimensional image data by hybrid clustering to obtain a preliminary identification result for each point of the PMU measurement sequence;
and performing ensemble learning identification and result correction on the preliminary identification results to obtain a final identification result for each point of the PMU measurement sequence.
2. The method for identifying bad data of a PMU of an intelligent power distribution network according to claim 1, wherein before acquiring the PMU measurement sequence and acquiring the corresponding measurement reconstruction sequence based on the PMU measurement sequence, the method further comprises:
acquiring a sample data set, and training a GAN model with the sample data set to obtain a trained GAN model; wherein the sample data set includes a sample PMU measurement sequence combination and corresponding sample two-dimensional image data.
3. The method for identifying bad data of a PMU of an intelligent power distribution network according to claim 2, wherein acquiring the PMU measurement sequence and acquiring the corresponding measurement reconstruction sequence based on the PMU measurement sequence comprises:
acquiring the PMU measurement sequence, and generating the corresponding measurement reconstruction sequence from the PMU measurement sequence by using the trained GAN model.
4. The method for identifying bad data of a PMU of an intelligent power distribution network according to claim 2, wherein the PMU measurement sequence is X_i, the measurement reconstruction sequence is X_j, and the PMU measurement sequence combination is X_ij, where X_ij is calculated by the following formula:
5. The method for identifying bad data of a PMU of an intelligent power distribution network according to claim 1, wherein processing the two-dimensional image data by hybrid clustering to obtain the preliminary identification result for each point of the PMU measurement sequence comprises:
sequentially processing the two-dimensional image data with a linear regression identifier, a DBSCAN identifier and a Gaussian mixture model identifier to obtain the preliminary identification result for each point of the PMU measurement sequence.
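The three base identifiers named in claim 5 can be sketched in plain Python as follows. This is only an illustrative sketch under several assumptions that are not taken from the claims: each two-dimensional point is taken to be a (reconstructed, measured) pair, each identifier returns one boolean flag per point (True meaning suspected bad data), the Gaussian mixture is fitted to the one-dimensional difference between measured and reconstructed values, and all thresholds (`k`, `eps`, `min_pts`, `var_floor`) and function names are hypothetical.

```python
import math
import statistics

def linear_regression_flags(points, k=3.0):
    """Flag points whose residual from a least-squares line is unusually large."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    residuals = [abs(y - (my + slope * (x - mx))) for x, y in points]
    scale = statistics.median(residuals) or 1e-12  # assumed robust scale
    return [r > k * scale for r in residuals]

def dbscan_flags(points, eps, min_pts):
    """Flag the points that DBSCAN labels as noise (belonging to no cluster)."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbors[j])
    return [label == -1 for label in labels]

def gmm_flags(points, iters=50, var_floor=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM; flag the minority component."""
    d = [measured - recon for recon, measured in points]
    mu = [min(d), max(d)]
    var = [max(statistics.pvariance(d), var_floor)] * 2
    w = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in d]
    for _ in range(iters):
        for i, x in enumerate(d):  # E-step: responsibilities of each component
            p = [w[c] * math.exp(-(x - mu[c]) ** 2 / (2 * var[c]))
                 / math.sqrt(2 * math.pi * var[c]) for c in range(2)]
            total = sum(p) or 1.0
            resp[i] = [pc / total for pc in p]
        for c in range(2):  # M-step: update weight, mean and variance
            nc = sum(r[c] for r in resp) or 1e-12
            mu[c] = sum(r[c] * x for r, x in zip(resp, d)) / nc
            var[c] = max(sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, d)) / nc,
                         var_floor)
            w[c] = nc / len(d)
    minority = 0 if w[0] < w[1] else 1
    return [r[minority] > 0.5 for r in resp]

# Synthetic example: 20 points, with one spike injected at index 10.
recon = [1.00 + 0.01 * t for t in range(20)]
meas = [r + (0.002 if t % 2 == 0 else -0.002) for t, r in enumerate(recon)]
meas[10] = recon[10] + 0.2
points = list(zip(recon, meas))

lr_flags = linear_regression_flags(points)
db_flags = dbscan_flags(points, eps=0.03, min_pts=3)
gm_flags = gmm_flags(points)
```

On this synthetic data each of the three identifiers flags only the injected spike; on real PMU data they would generally disagree on some points, which is why their outputs are combined in a later step.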
6. The method for identifying bad data of a PMU of an intelligent power distribution network according to claim 1, wherein performing ensemble learning identification and result correction on the preliminary identification results to obtain the final identification result for each point of the PMU measurement sequence comprises:
performing ensemble learning identification on the preliminary identification results for each point of the PMU measurement sequence to obtain a voting result;
and correcting the voting result to obtain the final identification result for each point of the PMU measurement sequence.
7. The method for identifying bad data of a PMU of an intelligent power distribution network according to claim 6, wherein the ensemble learning identification is an ensemble method that applies majority voting to the base identifiers.
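The majority vote of claims 6 and 7 can be illustrated with a short sketch. It assumes, hypothetically, that every base identifier outputs one boolean flag per point (True meaning suspected bad data) and that a point is marked bad when more than half of the identifiers flag it. The function name and the example flags are made up for illustration, and the result-correction step is not shown because the claims do not specify its rule.

```python
from typing import List, Sequence

def majority_vote(base_flags: Sequence[Sequence[bool]]) -> List[bool]:
    """Combine per-point flags from several base identifiers by strict majority."""
    if not base_flags:
        return []
    n_identifiers = len(base_flags)
    # zip(*base_flags) yields, for each point, the votes of all identifiers.
    return [2 * sum(votes) > n_identifiers for votes in zip(*base_flags)]

# Hypothetical outputs of three base identifiers for a five-point sequence.
lr_out = [False, True, False, True, False]
dbscan_out = [False, True, False, False, False]
gmm_out = [False, True, True, False, False]

voted = majority_vote([lr_out, dbscan_out, gmm_out])
print(voted)  # [False, True, False, False, False]
```

Only the second point is flagged by at least two of the three identifiers, so it is the only point that survives the vote.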
8. A system for identifying bad data of a PMU of an intelligent power distribution network, characterized in that the system comprises: a measurement reconstruction sequence acquisition module, a sequence combination construction module, a preliminary identification module and a final identification module;
the measurement reconstruction sequence acquisition module is used for acquiring a PMU measurement sequence and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence;
the sequence combination construction module is used for forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data;
the preliminary identification module is used for processing the two-dimensional image data by hybrid clustering to obtain a preliminary identification result for each point of the PMU measurement sequence;
and the final identification module is used for performing ensemble learning identification and result correction on the preliminary identification results to obtain a final identification result for each point of the PMU measurement sequence.
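One possible way to wire the four modules of claim 8 together is sketched below. The class name, field names and the stub modules are all hypothetical. In particular, the median-based `reconstruct` stub merely stands in for the trained GAN model, and the threshold-based stub stands in for the hybrid clustering identifiers.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed layout: (reconstructed, measured)

@dataclass
class BadDataIdentificationSystem:
    reconstruct: Callable[[Sequence[float]], List[float]]
    combine: Callable[[Sequence[float], Sequence[float]], List[Point]]
    preliminary_identify: Callable[[List[Point]], List[List[bool]]]
    final_identify: Callable[[List[List[bool]]], List[bool]]

    def identify(self, measured: Sequence[float]) -> List[bool]:
        reconstructed = self.reconstruct(measured)         # acquisition module
        image = self.combine(measured, reconstructed)      # combination module
        preliminary = self.preliminary_identify(image)     # preliminary module
        return self.final_identify(preliminary)            # final module

# Stub modules, for illustration only.
system = BadDataIdentificationSystem(
    reconstruct=lambda m: [statistics.median(m)] * len(m),
    combine=lambda m, r: list(zip(r, m)),
    preliminary_identify=lambda pts: [[abs(meas - rec) > 0.1 for rec, meas in pts]],
    final_identify=lambda flags: [2 * sum(v) > len(flags) for v in zip(*flags)],
)

result = system.identify([1.0, 1.0, 1.5, 1.0, 1.0])
print(result)  # [False, False, True, False, False]
```

Keeping each module behind a plain callable makes it straightforward to swap a stub for a real implementation without touching the other stages.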
9. An apparatus comprising a memory and a processor, the memory storing a computer program, wherein the steps of the method for identifying bad data of a PMU of an intelligent power distribution network according to any one of claims 1-7 are implemented when the computer program is loaded and executed by the processor.
10. A storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the method for identifying bad data of a PMU of an intelligent power distribution network according to any one of claims 1-7.
CN202310114842.9A 2023-02-15 2023-02-15 Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network Pending CN117056714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310114842.9A CN117056714A (en) 2023-02-15 2023-02-15 Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network

Publications (1)

Publication Number Publication Date
CN117056714A (en) 2023-11-14

Family

ID=88659586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310114842.9A Pending CN117056714A (en) 2023-02-15 2023-02-15 Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network

Country Status (1)

Country Link
CN (1) CN117056714A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210285994A1 (en) * 2016-08-05 2021-09-16 The Regents Of The University Of California Phase identification in power distribution systems
CN114330486A (en) * 2021-11-18 2022-04-12 河海大学 Power system bad data identification method based on improved Wasserstein GAN
CN114510469A (en) * 2022-02-08 2022-05-17 中国电力科学研究院有限公司 Method, device, equipment and medium for identifying bad data of power system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUANGZENG YOU, ET.AL: "Correlation-based Bad Data Detection of PMU Measurements for Improving Network Structure with High Penetration of Renewable Energy", THE 5TH IEEE CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION, pages 1 - 6 *
MENGZE ZHOU, ET.AL: "Ensemble-Based Algorithm for Synchrophasor Data Anomaly Detection", IEEE TRANSACTIONS ON SMART GRID, vol. 10, no. 3, pages 1 - 10 *
YANMING ZHU, XIAOYUAN XU, ZHENG YAN: "Hybrid clustering-based bad data detection of PMU measurements", ENERGY CONVERSION AND ECONOMICS, 10 December 2021 (2021-12-10), pages 235 - 241 *

Similar Documents

Publication Publication Date Title
US11657322B2 (en) Method and system for scalable multi-task learning with convex clustering
CN110458230A (en) A kind of distribution transforming based on the fusion of more criterions is with adopting data exception discriminating method
CN114297036B (en) Data processing method, device, electronic equipment and readable storage medium
CN104198912B (en) A kind of hardware circuit FMEA based on data mining analyzes method
CN108717496B (en) Radar antenna array surface fault detection method and system
Mothe et al. Community detection: Comparison of state of the art algorithms
CN114048468A (en) Intrusion detection method, intrusion detection model training method, device and medium
Kim et al. A generalised uncertain decision tree for defect classification of multiple wafer maps
CN109189876A (en) A kind of data processing method and device
CN112419268A (en) Method, device, equipment and medium for detecting image defects of power transmission line
CN107908807B (en) Small subsample reliability evaluation method based on Bayesian theory
Zhu et al. Hybrid clustering‐based bad data detection of PMU measurements
Gkikopoulos et al. AVOC: history-aware data fusion for reliable IoT analytics
CN117056714A (en) Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network
CN107506824B (en) Method and device for detecting bad observation data of power distribution network
Bashir et al. Matlab-based graphical user interface for IOT sensor measurements subject to outlier
CN113255810A (en) Network model testing method based on key decision logic design test coverage rate
CN114140246A (en) Model training method, fraud transaction identification method, device and computer equipment
CN111542819B (en) Apparatus and method for an improved subsurface data processing system
Mercioni et al. Evaluating hierarchical and non-hierarchical grouping for develop a smart system
Kim et al. Trustworthy dynamic data awareness model for tracking in CPS
Abaei et al. Software fault prediction based on improved fuzzy clustering
Kandanaarachchi et al. Extreme Value Modelling of Feature Residuals for Anomaly Detection in Dynamic Graphs
CN112884167B (en) Multi-index anomaly detection method based on machine learning and application system thereof
Zheng et al. Siamese Multiple Attention Temporal Convolution Networks for Human Mobility Signature Identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination