CN117056714A

CN117056714A - Method, system, equipment and storage medium for identifying PMU bad data of intelligent power distribution network

Info

Publication number: CN117056714A
Application number: CN202310114842.9A
Authority: CN
Inventors: 严正; 谢伟; 徐潇源; 方陈; 朱彦名; 刘舒; 柳劲松
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2023-02-15
Filing date: 2023-02-15
Publication date: 2023-11-14

Abstract

The invention relates to the technical field of data processing, and in particular to a method, a system, a device and a storage medium for identifying bad PMU data in a smart distribution network. The method comprises the following steps: acquiring a PMU measurement sequence and obtaining a corresponding measurement reconstruction sequence from it; forming a PMU measurement sequence combination from the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data; processing the two-dimensional image data by hybrid clustering to obtain preliminary identification results for each point of the PMU measurement sequence; and performing ensemble learning identification and result correction on these preliminary results to obtain the final identification results for each point of the PMU measurement sequence. The chart-based approach captures the characteristics of normal and bad data, exploits the potential of dimensional clustering and classifiers, and improves both the efficiency of spatiotemporal correlation analysis and the identification sensitivity.

Description

Method, system, device and storage medium for identifying bad PMU data in a smart distribution network

Technical Field

The present invention relates to the field of data processing technology, and in particular to a method, system, device and storage medium for identifying bad PMU data in a smart distribution network.

Background Art

Hybrid clustering: the use of multiple clustering methods on the same data set.

Smart distribution network: the smart distribution network is one of the key links of the smart grid. Power networks of 10 kV and below usually belong to the distribution network (20 kV in some areas), which is the part of the power system directly connected to dispersed users. A smart distribution system uses modern electronics, communication, computer and network technologies to integrate online and offline distribution network data, distribution network and user data, and grid structure and geographic graphics, enabling intelligent monitoring, protection, control, power consumption and distribution management under both normal operation and fault conditions.

PMU: a phasor measurement unit (PMU) is a synchronized phasor measurement device that uses the pulse-per-second signal of the Global Positioning System (GPS) as its synchronizing clock. It is used for dynamic monitoring, system protection, and system analysis and prediction in power systems, and is an important device for ensuring the safe operation of power grids. A GPS-clocked PMU measures phasor data such as voltage and current phase angles at key nodes of the power system and transmits the data over a communication network to the monitoring master station, which then works with the phase angles and magnitudes from the different points.

Reconstructed data: simulated measurement data, generated by algorithms such as generative adversarial networks, that follows the same distribution as the actual measurement data.

Bad data: besides normal data, PMU measurements also contain abnormal data, namely missing data, outliers and event data. Outliers and missing data are bad data caused by poor measurement quality. The purpose of bad data identification is to classify outliers and missing data as bad data, while event data, which is accurately measured, is classified as normal data.

Generative adversarial network (GAN): a deep generative model consisting of a discriminator module and a generator module. During training, the generator G takes as input Gaussian noise with the same dimension as the target data, while the discriminator D takes normal measurement data and the pseudo data output by the generator. The two are trained alternately in an adversarial game until they reach a Nash equilibrium, at which point the generator outputs the reconstructed measurement data.

PMU data is the basis for monitoring, control and analysis of power systems. The quality of PMU data is therefore crucial and significantly affects analysis results and even the operational safety of the power system. However, PMU equipment has a complex structure and is easily affected by internal and external factors, so the time series measured by PMUs may contain bad or abnormal data. It is therefore necessary to identify bad data in PMU measurements and to improve the quality of PMU measurement data.

Outliers in PMU data are data points that deviate significantly from the expected measurement. In the absence of system events, outliers are usually shaped like spikes that rise and fall suddenly. Event data is generated when system events such as switching operations or sudden load changes occur. Event data typically takes a step shape, because the PMU measurement moves from its pre-event level to a different post-event level. If the PMU measurement deviates only slightly between before and after the transient period, the event data is spike-shaped, and the PMU measurement curve shows abrupt changes similar to those of outliers. Bad data identification methods that detect abrupt changes can easily identify outliers such as data spikes under static operating conditions. However, the similar profiles of outliers and spike-shaped event data can make bad data identification fail or become inaccurate.
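To make this ambiguity concrete, here is a minimal sketch of a naive abrupt-change detector applied to made-up per-unit voltage magnitudes. The function `flag_sudden_changes` and the threshold are hypothetical illustrations, not part of the described method: because the detector only looks at first differences, an isolated outlier and a spike-shaped event with the same profile receive identical flags.

```python
import numpy as np


def flag_sudden_changes(x, threshold):
    """Flag samples whose jump from the previous sample exceeds `threshold`.

    Hypothetical baseline: it only inspects first differences, so it cannot
    tell an isolated outlier from a spike-shaped event with the same shape.
    """
    x = np.asarray(x, dtype=float)
    jumps = np.abs(np.diff(x, prepend=x[0]))  # first sample has zero jump
    return jumps > threshold


baseline = np.full(10, 1.00)   # illustrative steady voltage magnitude (p.u.)

outlier = baseline.copy()
outlier[4] = 1.08              # bad measurement spike

event = baseline.copy()
event[4] = 1.08                # spike-shaped event with the same profile

# Both the rise (index 4) and the return to baseline (index 5) are flagged,
# and the two cases are indistinguishable to this detector.
print(np.flatnonzero(flag_sudden_changes(outlier, 0.05)))
print(np.flatnonzero(flag_sudden_changes(event, 0.05)))
```

The identical output for both series is exactly why additional information, such as the reconstruction sequence used below, is needed.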

Summary of the Invention

To solve the technical problems in the prior art described above, the present invention provides a method, system, device and storage medium for identifying bad PMU data in a smart distribution network.

To achieve the above objectives, the embodiments of the present invention provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a method for identifying bad PMU data in a smart distribution network, the method comprising the following steps:

Acquiring a PMU measurement sequence, and obtaining a corresponding measurement reconstruction sequence based on the PMU measurement sequence;

Forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data;

Processing the two-dimensional image data by hybrid clustering to obtain preliminary identification results for each point in the PMU measurement sequence;

Performing ensemble learning identification and result correction on the preliminary identification results for each point in the PMU measurement sequence to obtain the final identification results for each point in the PMU measurement sequence.

As a further solution of the present invention, before the PMU measurement sequence is acquired and the corresponding measurement reconstruction sequence is obtained from it, the method further includes the following step:

Obtaining a sample data set and training a GAN model with the sample data set to obtain a trained GAN model, wherein the sample data set includes sample PMU measurement sequence combinations and the corresponding sample two-dimensional image data.

As a further solution of the present invention, acquiring the PMU measurement sequence and obtaining the corresponding measurement reconstruction sequence based on it includes:

Acquiring a PMU measurement sequence, and generating the corresponding measurement reconstruction sequence from it using the trained GAN model.

As a further solution of the present invention, the PMU measurement sequence is X_i, the measurement reconstruction sequence is X_j, and the PMU measurement sequence combination is X_ij, where X_ij is calculated by the following formula:

As a further solution of the present invention, processing the two-dimensional image data by hybrid clustering to obtain preliminary identification results for each point in the PMU measurement sequence includes:

Processing the two-dimensional image data in turn with a linear regression identifier, a DBSCAN identifier and a Gaussian mixture model (GMM) identifier to obtain preliminary identification results for each point in the PMU measurement sequence.
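The three base identifiers can be sketched with scikit-learn's `LinearRegression`, `DBSCAN` and `GaussianMixture` classes. This is a minimal sketch under stated assumptions: each identifier runs independently on the (measured, reconstructed) point cloud and returns a boolean "bad" flag per point; the flagging rules (robust residual threshold, DBSCAN noise label, low GMM log-density) and all parameter values (`k`, `eps`, `min_samples`, `n_components`) are hypothetical choices, not the patent's specification.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture


def regression_identifier(points, k=3.0):
    """Flag points far from a fitted line V_i ~ a*V_j + b (robust MAD threshold)."""
    x, y = points[:, 1:2], points[:, 0]          # x: reconstructed, y: measured
    residual = y - LinearRegression().fit(x, y).predict(x)
    dev = np.abs(residual - np.median(residual))
    mad = 1.4826 * np.median(dev)                # robust estimate of residual scale
    return dev > k * mad


def dbscan_identifier(points, eps=0.01, min_samples=5):
    """Flag points that DBSCAN labels as noise (label -1)."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points) == -1


def gmm_identifier(points, n_components=1, k=3.0):
    """Flag points whose log-density under a fitted GMM is unusually low."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(points)
    scores = gmm.score_samples(points)
    return scores < scores.mean() - k * scores.std()


# Synthetic example: measured V_i tracks reconstructed V_j, one injected outlier.
rng = np.random.default_rng(0)
v_j = 1.0 + 0.01 * rng.standard_normal(200)
v_i = v_j + 0.001 * rng.standard_normal(200)
v_i[50] += 0.2
points = np.column_stack([v_i, v_j])

flags = [f(points) for f in (regression_identifier, dbscan_identifier, gmm_identifier)]
```

Running the identifiers side by side keeps their outputs available for the voting step; feeding each stage the previous stage's output is an equally plausible reading of "in turn".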

As a further solution of the present invention, performing ensemble learning identification and result correction on the preliminary identification results for each point in the PMU measurement sequence to obtain the final identification results includes:

Performing ensemble learning identification on the preliminary identification results for each point in the PMU measurement sequence to obtain voting results;

Correcting the voting results to obtain the final identification results for each point in the PMU measurement sequence.

As a further solution of the present invention, the ensemble learning identification is an ensemble method that combines the base identifiers by majority voting.
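Majority voting over the base identifiers can be sketched as follows. This is a minimal sketch assuming each base identifier outputs one boolean "bad" flag per point; the function name `majority_vote` is hypothetical, and the subsequent result-correction step is not modeled here because its rule is not specified in this passage.

```python
import numpy as np


def majority_vote(*flags):
    """Combine per-point 'bad' flags from several base identifiers.

    A point is labeled bad when strictly more than half of the
    identifiers flag it.
    """
    stacked = np.vstack([np.asarray(f, dtype=bool) for f in flags])
    return stacked.sum(axis=0) * 2 > stacked.shape[0]


# Three identifiers voting on three points.
votes = majority_vote([True, False, True],
                      [True, False, False],
                      [False, False, True])
print(votes.tolist())  # [True, False, True]
```

With three identifiers, "strictly more than half" means at least two agree; an odd number of voters avoids ties.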

In a second aspect, another embodiment of the present invention provides a system for identifying bad PMU data in a smart distribution network, the system comprising a measurement reconstruction sequence acquisition module, a sequence combination construction module, a preliminary identification module and a final identification module;

The measurement reconstruction sequence acquisition module is used to acquire a PMU measurement sequence and obtain the corresponding measurement reconstruction sequence based on the PMU measurement sequence;

The sequence combination construction module is used to form a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and to generate the corresponding two-dimensional image data;

The preliminary identification module is used to process the two-dimensional image data by hybrid clustering to obtain preliminary identification results for each point in the PMU measurement sequence;

The final identification module is used to perform ensemble learning identification and result correction on the preliminary identification results for each point in the PMU measurement sequence to obtain the final identification results for each point in the PMU measurement sequence.
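The four modules form a linear pipeline, which can be sketched as a small class with each stage injected as a callable. This is a hypothetical wiring for illustration only: the class name, field names and stand-in stages are assumptions, and real stages would be the trained GAN, the coordinate construction, the base identifiers and the voting/correction step.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np


@dataclass
class BadDataIdentificationSystem:
    """Hypothetical wiring of modules 100-400; each stage is a plug-in callable."""
    reconstruct: Callable[[np.ndarray], np.ndarray]             # module 100
    combine: Callable[[np.ndarray, np.ndarray], np.ndarray]     # module 200
    preliminary: Callable[[np.ndarray], Sequence[np.ndarray]]   # module 300
    finalize: Callable[[Sequence[np.ndarray]], np.ndarray]      # module 400

    def identify(self, measured: np.ndarray) -> np.ndarray:
        reconstructed = self.reconstruct(measured)
        points = self.combine(measured, reconstructed)
        preliminary_flags = self.preliminary(points)
        return self.finalize(preliminary_flags)


# Usage with trivial stand-in stages (all hypothetical):
demo = BadDataIdentificationSystem(
    reconstruct=lambda v: np.full_like(v, 1.0),              # flat "reconstruction"
    combine=lambda v_i, v_j: np.column_stack([v_i, v_j]),
    preliminary=lambda pts: [np.abs(pts[:, 0] - pts[:, 1]) > 0.05] * 3,
    finalize=lambda flags: np.sum(flags, axis=0) >= 2,       # majority of three
)
result = demo.identify(np.array([1.0, 1.1, 1.0]))
```

Injecting the stages keeps each module independently testable, mirroring the module split in the system claim.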

In a third aspect, another embodiment of the present invention provides a device including a memory and a processor, wherein the memory stores a computer program, and the processor, when loading and executing the computer program, implements the steps of the method for identifying bad PMU data in a smart distribution network.

In a fourth aspect, a further embodiment of the present invention provides a storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the method for identifying bad PMU data in a smart distribution network.

The technical solution provided by the present invention has the following beneficial effects:

In the method, system, device and storage medium for identifying bad PMU data in a smart distribution network provided by the present invention, a PMU measurement sequence is acquired and a corresponding measurement reconstruction sequence is obtained from it; the PMU measurement sequence and its reconstruction sequence form a PMU measurement sequence combination, from which two-dimensional image data is generated; the two-dimensional image data is processed by hybrid clustering to obtain preliminary identification results for each point of the PMU measurement sequence; and ensemble learning identification and result correction are applied to these preliminary results to obtain the final identification results for each point. The present invention uses the PMU measurement sequence and its reconstruction sequence as coordinates to draw a two-dimensional scatter plot. This image-based method improves the efficiency of spatiotemporal correlation analysis between the PMU measurement time series and its reconstruction sequence, and explicitly distinguishes how normal data and bad data differ from their reconstructed data. Hybrid clustering separates normal and bad data, and ensemble learning further corrects the identification results.
The chart-based method captures the characteristics of normal and bad data and exploits the potential of dimensional clustering and classifiers. As a result, both the efficiency of spatiotemporal correlation analysis and the identification sensitivity are improved.

These and other aspects of the present invention will become clearer in the following description of the embodiments. It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other embodiments from these drawings without creative effort.

FIG. 1 is a flow chart of a method for identifying bad PMU data in a smart distribution network according to an embodiment of the present invention.

FIG. 2 is a detailed flow chart of a method for identifying bad PMU data in a smart distribution network according to an embodiment of the present invention.

FIG. 3 shows different types of PMU anomalies.

FIG. 4 shows the network structure of the GAN model.

FIG. 5 shows a PMU measurement sequence and its measurement reconstruction sequence.

FIG. 6 is a scatter plot of a PMU measurement sequence against its measurement reconstruction sequence.

FIG. 7 is a diagram of the safe zone modification.

FIG. 8 shows PMU measurement curves of normal data under typical conditions (correlation coefficient: 0.9669).

FIG. 9 shows PMU measurement curves containing step event data and outliers under typical conditions (correlation coefficient: 0.9665).

FIG. 10 shows PMU measurement curves containing spike event data and outliers under typical conditions (correlation coefficient: 0.9435).

FIG. 11 shows PMU measurement curves containing outliers under typical conditions (correlation coefficient: 0.6527).

FIG. 12 shows the bad data identification process, results and performance for normal data under typical conditions.

FIG. 13 shows the bad data identification process, results and performance for data containing step event data and outliers under typical conditions.

FIG. 14 shows the bad data identification process, results and performance for data containing spike event data and outliers under typical conditions.

FIG. 15 shows the bad data identification process, results and performance for data containing outliers under typical conditions.

FIG. 16 is a structural diagram of a system for identifying bad PMU data in a smart distribution network according to an embodiment of the present invention.

In the figures: measurement reconstruction sequence acquisition module 100, sequence combination construction module 200, preliminary identification module 300, final identification module 400.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

The flowcharts shown in the accompanying drawings are only examples; they need not include every content item and operation/step, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed, combined or partially merged, so the actual execution order may change according to actual conditions.

It should be understood that the terms used in this specification are only for describing specific embodiments and are not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

Here, a PMU sequentially acquires measurements from a single bus at a sampling interval of Δt. Within an observation time window of length T, a PMU measurement time series of length n (= T/Δt) is captured as the basic analysis unit for bad PMU data identification. The sampled time series consists of several electrical quantities, such as voltage and current magnitudes and angles, and active and reactive power injections. The subsequent bad data identification analysis is performed on the PMU voltage magnitude time series, expressed as

$$x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}] \tag{1}$$

where i denotes the number of the bus at which the PMU is deployed.
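The window length follows directly from n = T/Δt; with the values used in the example of FIG. 3 (Δt = 20 ms, T = 5 s) each analysis unit holds 250 samples. A minimal sketch, assuming the stream is cut into non-overlapping windows (the function `split_windows` and the non-overlap choice are hypothetical):

```python
import numpy as np


def split_windows(samples, dt, T):
    """Cut a PMU stream into non-overlapping windows of n = T/dt samples.

    Trailing samples that do not fill a complete window are dropped.
    """
    n = int(round(T / dt))           # round() guards against float error in T/dt
    usable = len(samples) // n * n
    return np.asarray(samples[:usable], dtype=float).reshape(-1, n)


# 20 s of data at 50 samples/s -> four 5-second windows x_i of length 250.
windows = split_windows(np.ones(1000), dt=0.02, T=5.0)
print(windows.shape)  # (4, 250)
```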

Besides normal data, PMU measurements also contain abnormal data, namely missing data, outliers and event data. Outliers and missing data are bad data caused by poor measurement quality. The purpose of bad data identification is to classify outliers and missing data as bad data, while event data, which is accurately measured, is classified as normal data.

Missing data is usually caused by failures during data measurement, transmission or storage, which leave zero, empty or NaN (not a number) values at the corresponding positions in the historical data set. Because the data drops out, missing data is easy to spot in a plot of the PMU time series and can be removed by exact label matching. This section therefore pays little attention to the identification of missing data.
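Exact label matching for missing data can be sketched as a simple mask over the three markers named above: zero, empty and NaN. This is a minimal sketch; the function name `missing_mask` is hypothetical, and treating `None` and the empty string as "empty" is an assumption about how empty fields are stored.

```python
import math


def missing_mask(values):
    """Mark samples stored as zero, empty (None or '') or NaN as missing."""
    def is_missing(v):
        if v is None or v == "":
            return True
        v = float(v)
        return v == 0.0 or math.isnan(v)

    return [is_missing(v) for v in values]


print(missing_mask([1.01, 0, None, float("nan"), "", 0.99]))
# [False, True, True, True, True, False]
```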

FIG. 3 shows an illustrative single-bus PMU measurement time series with a 5-second window length, which contains different types of abnormal PMU data, including missing data, outliers and event data. The PMU samples 50 data points per second, at an interval of 20 milliseconds. The sharp data drop at 4 seconds is clearly identified as missing data by its marker "0". Based on its distinctive graphical features, the step change of the PMU measurement at approximately 4.60 seconds can be identified as step event data. However, the data spikes at approximately 0.76, 1.44, 2.26 and 3.24 seconds have similar shapes and cannot be classified as either outliers or spike event data. Because the system events are unknown, additional information is needed as a criterion for distinguishing such spike-like data.
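Why the step at 4.60 s is easy to label while the spikes are not can be sketched by comparing the mean level before and after an abrupt change. This is a minimal sketch of a hypothetical rule (function name, `width` and `level_tol` are assumptions, not the patent's criterion): a lasting level shift is labeled "step", whereas a change that returns to the previous level is only labeled "spike", which still leaves open whether it is an outlier or a spike event.

```python
import numpy as np


def classify_jump(x, k, width=5, level_tol=0.01):
    """Hypothetical rule: compare the mean level before and after sample k."""
    x = np.asarray(x, dtype=float)
    before = x[max(0, k - width):k].mean()
    after = x[k + 1:k + 1 + width].mean()
    # A lasting level shift indicates a step event; otherwise the change is
    # spike-shaped and remains ambiguous (outlier or spike event).
    return "step" if abs(after - before) > level_tol else "spike"


step = np.r_[np.ones(10), np.full(10, 1.05)]   # level shifts at index 10
spike = np.ones(20)
spike[10] = 1.08                               # returns to baseline afterwards

print(classify_jump(step, 10))    # step
print(classify_jump(spike, 10))   # spike
```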

The embodiments of the present invention are further described below with reference to the accompanying drawings.

Please refer to FIG. 1 and FIG. 2. FIG. 1 is a flow chart of a method for identifying bad PMU data in a smart distribution network provided by an embodiment of the present invention. As shown in FIG. 1, the method includes steps S10 to S40.

S10. Acquire a PMU measurement sequence and obtain the corresponding measurement reconstruction sequence based on it. The PMU measurement sequence is denoted V_i and the measurement reconstruction sequence V_j. The PMU measurement sequence is a time series of PMU measurements of a node voltage.

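In an embodiment of the present invention, before the PMU measurement sequence is acquired and the corresponding measurement reconstruction sequence is obtained from it, the method further includes: obtaining a sample data set and training a GAN model with the sample data set to obtain a trained GAN model, wherein the sample data set includes sample PMU measurement sequence combinations and the corresponding sample two-dimensional image data.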

In an embodiment of the present invention, the sample PMU measurement sequence combination includes a PMU measurement sequence and a sample measurement reconstruction sequence.

In an embodiment of the present invention, the sample PMU measurement sequence set consists of the voltage measurements, taken at the same moment, of all nodes equipped with PMUs in the smart grid system.

In an embodiment of the present invention, the sample measurement reconstruction sequence set consists of measurement data from historical moments.

Acquiring the PMU measurement sequence and obtaining the corresponding measurement reconstruction sequence based on it includes: acquiring the PMU measurement sequence and generating the corresponding measurement reconstruction sequence from it using the trained GAN model.

The GAN model consists of a generative model (G) and a discriminative model (D); its network structure is shown in FIG. 4.

The generative model (G) attempts to generate samples with the same probability distribution as the real data samples. The discriminative model (D) acts as a binary classifier that judges whether an input sample is real. During training, both the generating ability of G and the discriminating ability of D improve. Eventually, D can no longer distinguish generated samples from real samples, and the game between D and G reaches a dynamic Nash equilibrium. GAN training solves the following two-player minimax game:

where x denotes a real data sample; z denotes the random noise input to the generative model; G(z) denotes the sample generated from the random noise z; D(x) and D(G(z)) denote the probabilities that the real sample and the generated sample, respectively, are judged to be real; p_d(x) and p_z(z) are the probability distributions of the real samples and the random noise, respectively; and V(D,G) is the value function, which measures the difference between the probability distributions of the generated and real samples. The parameters of D and G, denoted θ_D and θ_G, are updated by gradient descent, with the corresponding stochastic gradients g_θD and g_θG obtained from (1-3) and (1-4), respectively.
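For reference, the standard GAN objective and its minibatch stochastic gradients, written with the symbols defined above, take the following form. The minibatch size m and sample index k are introduced here for illustration, and the exact expressions used in this method may differ; in the standard scheme D ascends its gradient while G descends its own.

```latex
\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_d(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

g_{\theta_D} = \nabla_{\theta_D} \frac{1}{m} \sum_{k=1}^{m}
  \left[\log D\left(x^{(k)}\right) + \log\left(1 - D\left(G\left(z^{(k)}\right)\right)\right)\right]

g_{\theta_G} = \nabla_{\theta_G} \frac{1}{m} \sum_{k=1}^{m}
  \log\left(1 - D\left(G\left(z^{(k)}\right)\right)\right)
```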

The dynamic voltage profiles of a single node's voltage PMU measurement sequence and its measurement reconstruction sequence can be determined, which increases the potential for mining the spatiotemporal correlation of the local bus to identify bad data. As shown in FIG. 5, after the data points are filled by linear interpolation, the voltage magnitude curve is represented as the PMU measurement sequence V_i. The voltage magnitude curve of the measurement reconstruction sequence over the same period is represented as V_j, whose correlation coefficient with V_i is 0.9408. Based on a difference analysis, the spikes in V_i at 1.44 s, 2.26 s and 3.24 s are likely outliers, because V_i shows a sudden change in voltage magnitude while V_j follows a smooth trend. The same holds for the spike in V_j at 3 s. However, the spikes in V_i and V_j at 0.76 s are preferably identified as event data, because both profiles undergo a similar sudden change. In practice, combined analysis of graphs and data trends is not standardized, which leads to errors in manual judgment. The qualitative analysis of bad data identification therefore needs further study.
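The two preprocessing quantities mentioned above, linear-interpolation filling and the correlation coefficient between V_i and V_j, can be sketched with NumPy. This is a minimal sketch: it assumes gaps are marked as NaN and that the coefficient is Pearson's (the text does not name the estimator); the function names are hypothetical.

```python
import numpy as np


def fill_linear(x):
    """Fill NaN gaps by linear interpolation over the sample index."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    gaps = np.isnan(x)
    x[gaps] = np.interp(idx[gaps], idx[~gaps], x[~gaps])
    return x


def correlation(v_i, v_j):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(v_i, v_j)[0, 1])


v_i = fill_linear([1.00, np.nan, 1.02, 1.03])   # gap filled with about 1.01
rho = correlation(v_i, [1.00, 1.012, 1.018, 1.03])
```

`np.interp` holds the end values constant if a gap touches either end of the window, which is a reasonable default for short windows.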

S20. Form a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generate the corresponding two-dimensional image data.

In an embodiment of the present invention, forming the PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating the corresponding two-dimensional image data, includes:

Taking the PMU measurement sequence of a single node and its measurement reconstruction sequence as coordinates, which serve as the input to the bad data identification procedure, and generating two-dimensional image data from them.

The PMU measurement sequence is X_i, and the measurement reconstruction sequence is X_j.

The PMU measurement sequence combination is X_ij, where X_ij is calculated by the following formula:

Two-dimensional coordinates are generated from the PMU measurement sequence combination X_ij by the following formula:

Here, each point in the coordinate set corresponds to a pair of PMU measurements from buses i and j at the same time instant. A scatter plot of these coordinates is also drawn to show their spatiotemporal distribution and correlation.
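Since the explicit formulas for X_ij and its coordinates are not reproduced in this text, the following is only a minimal sketch under an assumption: the combination pairs the k-th sample of X_i with the k-th sample of X_j, giving one 2-D point (x_i,k, x_j,k) per time instant. The function name `to_coordinates` is hypothetical. The resulting array can be passed to any plotting library, e.g. as `pts[:, 0]` and `pts[:, 1]` for a scatter plot.

```python
import numpy as np


def to_coordinates(x_i, x_j):
    """Pair the k-th samples of two sequences into 2-D points (x_i,k, x_j,k).

    Assumed construction: both sequences cover the same time window, so
    row k of the result holds the two values measured at time instant k.
    """
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    if x_i.shape != x_j.shape:
        raise ValueError("sequences must cover the same time window")
    return np.column_stack([x_i, x_j])


pts = to_coordinates([1.00, 1.01, 1.08], [1.00, 1.01, 1.00])
# pts[:, 0] is the measured axis, pts[:, 1] the reconstructed axis.
```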

FIG. 6 is a scatter plot of an illustrative example. The middle data cluster enclosed by the dashed line is normal data, the data clusters on either side are event data, and the scattered points outside the dashed frame are outliers.

As shown in Figure 6, the two selected PMU measurement sequences are strongly correlated, so most data points are densely distributed along the diagonal. The "safe zone" used to exclude bad data is designed as a tilted band around the diagonal, so that it tolerates the systematic errors of normal measurements and changes in the system operating state. Normal data and event data are expected to lie inside the safe zone, because paired PMU measurements during normal and event periods are strongly correlated. Outliers deviate severely from the true values. Those with a larger V_i and a smaller V_j are therefore expected to lie sparsely in the lower triangular region, and those with a smaller V_i and a larger V_j sparsely in the upper triangular region.

The high resolution of PMUs causes measurements of the same type to form clustered distributions, which the proposed method fully exploits. Note that if both PMU measurements contain outliers at the same time, the corresponding data point may fall inside the safe zone. However, bad data on a single PMU is a low-probability event, so the probability of this situation is extremely small, and it is neglected in this section.
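A rough worked estimate shows why the simultaneous-outlier case can be neglected. It assumes that bad data occur independently in the two series, with per-sample rates p_i and p_j. This assumption and the 1% rate are illustrative and are not taken from the source. Under these assumptions, the chance that a given pair is corrupted in both series is the product of the two rates:

```latex
P(\text{outlier in both series at } t) = p_i\, p_j, \qquad
p_i = p_j = 0.01 \;\Rightarrow\; p_i\, p_j = 10^{-4}, \qquad
3000 \times 10^{-4} = 0.3
```

A 3000-point window would therefore contain fewer than one such pair on average.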

Clustering data of the same type and delimiting the safe zone are impractical to do manually. This method therefore uses clusterers and classifiers, namely the base bad PMU data identifiers. The idea of identifying bad data through two-dimensional spatiotemporal correlation analysis of PMU scatter plots makes use of existing clusterers and classifiers, which were originally designed for two-dimensional images. Unlike conventional analysis, the method is effective for spatiotemporal analysis.

S30. Process the two-dimensional image data by hybrid clustering to obtain preliminary identification results for each point of the PMU measurement sequence.

In an embodiment of the present invention, processing the two-dimensional image data by hybrid clustering to obtain the preliminary identification results for each point of the PMU measurement sequence includes:

The two-dimensional image data is processed in turn by a linear regression identifier, a DBSCAN identifier and a GMM identifier to obtain the preliminary identification results for each point of the PMU measurement sequence.

The present invention uses the proposed graph-based method to characterize normal and bad data and to exploit the potential of dimensional clustering and classifiers. This improves both the efficiency of the spatiotemporal correlation analysis and the identification sensitivity.

The linear regression (LR) identifier minimizes the residual sum of squares between the observed targets y in the dataset and the targets predicted by a linear approximation. In a two-dimensional model, the regression line is y = ax + b.

Where y and x denote the approximated target vector and the input vector, and a and b denote the slope and intercept of the regression line. To apply the model to the classification problem, x_i and x_j are used as the target and the input of the linear model, respectively, which gives x_i = a·x_j + b.

To measure the error between the approximated target and the observed target, the standard deviation σ is calculated as follows:

The 3σ rule is applied to tolerate measurement errors and to avoid misjudging normal data. The safe zone of Figure 6 can then be modeled as:
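The σ and safe-zone formulas are not reproduced in the text above. A plausible reconstruction is given below. It assumes that σ is the root-mean-square residual of the fitted line x_i = a·x_j + b over the n samples of a time window. Whether the patent divides by n or by n − 1 is not known, so this form is an assumption.

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(x_i(t) - a\,x_j(t) - b\bigr)^{2}}

a\,x_j(t) + b - 3\sigma \;\le\; x_i(t) \;\le\; a\,x_j(t) + b + 3\sigma
```

Points that satisfy the double inequality lie inside the tilted band. Points outside it are flagged as outliers.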

The linear regression identifier calculates the regression line and σ within each preset time window of PMU measurements. Its purpose is to classify the diagonally distributed normal data, which include strongly correlated event data. Data points that violate the safe-zone constraints are then identified as outliers. Because the identifier is regression-based, the uncertain size of abnormal deviations shifts the optimal linear division. The safe zone delimited by the 3σ rule is therefore likely to include some bad data with relatively small deviations. It may also exclude a very small amount of normal data that deviates from the regression line under extreme operating conditions. Such misjudgments reduce the accuracy of the linear regression identifier.
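The LR identifier for one time window can be sketched with `np.polyfit`, which returns the least-squares slope and intercept for `deg=1`. The function name `lr_identifier` is hypothetical. The use of the population standard deviation for σ is an assumption, as in the reconstruction above.

```python
import numpy as np


def lr_identifier(x_i, x_j, k=3.0):
    """Bad-data mask (True = bad) from the k-sigma band around x_i = a*x_j + b.

    k=3 reproduces the 3-sigma rule; sigma is the population standard deviation
    of the residuals (an assumption about the exact formula).
    """
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    a, b = np.polyfit(x_j, x_i, deg=1)  # least-squares slope and intercept
    residual = x_i - (a * x_j + b)      # vertical distance from the regression line
    sigma = residual.std()
    return np.abs(residual) > k * sigma
```

A large outlier also inflates σ and tilts the fitted line. This is the sensitivity the text attributes to regression-based identification.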

In the present invention, the DBSCAN identifier regards clusters as high-density regions separated by low-density regions, so the clusters found by DBSCAN can have any shape. The DBSCAN identifier finds high-density core samples and expands clusters from them. A cluster therefore consists of a set of core samples and a set of non-core samples that are close to the core samples.

The DBSCAN algorithm has two parameters that define data density: min_samples and eps. Formally, a core sample is a sample that has at least min_samples samples of the dataset (its neighbors) within a distance of eps. A cluster is a set of core samples that can be built recursively. Take a core sample, find all of its neighbors that are core samples, then find all of their neighbors that are core samples, and so on. A cluster also contains a set of non-core samples, which are neighbors of core samples but lie on the edge of the cluster. Any non-core sample that is more than eps away from every core sample is regarded by the algorithm as an outlier. The parameter min_samples mainly controls the algorithm's tolerance to noise, while the parameter eps must be chosen appropriately for the dataset and the distance function.

The DBSCAN identifier is applied to the two-dimensional scatter data of the PMU measurement sequence combination X_ij, and clustering identifies densely distributed data points as normal data. Clusters containing few data points, as well as isolated distant data points, are then picked out and classified as outliers.
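A from-scratch sketch of the DBSCAN identifier, using only NumPy, is shown below. In practice, `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` performs the same clustering and marks noise with the label -1. The function name `dbscan_identifier` is hypothetical. So is the `min_cluster_size` threshold that models "clusters with small data volume", because the source gives no value.

```python
import numpy as np


def dbscan_identifier(points, eps, min_samples, min_cluster_size=1):
    """Return a bad-data mask (True = bad) for 2-D points using DBSCAN.

    Noise points, and members of clusters smaller than `min_cluster_size`
    (a hypothetical threshold), are treated as bad data.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances: O(n^2) time and memory, acceptable for a sketch
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    # eps-neighbourhoods; as in scikit-learn, a point counts toward its own min_samples
    neighbours = [np.flatnonzero(row <= eps) for row in dist]
    core = np.array([len(nb) >= min_samples for nb in neighbours])
    labels = np.full(n, -1)  # -1 = not (yet) assigned to any cluster
    cluster = 0
    for p in range(n):
        if not core[p] or labels[p] != -1:
            continue
        # Grow a new cluster from core sample p; only core samples expand it further
        labels[p] = cluster
        stack = [p]
        while stack:
            q = stack.pop()
            for r in neighbours[q]:
                if labels[r] == -1:
                    labels[r] = cluster
                    if core[r]:
                        stack.append(r)
        cluster += 1
    sizes = np.bincount(labels[labels >= 0], minlength=cluster)
    small = np.isin(labels, np.flatnonzero(sizes < min_cluster_size))
    return (labels == -1) | small
```

Border (non-core) points join the first cluster that reaches them, but they do not expand it. This is the distinction the text draws between core and non-core samples.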

The DBSCAN identifier is density-based, so uncertainty in the system operating state changes the density of the data distribution. Both DBSCAN parameters are set manually, which may cause misjudgments in bad PMU data identification. As a result, the clusters formed by DBSCAN may exclude a small amount of sparsely distributed normal data. An example is event data measured during transients in which the system changes abruptly.

The Gaussian mixture model (GMM) algorithm is a variant of the probabilistic k-Means method. The GMM-based identifier assumes that the data follow a Gaussian mixture distribution, that is, the data can be regarded as generated from k Gaussian distributions. Each Gaussian distribution is called a component, and the linear superposition of the components forms the probability density function p(z) of the GMM: p(z) = Σ_{l=1..k} ω_l g(z|μ_l, σ_l).

Where z is a data sample of the dataset D, and ω_l is the weight of the l-th component, i.e., the probability that z is generated by the l-th component. g(z|μ_l, σ_l) is the probability density function of the l-th component, and μ_l and σ_l are the center and covariance matrix of the l-th Gaussian distribution.

The k Gaussian distributions correspond to k clusters, so GMM-based clustering is essentially the estimation of the parameters ω_l, μ_l and σ_l. The likelihood function is used as the evaluation function E in the GMM parameter estimation:

Where n_s is the sample size of the dataset, and F is the logarithmic form of E, used for computational convenience. When E reaches its maximum, the maximum likelihood, the resulting set of GMM parameters defines the probability distribution most likely to have generated the data samples in D. To maximize the evaluation function E, ω_l, μ_l and σ_l are first initialized. An iterative method similar to k-Means is then applied as follows:

1) Estimate the probability that z_m belongs to cluster C_l, i.e., that z_m is generated by the l-th Gaussian distribution:

2) Based on the result of 1), the l-th Gaussian distribution generates the weighted samples p(z₁∈C_l)z₁, p(z₂∈C_l)z₂, …, p(z_n∈C_l)z_n. From these, re-estimate the parameters of the l-th Gaussian distribution:

Convergence is considered reached when the computed parameters remain unchanged across iterations.
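The formulas for E, F and the two iteration steps are missing above. Below are the standard expectation-maximization (EM) updates for a Gaussian mixture, written in the notation of this section. Treat them as an assumed reconstruction rather than the patent's exact equations.

```latex
E = \prod_{m=1}^{n_s} p(z_m), \qquad
F = \ln E = \sum_{m=1}^{n_s} \ln \sum_{l=1}^{k} \omega_l\, g(z_m \mid \mu_l, \sigma_l)

% 1) E-step: probability that z_m belongs to cluster C_l
p(z_m \in C_l) = \frac{\omega_l\, g(z_m \mid \mu_l, \sigma_l)}
                      {\sum_{q=1}^{k} \omega_q\, g(z_m \mid \mu_q, \sigma_q)}

% 2) M-step: re-estimated parameters of the l-th Gaussian
\omega_l = \frac{1}{n_s}\sum_{m=1}^{n_s} p(z_m \in C_l), \qquad
\mu_l = \frac{\sum_{m} p(z_m \in C_l)\, z_m}{\sum_{m} p(z_m \in C_l)}, \qquad
\sigma_l = \frac{\sum_{m} p(z_m \in C_l)\,(z_m - \mu_l)(z_m - \mu_l)^{\mathsf{T}}}{\sum_{m} p(z_m \in C_l)}
```

Each step cannot decrease F, which is why the alternation converges.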

In an embodiment of the present invention, the GMM identifier aims to identify the main cluster of normal data with a relatively loose error tolerance. To apply GMM to bad data identification, the total number of clusters is set to 2, so that normal data and bad data fall into two separate clusters. The bad data rate in PMU measurement time series is assumed to be relatively low. If the two clusters differ significantly in size, the data points of the larger cluster are classified as normal data and those of the smaller cluster as bad data. If the smaller cluster still accounts for a substantial share of the total data, the data points in both clusters are classified as normal data. In that case, the GMM identifier finds no bad data class.
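The sketch below combines a two-component GMM fitted by EM with the cluster-size rule described above, using NumPy only. Several choices are assumptions rather than details from the source:
- the functions `log_gauss`, `gmm_em` and `gmm_identifier` are hypothetical;
- the deterministic initialization (median and farthest point) is an arbitrary choice;
- the `minor_share` threshold stands in for "a significant gap", since the source gives no value;
- the stopping rule (log-likelihood F stops improving) replaces the text's "parameters remain unchanged".

`sklearn.mixture.GaussianMixture(n_components=2)` offers a library alternative.

```python
import numpy as np


def log_gauss(z, mu, cov):
    """Log-density of the normal distribution N(mu, cov) at each row of z."""
    d = z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + z.shape[1] * np.log(2.0 * np.pi))


def gmm_em(z, n_iter=200, tol=1e-9, reg=1e-12):
    """Fit a 2-component GMM by EM and return hard cluster labels (0 or 1)."""
    z = np.asarray(z, dtype=float)
    n, dim = z.shape
    # Deterministic initialization (an assumption): one mean at the coordinate-wise
    # median, the other at the point farthest from it; both start with the data covariance.
    med = np.median(z, axis=0)
    far = z[np.argmax(np.linalg.norm(z - med, axis=1))]
    mu = np.array([med, far])
    cov = np.array([np.cov(z.T) + reg * np.eye(dim)] * 2)
    w = np.full(2, 0.5)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities p(z_m in C_l), computed in log space for stability
        log_p = np.column_stack([np.log(w[l]) + log_gauss(z, mu[l], cov[l]) for l in range(2)])
        top = log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p - top)
        norm = resp.sum(axis=1, keepdims=True)
        resp /= norm
        loglik = float(np.sum(top + np.log(norm)))  # F = ln E for the current parameters
        # M-step: re-estimate weights, means and covariance matrices
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ z) / nk[:, None]
        for l in range(2):
            d = z - mu[l]
            cov[l] = (resp[:, [l]] * d).T @ d / nk[l] + reg * np.eye(dim)
        if loglik - prev < tol:  # log-likelihood no longer improves
            break
        prev = loglik
    return resp.argmax(axis=1)


def gmm_identifier(z, minor_share=0.2):
    """Bad-data mask (True = bad) from a 2-cluster GMM.

    The smaller cluster is bad data only if it holds less than `minor_share`
    of the window (hypothetical threshold); otherwise both clusters are normal.
    """
    labels = gmm_em(z)
    counts = np.bincount(labels, minlength=2)
    minor = int(np.argmin(counts))
    if counts[minor] / len(labels) >= minor_share:
        return np.zeros(len(labels), dtype=bool)
    return labels == minor
```

For example, a window with 190 diagonal points and a tight group of 10 off-diagonal points flags only the group of 10 as bad data.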

In an embodiment of the present invention, the GMM identifier is model-based, and its total number of clusters is a hyperparameter that must be preset. Because of its relatively loose tolerance, the GMM identifier may uncover outliers that other identifiers find hard to detect. It may also admit some outliers into the normal data cluster, which causes misjudgments.

S40. Perform ensemble learning identification and result correction on the preliminary identification results for each point of the PMU measurement sequence to obtain the final identification results for each point of the PMU measurement sequence.

In an embodiment of the present invention, performing ensemble learning identification and result correction on the preliminary identification results for each point of the PMU measurement sequence, to obtain the final identification results, includes:

In an embodiment of the present invention, the ensemble learning identification uses majority voting as the ensemble method over the base identifiers.

The present invention uses majority voting as the ensemble method over the base identifiers and verifies normal data through unanimous votes. A statistical analysis of the voting conditions is performed to confirm which normal data can be verified. The disputed results of the majority vote are then corrected using a revised boundary between normal and bad data, which is determined from the previously verified normal data. Because it relies on graphical analysis, the proposed method is unsupervised and therefore suitable for online bad PMU data identification. Simulation results show that the proposed method outperforms single identifiers and obtains accurate results under various conditions with short computation times.

The present invention takes the PMU measurement sequence and its measurement reconstruction sequence as coordinates and draws a two-dimensional scatter plot. This image-based method makes the spatiotemporal correlation analysis of the two PMU measurement sequences more efficient and clearly separates the characteristics of normal and bad data. Several clusterers/classifiers based on spatial models then perform hybrid clustering on the scatter plot to separate normal and bad data. Ensemble learning further corrects the identification results. The present invention can effectively solve the problem of identifying bad data in PMU measurements of power grids and improves identification accuracy.

Specifically, each identifier still has limitations in certain cases, so no single identifier can detect every type of bad PMU data. It is therefore important to combine several identifiers in a comprehensive judgment of bad data. The hybrid bad data identification method is expected to classify all PMU measurements and improve identification accuracy. The method is fully unsupervised and can be applied to online bad PMU data identification for two measurement time series simultaneously.

In an embodiment of the present invention, the voting classifier handles the binary classification problem by combining conceptually different classifiers and predicting class labels by majority vote (MV). Such a classifier is useful for a set of well-performing models, because it balances their individual weaknesses. In majority voting, the predicted class label of a given sample is the label predicted by the majority of the individual classifiers.

For bad PMU data identification, the class of each data point is determined by a majority vote over the results of the base bad PMU data identifiers. If all identifiers classify a data point as normal (or bad) data, this unanimous result is taken as the final output label. If the base identifiers give different results for a data point, the point probably falls under a condition in which some identifiers are limited. In that case, further analysis with DBSCAN is required, because simply adopting the majority vote may introduce misjudgments. Note that if the base identifiers agree on all PMU data points within the sampling time window, no further steps are taken, which reduces the computational burden and time.
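The first-stage vote can be sketched as follows. The function name `majority_vote` is hypothetical, and each identifier is assumed to output a boolean mask with True meaning bad data. The second output marks the disputed points that go to the second stage. If no point is disputed, the second stage can be skipped for the window, as described above.

```python
import numpy as np


def majority_vote(masks):
    """First stage: combine boolean bad-data masks (True = bad) from the base identifiers.

    Returns (final_bad, disputed): the strict-majority label of every point and
    a flag for points on which the identifiers did not agree unanimously.
    """
    votes = np.vstack([np.asarray(m, dtype=bool) for m in masks]).astype(int)
    n_identifiers = votes.shape[0]
    bad_votes = votes.sum(axis=0)  # number of identifiers calling each point bad
    final_bad = 2 * bad_votes > n_identifiers
    disputed = (bad_votes > 0) & (bad_votes < n_identifiers)
    return final_bad, disputed
```

With three identifiers (LR, DBSCAN, GMM), a strict majority always exists, so no tie-breaking is needed.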

In the embodiment of the present invention, a two-stage structure is constructed to verify bad PMU data identification results. Majority voting forms the first stage, and the second stage corrects the results of data points that are disputed in the majority vote. Table 1 lists statistics of all majority voting conditions (sample size 12,000), based on graphical analysis and laboratory-verified data. Columns 4 and 5 give the number of occurrences of each condition, and column 6 gives the likely result of each condition based on these statistics. As conditions 6 and 7 show, the final result may differ from the majority voting result and requires further correction.

Table 1 Voting conditions

Note: N: normal data; B: bad data; S: rarely observed

According to the statistics in Table 1, reducing majority voting misjudgments depends mainly on correctly classifying conditions 5, 6 and 7. Moreover, data points satisfying conditions 1, 2, 3, 4, 5 and 8 can very likely be assumed to be correctly classified. Using the information from this large amount of normal data, the safe zone determined by linear regression is then modified and refined into an improved "safe zone".

First, the normal data identified under conditions 1, 2, 3 and 4 in the previous step are selected. DBSCAN is then run with relaxed parameters to find several clusters that completely contain the previously identified normal data. These clusters serve as the basis for modifying the edges of the safe zone and describe the layout of the normal data in more detail. As shown in Figure 7, the previously identified normal data are grouped into two clusters.

Second, the safe zone around each cluster is updated according to the data distribution. For example, an upper line (denoted here l̄₁) and a lower line l₁ determine the upper and lower limits of the safe zone around cluster 1. Both lines have the slope a₁ obtained by linear regression. The analytical expressions of l̄₁ and l₁ are determined by:

Where tol is an error tolerance coefficient with a relatively small positive value, and C'₁ is the set of original indices of the data points of cluster 1 in x_i or x_j. The same applies to l̄₂ and l₂ around cluster 2. The extents of l₁, l̄₁ and l₂, l̄₂ are also determined by the normal data distribution of each cluster. They are delimited by the perpendicular lines l'₁ and l'₂, which have slope -1/a₁:

Third, l'₁ intersects l₁ and l̄₁, and l'₂ intersects l₂ and l̄₂. The resulting four intersection points serve as turning points of the upper and lower limits of the modified safe zone, as shown in Figure 7. Two connecting lines, l̄₁₂ and l₁₂, join these intersection points to complete the upper and lower limits. Their analytical expressions are determined by the coordinates of the four intersection points.

Finally, the second-stage verification relies on the modified variable-width safe zone to correct the results of data points disputed in the majority vote. Data points between the upper limit (l̄₁-l̄₁₂-l̄₂) and the lower limit (l₁-l₁₂-l₂) are classified as normal data, and points outside this range are classified as bad data. In addition, the linear regression and GMM identifiers identify data points matching condition 6 as bad data. This means that these points lie far from the regression line and do not belong to the main cluster of normal data. The DBSCAN identifier may misjudge these points as normal data because they lie close together in the scatter plot, so DBSCAN recognizes them as a data cluster. Based on these characteristics, data points matching condition 6 should be classified as bad data, which is consistent with the majority voting result.
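The second-stage check can be sketched as a point-in-band test against piecewise-linear upper and lower limits, evaluated with `np.interp`. Two simplifications are assumed. First, each limit is given as a list of vertices with increasing x, a hypothetical format for the polylines built in the steps above. Second, the ends of the zone are cut off vertically rather than by the perpendicular lines l'₁ and l'₂. The functions `in_modified_safe_zone` and `second_stage` are hypothetical, and the condition-specific rules of Table 1 are not modeled.

```python
import numpy as np


def in_modified_safe_zone(points, upper, lower):
    """True where a 2-D point lies between the piecewise-linear lower and upper limits.

    upper, lower: (k, 2) arrays of polyline vertices with strictly increasing x
    (a hypothetical representation of the limits described above).
    """
    pts = np.asarray(points, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    y_hi = np.interp(x, upper[:, 0], upper[:, 1])  # upper limit evaluated at each x
    y_lo = np.interp(x, lower[:, 0], lower[:, 1])  # lower limit evaluated at each x
    # np.interp clamps outside its range, so points beyond the polylines are rejected explicitly
    x_min = max(upper[0, 0], lower[0, 0])
    x_max = min(upper[-1, 0], lower[-1, 0])
    inside_x = (x >= x_min) & (x <= x_max)
    return inside_x & (y >= y_lo) & (y <= y_hi)


def second_stage(mv_bad, disputed, points, upper, lower):
    """Keep unanimous results; re-label disputed points by the modified safe zone."""
    result = np.asarray(mv_bad, dtype=bool).copy()
    disputed = np.asarray(disputed, dtype=bool)
    in_zone = in_modified_safe_zone(points, upper, lower)
    result[disputed] = ~in_zone[disputed]
    return result
```

Only disputed points are re-labelled. Unanimous first-stage results pass through unchanged.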

In a case study of the invention, an open-access PMU dataset from the EPFL campus (Switzerland) was used to validate the proposed bad data identification method. Each PMU measurement sample covers a 60 s time window containing 3000 data points. The test platform was Python 3.7 on an Intel i7-9700 @ 3.00 GHz CPU with 32 GB of RAM.
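A 60 s window with 3000 points corresponds to 50 samples per second. The helper below splits the paired series into aligned windows of that size. The name `sliding_windows` and the `step` parameter are hypothetical, because the text does not state how far the moving window advances. A `step` smaller than `size` gives overlapping windows.

```python
def sliding_windows(x_i, x_j, size=3000, step=3000):
    """Yield aligned windows (x_i[s:s+size], x_j[s:s+size]).

    size=3000 matches a 60 s window at 50 samples/s; `step` is a
    hypothetical parameter (step < size gives overlapping, moving windows).
    """
    if len(x_i) != len(x_j):
        raise ValueError("x_i and x_j must have the same length")
    for start in range(0, len(x_i) - size + 1, step):
        yield x_i[start:start + size], x_j[start:start + size]
```

Each yielded pair can be passed directly to the identifiers sketched earlier.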

Four typical conditions are selected for the case study of the method of the present invention: "normal data", "step event data and outliers", "spike event data and outliers" and "outliers". Figures 8-11 show the two PMU measurement curves of voltage magnitudes V₂ and V₃ under these conditions. Figures 12-15 show the original clustering and bad data identification results for each condition as two-dimensional scatter plots, with V₂ on the x-axis and V₃ on the y-axis.

The first row of each figure shows the original clustering results of each method, and the computation time of each step is listed at the lower right. In Figures 12-15{1,1} (row 1, column 1), the scattered points represent the original PMU data samples, and the broken lines represent the upper and lower limits calculated by linear regression. The broken lines in Figures 12-15{1,5} represent the upper and lower limits of the modified safe zone.

The second row of Figures 12-15 shows the bad data identification results of each method. Circles represent correct results (normal data identified as normal, or bad data identified as bad). Triangles (normal data identified as bad) and squares (bad data identified as normal) represent wrong results. The identification accuracy of each method, as defined by Eq. (22), is also shown at the lower right.

Acc = (n_all - n_fn - n_fb) / n_all × 100%   (22)

Where n_all is the number of data points in the test instance, n_fn is the number of normal data points wrongly identified as bad, and n_fb is the number of bad data points wrongly identified as normal. The accuracy index Acc is the proportion of correctly identified samples among all samples.
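Eq. (22) can be evaluated directly from ground-truth and predicted labels. The sketch assumes boolean masks with True meaning bad data, and the function names `error_counts` and `accuracy` are hypothetical. The two error counts follow the definitions above: a normal point flagged as bad counts toward n_fn, and a missed bad point counts toward n_fb.

```python
import numpy as np


def error_counts(true_bad, pred_bad):
    """Return (n_all, n_fn, n_fb) from boolean masks (True = bad data).

    n_fn: normal points wrongly identified as bad; n_fb: bad points wrongly identified as normal.
    """
    true_bad = np.asarray(true_bad, dtype=bool)
    pred_bad = np.asarray(pred_bad, dtype=bool)
    n_fn = int(np.sum(~true_bad & pred_bad))
    n_fb = int(np.sum(true_bad & ~pred_bad))
    return true_bad.size, n_fn, n_fb


def accuracy(n_all, n_fn, n_fb):
    """Eq. (22): Acc = (n_all - n_fn - n_fb) / n_all * 100%."""
    return (n_all - n_fn - n_fb) / n_all * 100.0
```

For example, with 5 points, one false alarm and one missed bad point, Acc = 3/5 × 100% = 60%.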

(1) Normal data

The PMU measurement sequence and its reconstruction sequence are strongly correlated, with a correlation coefficient of 0.9669. Figure 8 and Figure 12{1,1} show that correlated normal data have similar profiles and a diagonally distributed two-dimensional scatter plot. The linear regression identifier misjudged some normal data with relatively large deviations, and the DBSCAN identifier misjudged some sparsely distributed normal data. Both GMM clusters contained large amounts of data and were therefore merged into one cluster of normal data. The MV method corrected the identification errors in Figures 12{2,1} and 12{2,2}, and 100% accuracy was achieved after result correction.

(2) Step event data and outliers

Figure 9 and Figure 13{1,1} show that the normal PMU measurement data points before and after the step event remain strongly correlated and form two clusters. In Figure 13{2,2}, the DBSCAN identifier misjudged some scattered normal data during the transient. Both GMM clusters contained large amounts of data and were merged into one cluster of normal data, so the outliers in Figure 13{2,3} were misjudged. In Figure 13{2,4}, MV corrected these misjudgments. After result correction, the accuracy of MV remained at 100%.

(3) Spike event data and outliers

Figure 10 and Figure 14{1,1} show that the normal PMU measurement data points within the spike event are strongly correlated. In Figure 14{2,2}, the DBSCAN identifier misjudged some scattered normal data during the transient. In Figure 14{2,3}, the small amount of normal data during the transient caused GMM misjudgments. As shown in Figure 14{2,4}, MV corrected some identification errors. Some normal data remained misjudged, however, because the majority of identifiers had misjudged them. As shown in Figure 14{2,5}, these data points lie within the modified safe zone in the second-stage correction. Their identification results were therefore corrected, and 100% accuracy was obtained after result correction.

(4) Outliers

Outliers in the PMU measurement data from spatially adjacent buses reduce the correlation coefficient to 0.6527. Figure 11 and Figure 15{1,1} show that the outlying PMU data points deviate significantly from the diagonal. The DBSCAN and GMM identifiers correctly distinguish normal data from bad data. The linear regression identifier misjudged some normal data with relatively large deviations in Figure 15{2,1}, and MV corrected these errors, as shown in Figure 15{2,4}. The corrected results maintained 100% accuracy.

The above analysis shows that every single identifier has limitations in some cases, which leads to misjudgments and low accuracy. Introducing MV enhances the generalization ability of hybrid clustering and thereby improves bad data identification accuracy. With the result correction procedure added, the proposed method overcomes the limitations of MV and compensates for its errors. It obtains the most accurate results at a computational cost of less than one second.

A comprehensive study was conducted to verify the superiority and performance of the proposed method in online bad data identification, and the results were evaluated with the indices defined below. The index Fal (false identification rate) is the ratio of wrongly identified normal data samples to the total number of samples. The index Mis (missed identification rate) is the ratio of wrongly identified bad data samples to the total number of samples. The index Pre (precision) is the ratio of correctly identified bad data samples to the total number of bad data samples. The performance test used six one-hour sets of 18,000 PMU data pairs (36,000 PMU data points) from the open PMU dataset, with different bad data ratios and deviation ranges. The proposed method was run on a moving time window of one minute of PMU data (3,000 pairs, 6,000 data points). Note that the bad data ratio specifies the proportion of bad data within a single PMU time series. In the two-dimensional analysis, the actual bad data ratio is therefore the total number of bad data in the two PMU time series divided by twice the length of a single time series.

Fal = n_fn / n_all × 100%   (23)

Mis = n_fb / n_all × 100%   (24)

Pre = n_tb / (n_tb + n_fb) × 100%   (25)

Where n_tn is the number of correctly identified normal data points, and n_tb is the number of correctly identified bad data points.
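Eqs. (23)-(25) and the two-dimensional bad data ratio can be computed from the counts defined above. The function names are hypothetical. Returning NaN for Pre when a window contains no bad data is an assumption, since the text does not cover that case. Note that n_tb + n_fb equals the total number of bad points, because every bad point is either found or missed.

```python
def identification_rates(n_all, n_fn, n_fb, n_tb):
    """Eqs. (23)-(25): return (Fal, Mis, Pre) in percent."""
    fal = n_fn / n_all * 100.0
    mis = n_fb / n_all * 100.0
    n_bad = n_tb + n_fb  # every bad point is either found (n_tb) or missed (n_fb)
    pre = n_tb / n_bad * 100.0 if n_bad else float("nan")  # undefined without bad data
    return fal, mis, pre


def actual_bad_ratio(n_bad_i, n_bad_j, series_len):
    """2-D analysis: bad points in both series divided by twice the single-series length."""
    return (n_bad_i + n_bad_j) / (2 * series_len)
```

For a 3000-pair window with 30 bad points in each series, the actual bad data ratio is 60 / 6000 = 1%.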

Table 2 Comprehensive results of bad data identification (EPFL data)

Note: Methods in bold are based on the two-dimensional analysis proposed in this section, and methods in regular font are based on one-dimensional analysis. The "Proposed" column gives the test results of the proposed method.

Table 2 lists the numerical test results for online bad data identification. Because linear regression and GMM are not applicable to one-dimensional analysis, the comparison between two-dimensional and conventional one-dimensional analysis is shown only for DBSCAN. As Table 2 shows, with higher bad data identification accuracy and precision and lower missed and false identification rates, the two-dimensional DBSCAN method outperforms the one-dimensional method, owing to the density-based features present in the scatter plot. The two-dimensional method thus improves bad data identification by exploiting the spatiotemporal correlation between different measurements.
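To illustrate why density in the two-dimensional scatter plot helps, the following is a minimal, dependency-free DBSCAN sketch that treats each pair of values as a point and reports points left as noise as bad data candidates. It is a generic textbook DBSCAN, not the identifier used in the tests; the function name `dbscan_noise` and the `eps` and `min_pts` values in the example are illustrative assumptions, and a production system would more likely use a library implementation such as scikit-learn's `DBSCAN`.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dbscan_noise(points: Sequence[Point], eps: float, min_pts: int) -> List[bool]:
    """Minimal O(n^2) DBSCAN; returns True for each point left as noise."""
    n = len(points)
    # Neighbourhood of every point (includes the point itself, as in standard DBSCAN).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(neighbours[j])
        cluster += 1
    return [label == -1 for label in labels]
```

In the example test, normal pairs lie densely along the diagonal while the pair (1.0, 1.8) falls off it and finds no dense neighbourhood. Neither coordinate alone would expose it, since values near 1.0 and 1.8 both occur among the normal data; only the two-dimensional view separates it.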

In addition, the performance of single-model methods (LR/DBSCAN/GMM), an ensemble method (MV, majority voting) and the proposed method is compared, all based on two-dimensional analysis. Although the linear regression and GMM identifiers perform worse than the DBSCAN identifier, the performance of the proposed hybrid method confirms their value in catching the false and missed identifications of the DBSCAN identifier. The ensemble MV method performs moderately, but better than most single-model identifiers. In tests on the six datasets, the proposed hybrid method with a two-stage structure achieves the highest bad data identification precision, the lowest missed identification rate and a low false identification rate. This performance verifies that the proposed verification and correction of identification results is effective in reducing false and missed identifications. The improved precision means that the proposed method can identify certain cases that other methods cannot, and can adapt to more complex bad data conditions. As the bad data ratio increases and the maximum deviation range decreases, identification becomes harder and the precision of the proposed method declines accordingly. 
Nevertheless, owing to its high generalization ability and low sensitivity, the proposed method remains stable, with bad data identification precision above 99.9% under the Pre definition.
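The majority-voting (MV) ensemble step can be sketched as below, assuming each base identifier (for example LR, DBSCAN and GMM) outputs one boolean flag per point. The function name `majority_vote` is hypothetical, and the subsequent verification and correction stage of the proposed method is not reproduced here because its rules are not given in this section.

```python
from typing import List, Sequence


def majority_vote(*flag_lists: Sequence[bool]) -> List[bool]:
    """Flag a point as bad when more than half of the identifiers flag it."""
    if not flag_lists:
        raise ValueError("need at least one identifier")
    n = len(flag_lists[0])
    if any(len(f) != n for f in flag_lists):
        raise ValueError("all identifiers must score the same points")
    k = len(flag_lists)
    # Strict majority: votes * 2 > k avoids ties counting as "bad".
    return [sum(f[i] for f in flag_lists) * 2 > k for i in range(n)]
```

With three identifiers, a point is flagged when at least two of them agree; an odd number of base identifiers avoids ties.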

The average computation time of the proposed method for online bad data identification of the PMU time series within a single time window is 0.1612 s, far shorter than the one-minute window length, which meets the requirements of online identification.

This project also tested the method on bad data identification in actual PMU measurement data from a domestic distribution network; the results are shown in Table 3.

Table 3 Comprehensive results of bad data identification (Lingang Demonstration Zone data)

The test results show that, on the actual PMU measurement data of the Lingang Demonstration Zone, the method of the present invention still achieves a bad data identification precision above 99.9% under the Pre definition.

It should be understood that, although the steps above are described in a certain order, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the execution order of these steps, and they may be executed in other orders. Moreover, some steps of this embodiment may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 3, an embodiment of the present invention further provides a smart distribution network PMU bad data identification system comprising a measurement reconstruction sequence acquisition module 100, a construction sequence combination module 200, a preliminary identification module 300 and a final identification module 400.

The measurement reconstruction sequence acquisition module 100 is used to acquire a PMU measurement sequence and obtain the corresponding measurement reconstruction sequence based on it.

The construction sequence combination module 200 is used to form a PMU measurement sequence combination from the PMU measurement sequence and the corresponding measurement reconstruction sequence, and to generate the corresponding two-dimensional image data.

The preliminary identification module 300 is used to process the two-dimensional image data with hybrid clustering to obtain preliminary identification results for each point of the PMU measurement sequence.

The final identification module 400 is used to apply ensemble learning identification and result correction to the preliminary identification results of each point, obtaining the final identification result for each point of the PMU measurement sequence.
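The module structure of FIG. 3 can be sketched as a pipeline with injectable stages. This is a hypothetical skeleton rather than the patented implementation: the class and field names are illustrative, the reconstruction model (a trained GAN in claim 3), the base identifiers and the correction rule are all supplied by the caller, and the default correction simply returns the voted flags unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Series = List[float]
Points = List[Tuple[float, float]]
Identifier = Callable[[Points], List[bool]]


@dataclass
class BadDataPipeline:
    """Hypothetical wiring of modules 100-400; every stage is injected."""
    reconstruct: Callable[[Series], Series]   # module 100 model, e.g. a trained GAN
    identifiers: List[Identifier]             # module 300 base identifiers
    # Module 400 correction hook; the default leaves the vote result unchanged.
    correct: Callable[[Points, List[bool]], List[bool]] = lambda pts, flags: flags

    def run(self, measured: Series) -> List[bool]:
        # Module 100: obtain the measurement reconstruction sequence.
        rebuilt = self.reconstruct(measured)
        # Module 200: pair measurement and reconstruction into 2-D scatter points.
        points = list(zip(measured, rebuilt))
        # Module 300: preliminary per-point results from each base identifier.
        preliminary = [ident(points) for ident in self.identifiers]
        # Module 400: strict majority vote, then the correction step.
        k = len(preliminary)
        voted = [sum(p[i] for p in preliminary) * 2 > k for i in range(len(points))]
        return self.correct(points, voted)
```

In a quick check, a fixed reconstruction and simple gap-threshold identifiers can stand in for the GAN and the clustering identifiers; they are placeholders only.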

In one embodiment, as shown in FIG. 5, an embodiment of the present invention further provides a device comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another via the communication bus.

The memory is used to store a computer program;

The processor is used to perform the smart distribution network PMU bad data identification method when executing the computer program stored in the memory, thereby implementing the steps of the method embodiment above.

The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is used in the figure, which does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the above terminal and other devices.

The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The device includes user equipment and network equipment. User equipment includes but is not limited to computers, smartphones, PDAs, etc.; network equipment includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud of many computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The device may operate alone to implement the present invention, or may connect to a network and implement the present invention by interacting with other devices in the network. The network in which the device is located includes but is not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, etc.

It should be further understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

In one embodiment of the present invention, a storage medium is further provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method embodiment above.

A person of ordinary skill in the art will understand that all or part of the processes of the method embodiments above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, a database or other media in the embodiments provided by the present invention may include at least one of non-volatile and volatile memory.

It should be understood that, as used herein, the singular form "a" or "an" is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items. The serial numbers of the embodiments above are for description only and do not indicate the relative merits of the embodiments.

A person of ordinary skill in the art should understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the present invention (including the claims) is limited to these examples. Under the concept of the embodiments of the present invention, technical features in the above embodiments or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above, which are not detailed for the sake of brevity. Therefore, any omission, modification, equivalent substitution, improvement, etc. made within the spirit and principles of the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.

Claims

1. A method for identifying bad data of a PMU of an intelligent power distribution network, characterized by comprising the following steps:

acquiring a PMU measurement sequence, and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence;

forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence, and generating corresponding two-dimensional image data;

processing the two-dimensional image data by utilizing hybrid clustering to obtain preliminary identification results for each point of the PMU measurement sequence;

and performing ensemble learning identification and result correction on the preliminary identification results of each point of the PMU measurement sequence to obtain the final identification results of each point of the PMU measurement sequence.

2. The method for identifying PMU bad data of an intelligent power distribution network according to claim 1, wherein, before acquiring the PMU measurement sequence and obtaining the corresponding measurement reconstruction sequence based on the PMU measurement sequence, the method further comprises the step of:

acquiring a sample data set, and training a GAN model by using the sample data set to obtain a trained GAN model; wherein the sample data set includes a sample PMU measurement sequence combination and corresponding sample two-dimensional image data.

3. The method for identifying PMU bad data of an intelligent power distribution network according to claim 2, wherein acquiring a PMU measurement sequence and obtaining a corresponding measurement reconstruction sequence based on the PMU measurement sequence comprises:

acquiring a PMU measurement sequence, and generating a corresponding measurement reconstruction sequence with the trained GAN model based on the PMU measurement sequence.

4. The method for identifying PMU bad data of an intelligent power distribution network according to claim 2, wherein the PMU measurement sequence is $X_i$, the measurement reconstruction sequence is $X_j$, and the PMU measurement sequence combination is $X_{ij}$, where $X_{ij}$ is calculated by the following formula:

5. The method for identifying PMU bad data of an intelligent power distribution network according to claim 1, wherein processing the two-dimensional image data by using hybrid clustering to obtain the preliminary identification results of each point of the PMU measurement sequence comprises:

sequentially processing the two-dimensional image data with a linear regression identifier, a DBSCAN identifier and a Gaussian mixture model identifier to obtain the preliminary identification results of each point of the PMU measurement sequence.

6. The method for identifying PMU bad data of an intelligent power distribution network according to claim 1, wherein performing ensemble learning identification and result correction on the preliminary identification results of each point of the PMU measurement sequence to obtain the final identification results of each point of the PMU measurement sequence comprises:

performing ensemble learning identification on the preliminary identification results of each point of the PMU measurement sequence to obtain a vote result; and

correcting the vote result to obtain the final identification result of each point of the PMU measurement sequence.

7. The method for identifying PMU bad data of an intelligent power distribution network according to claim 6, wherein the ensemble learning identification is an ensemble method that combines the base identifiers by majority voting.

8. An intelligent power distribution network PMU bad data identification system, characterized in that the system comprises a measurement reconstruction sequence acquisition module, a construction sequence combination module, a preliminary identification module and a final identification module;

the measurement reconstruction sequence acquisition module is used for acquiring a PMU measurement sequence and acquiring a corresponding measurement reconstruction sequence based on the PMU measurement sequence;

the construction sequence combination module is used for forming a PMU measurement sequence combination based on the PMU measurement sequence and the corresponding measurement reconstruction sequence and generating corresponding two-dimensional image data;

the preliminary identification module is used for processing the two-dimensional image data by utilizing hybrid clustering to obtain the preliminary identification results of each point of the PMU measurement sequence;

and the final identification module is used for performing ensemble learning identification and result correction on the preliminary identification results of each point of the PMU measurement sequence to obtain the final identification results of each point of the PMU measurement sequence.

9. An apparatus comprising a memory storing a computer program and a processor implementing the steps of the smart distribution network PMU bad data identification method according to any one of claims 1-7 when the computer program is loaded and executed by the processor.

10. A storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the smart distribution network PMU bad data identification method according to any one of claims 1-7.