CN113673551B

CN113673551B - Power metering bad data identification method and system

Info

Publication number: CN113673551B
Application number: CN202110741482.6A
Authority: CN
Inventors: 陈祉如; 代燕杰; 刘轶娟; 郭亮; 荆臻; 杜艳; 董贤光; 张志�; 赵曦
Original assignee: Shandong University; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; State Grid Shandong Electric Power Co Ltd; Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Current assignee: Shandong University; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; State Grid Shandong Electric Power Co Ltd; Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2024-05-28
Anticipated expiration: 2041-06-30
Also published as: CN113673551A

Abstract

The disclosure provides a method and a system for identifying bad electric power metering data. The method comprises the following steps: obtaining original electric power metering data and preprocessing them; clustering the preprocessed electric power metering data; and judging whether the data to be detected exhibit inter-class similarity with the clustering result of the user to which they belong. If so, the data are accurate data. If not, it is further judged whether the data have smoothness; if so, the data are accurate data, and otherwise they are inaccurate data, namely bad data. The proposed accuracy quantization index expresses accuracy, one of the quality features of electric power metering data, more intuitively and quantitatively.

Description

Power metering bad data identification method and system

Technical Field

The disclosure belongs to the technical field of electric power metering data identification, and particularly relates to an electric power metering bad data identification method and system.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

In recent years, the power market, an important means of optimally allocating power resources, has entered a stage of rapid development, and power metering has become a fundamental link in that development. Electric power metering data contain rich information and are of great significance to the developing power market. By processing and analyzing metering data, more information about users' electricity consumption patterns can be obtained, so that better fits and substitutes can be found when metering data are missing, and more valuable references can be provided for recommending retail packages to users as the retail market develops.

Currently, electric power metering technology is developing toward automation and intelligence, and the quality of metering data has improved greatly compared with the era of manual metering. However, as demand for electric power in production and daily life in China keeps increasing, the quality of metering data remains unstable in actual operation because of meter faults, interference during data acquisition and transmission, and similar causes. Integrity, timeliness and accuracy are the main measures of metering data quality. Integrity and timeliness are already well assessed, but accuracy, the most important of the three, still lacks a mature assessment method.

Prior work includes "Liu Li, Wang Gang, Deng Hui. Application of K-means clustering algorithm in load curve classification [J]. Power System Protection and Control, 2011, 39(23): 65-68+73" and "Liu Huizhou, Zhou Kaile, Hu Xiaojian. Poor load data identification and correction based on fuzzy load clustering [J]. Electric Power, 2013, 46(10): 29-34", among others, both of which propose methods for identifying bad data. The former determines inaccurate data by lateral similarity or longitudinal smoothness; the latter empirically determines the allowable range of variation of the load values of each type of load curve and judges values beyond that range to be inaccurate. Both provide ways to judge bad data, but each relies on a single criterion, so misjudgments may occur in practical applications. The present method combines the lateral similarity and the longitudinal smoothness of the load curve, which is more comprehensive than either feature alone and can effectively reduce the misjudgment rate. It is suited to finding inaccurate data caused by isolated data jumps during metering data acquisition, for example from electromagnetic interference. Inaccurate data are identified in already-acquired electric power metering data by analyzing the daily load curve formed by the active power in those data.

In summary, the technical problem of the prior art in identifying bad power metering data is that most existing methods need information other than active power, for example the line structure of the system from which the metering data are acquired. Unlike those methods, this disclosure focuses mainly on users' electricity consumption patterns and habits. The load curve acquired by the metering equipment is independent of the acquisition equipment and method and of the system's line structure, so the method is faster and more convenient, and inaccurate data identification can be performed in any situation where a load curve can be obtained.

Disclosure of Invention

To overcome these defects of the prior art, the disclosure provides a method for identifying bad power metering data that reduces the misjudgment rate of the inaccurate data points it finds.

To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

In a first aspect, a method for identifying bad power metering data is disclosed, including:

obtaining original electric power metering data and preprocessing them;

clustering the preprocessed electric power metering data;

judging whether the data to be tested exhibit inter-class similarity with the clustering result of the user to which they belong; if so, judging the data to be accurate data; if not, judging the data to be suspicious data and further judging whether the data to be tested have smoothness; if so, judging the data to be accurate data, and otherwise judging them to be inaccurate data, namely bad data.

As a further technical solution, when the preprocessed electric power metering data are clustered, the clustering effectiveness index C_c is combined with the k-means clustering algorithm, specifically:

determining an initial number of clusters k;

selecting k of the n samples as initial clustering centers;

calculating the distance between each sample and each clustering center;

repartitioning the samples according to the minimum-distance principle, i.e., minimizing the sum of squared errors;

calculating the mean of each class of samples as the new clustering center;

ending the iteration if the sum of the changes in the clustering centers between iterations is smaller than a threshold;

calculating the clustering effectiveness index C_c;

performing the above steps for different k values, calculating the clustering effectiveness index for each, and selecting the k with the largest index value, for which the number of clusters and the clustering result are optimal.

As a further technical solution, when judging whether the data to be tested exhibit inter-class similarity, an inter-class similarity index δ(i) is defined:

δ(i) represents the inter-class similarity of the i-th data point on the load curve to be tested, LP_c(i) is the i-th data point on the load curve to be tested, and LP_d(i) is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: the data point is considered accurate data when δ(i) ∈ [-r, r]; otherwise, when δ(i) ∉ [-r, r], it is identified as suspicious data.
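The δ(i) formula was an image that did not survive extraction. As a minimal sketch, assume δ(i) is the relative deviation of the tested point from the typical curve, δ(i) = (LP_c(i) - LP_d(i)) / LP_d(i). This fits the "ratio" description and the symmetric interval [-r, r], but the form is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def similarity_index(lp_c: Sequence[float], lp_d: Sequence[float]) -> List[float]:
    """Hypothetical delta(i): relative deviation of the tested curve LP_c from the
    typical curve LP_d of its class (assumed form, not confirmed by the source)."""
    if len(lp_c) != len(lp_d):
        raise ValueError("curves must have the same number of points")
    deltas = []
    for c_val, d_val in zip(lp_c, lp_d):
        if d_val == 0:
            raise ValueError("typical curve value is zero; ratio undefined")
        deltas.append((c_val - d_val) / d_val)
    return deltas


def suspicious_points(lp_c: Sequence[float], lp_d: Sequence[float], r: float) -> List[int]:
    """Indices i whose delta(i) falls outside [-r, r], i.e. suspicious data."""
    return [i for i, d in enumerate(similarity_index(lp_c, lp_d)) if not -r <= d <= r]
```

With a hypothetical typical curve [10.0, 12.0, 15.0, 14.0], tested curve [10.5, 11.8, 30.0, 13.6] and r = 0.2, only index 2 is flagged (δ = 1.0). The zero check matters because a ratio against a typical value of 0 is undefined, which can occur once curves are min-max normalized.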

As a further technical solution, the smoothness feature is used to screen the suspicious data further; the smoothness index can be measured by comparing the suspicious data point with the data points immediately before and after it.

Further, assuming that the i-th point on the load curve LP_c is considered suspicious data, a smoothness metric ε(i) is defined:

ε(i) represents the smoothness of the i-th data point on the load curve to be tested. As with the similarity index, a threshold u is set: the suspicious data point is considered accurate data when ε(i) ∈ [-u, u]; otherwise, when ε(i) ∉ [-u, u], it is identified as inaccurate data.
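The ε(i) formula is likewise missing. The sketch below assumes ε(i) is the relative deviation of LP_c(i) from the mean of its two neighbours, falling back to the single available neighbour at either end of the curve; both the form and the end-point rule are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def smoothness_index(lp_c: Sequence[float], i: int) -> float:
    """Hypothetical epsilon(i): relative deviation of point i from the mean of its
    neighbours (assumed form; the source only says the points before and after
    are compared). At the curve ends the single available neighbour is used."""
    n = len(lp_c)
    if n < 2:
        raise ValueError("need at least two points")
    if not 0 <= i < n:
        raise IndexError("point index out of range")
    neighbours = [lp_c[j] for j in (i - 1, i + 1) if 0 <= j < n]
    reference = sum(neighbours) / len(neighbours)
    if reference == 0:
        raise ValueError("neighbour mean is zero; ratio undefined")
    return (lp_c[i] - reference) / reference


def is_inaccurate(lp_c: Sequence[float], i: int, u: float) -> bool:
    """A suspicious point is judged inaccurate when epsilon(i) lies outside [-u, u]."""
    return not -u <= smoothness_index(lp_c, i) <= u
```

For the hypothetical curve [10.0, 12.0, 30.0, 13.6], point 2 sits far above the mean of its neighbours (12.8), so ε(2) ≈ 1.34 and it is judged inaccurate for u = 0.3.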

Further, the thresholds r and u may be determined based on operational experience.

As a further technical solution, the method also includes evaluating the accuracy of the metering data, as follows:

the accuracy of the metering data can be measured by comparing the number of inaccurate data points with the number of all sampled data points.

In a second aspect, a power metering bad data identification system is disclosed, comprising:

a power metering data acquisition module configured to obtain original electric power metering data and preprocess them;

a power metering data clustering module configured to cluster the preprocessed electric power metering data;

a bad data determination module configured to judge whether the data to be tested exhibit inter-class similarity with the clustering result of the user to which they belong; if so, the data are accurate data; if not, it is further judged whether the data to be tested have smoothness; if so, the data are accurate data, and otherwise they are inaccurate data, namely bad data.

One or more of the above technical solutions have the following beneficial effects:

The invention introduces the correlation coefficient into clustering effectiveness evaluation, characterizes the distance between samples by the correlation coefficient, and thereby defines a clustering effectiveness index C_c. Whereas the existing Xie-Beni index uses the Euclidean distance to compute intra-cluster or inter-cluster distances, the index C_c measures clustering effectiveness from a different perspective.

The proposed clustering effectiveness index takes different values for different numbers of clusters, and the clustering effect is best when C_c is maximal. The invention therefore combines the traditional k-means clustering algorithm with the calculation of C_c, so that the optimal number of clusters can be determined iteratively.

The invention provides a method for identifying bad power metering data based on similarity and self-smoothness. By combining the discrimination of the similarity between the load curve to be tested and the typical curve of its class with the discrimination of the self-smoothness of the load curve to be tested, misjudgments caused by a single criterion can be reduced to a certain extent. The accuracy quantization index expresses accuracy, one of the quality features of electric power metering data, more intuitively and quantitatively.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

FIG. 1 is a flowchart of the improved k-means clustering algorithm according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of the bad data identification method based on intra-class similarity and self-smoothness according to an embodiment of the present disclosure.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

Example 1

This embodiment discloses a method for identifying bad power metering data, which first introduces a clustering effectiveness evaluation index based on the correlation coefficient:

There are various evaluation indexes of clustering effectiveness; most of them quantify effectiveness from the intra-class distance and the inter-class distance of the clustering result. A good clustering result makes the distance between classes as large as possible and the distance from each sample to its cluster center as small as possible. The invention introduces the correlation coefficient into clustering effectiveness evaluation and uses it to represent the distance between samples, forming a new clustering effectiveness evaluation index C_c. C_c is composed of the intra-class correlation coefficient and the inter-class correlation coefficient, so it reflects intra-class similarity and inter-class similarity simultaneously. The index is defined as follows:

First, the intra-class correlation coefficient is defined as:

Where α_ci represents the intra-class correlation coefficient of the i-th load curve of class c; x_cb represents the b-th data point on the typical load curve of the c-th clustering result; x_ib represents the b-th data point on the i-th curve in class c; the two mean terms are the mean of the points on the typical load curve of class c and the mean of the i-th curve in class c, respectively; and m is the number of data points on each load curve.
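The formula image for α_ci is missing, but the symbols defined above (paired points on two curves, each curve's mean, m points per curve) match the standard Pearson correlation coefficient. The sketch below assumes that form and, as a further assumption, takes the typical load curve of a class to be its cluster centre; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length curves (m points each)."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("curves must have the same length m >= 2")
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a flat curve has an undefined correlation")
    return cov / math.sqrt(var_x * var_y)


def intra_class_correlations(typical: Sequence[float],
                             members: Sequence[Sequence[float]]) -> List[float]:
    """alpha_ci for every curve i in class c, against the class's typical curve."""
    return [pearson(typical, curve) for curve in members]
```

Because the coefficient is computed on mean-centred values, it compares curve shapes rather than levels, which suits load curves of users with very different consumption magnitudes.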

The inter-class correlation coefficient is defined as:

Where β_cj represents the inter-class correlation coefficient between the typical load curve of the c-th cluster and the typical load curve of the j-th class; x_cb represents the b-th data point on the typical load curve of the c-th cluster; x_jb represents the b-th data point on the typical load curve of class j; and the two mean terms are the mean of the data on the class-c typical load curve and the mean of the data on the class-j typical load curve, respectively.
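Assuming the Pearson form that the defined symbols suggest, and writing \bar{x}_c and \bar{x}_j for the two mean terms (names introduced here, since the original symbols were lost), β_cj would read as follows; α_ci has the same form with x_ib and \bar{x}_i in place of x_jb and \bar{x}_j.

```latex
\beta_{cj} =
\frac{\sum_{b=1}^{m} \left(x_{cb} - \bar{x}_c\right)\left(x_{jb} - \bar{x}_j\right)}
     {\sqrt{\sum_{b=1}^{m} \left(x_{cb} - \bar{x}_c\right)^{2}}
      \,\sqrt{\sum_{b=1}^{m} \left(x_{jb} - \bar{x}_j\right)^{2}}}
```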

The clustering effectiveness index C_c is defined as:

Where n is the total number of samples and k_c is the number of samples contained in the c-th cluster. A correlation coefficient is at most 1, and the closer it is to 1, the stronger the correlation between the two curves. The numerator, the sum of the intra-class correlation coefficients, reflects intra-class similarity; the denominator, max(β_cj), the largest inter-class correlation coefficient, reflects inter-class similarity; and C_c is obtained by dividing the numerator by the denominator. C_c takes different values for different numbers of clusters k, and a larger C_c indicates a better clustering effect, so the k that maximizes C_c gives the optimal number of clusters.
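The exact C_c formula was lost. Reading the description literally, this sketch assumes a global ratio, C_c = [(1/n) Σ_c Σ_{i=1..k_c} α_ci] / max_{c≠j} β_cj, that is, the mean intra-class correlation divided by the largest correlation between two different typical curves. Both the form and the function name are hypothetical.

```python
from typing import Sequence


def cluster_validity(alpha_by_class: Sequence[Sequence[float]],
                     beta: Sequence[Sequence[float]]) -> float:
    """Hypothetical C_c: mean intra-class correlation over all n samples divided by
    the largest inter-class correlation between two different typical curves.

    alpha_by_class[c] holds alpha_ci for the k_c curves in class c;
    beta[c][j] holds beta_cj (the diagonal c == j is ignored).
    """
    k = len(alpha_by_class)
    if k < 2:
        raise ValueError("C_c needs at least two clusters")
    n = sum(len(a) for a in alpha_by_class)
    numerator = sum(sum(a) for a in alpha_by_class) / n
    denominator = max(beta[c][j] for c in range(k) for j in range(k) if c != j)
    if denominator <= 0:
        raise ValueError("maximum inter-class correlation must be positive")
    return numerator / denominator
```

A per-cluster variant, dividing each class's sum of α_ci by its own max_j β_cj before averaging, is an equally plausible reading of the text; the global ratio is chosen here only for simplicity.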

Improved k-means clustering algorithm

Data normalization

To make load curves with large numerical differences comparable during clustering, the data must first be normalized. Because the measurement points of each hour are used directly as clustering features, all features share the same dimension, so extremum (min-max) linear normalization is chosen, with the following formula:

where LP(i) represents the raw data of the i-th point on the daily load curve, and the formula yields the normalized data of the i-th point on the daily load curve.
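The normalization formula itself was lost, but extremum linear normalization is the standard min-max mapping LP(i) → (LP(i) - min LP) / (max LP - min LP). This sketch applies it to one daily load curve at a time; per-curve scaling and the treatment of a perfectly flat curve are assumptions made here.

```python
from typing import List, Sequence


def min_max_normalize(lp: Sequence[float]) -> List[float]:
    """Extremum (min-max) linear normalization of one daily load curve to [0, 1]."""
    lo, hi = min(lp), max(lp)
    if hi == lo:
        # A flat curve has no spread; mapping it to zeros is a choice made here.
        return [0.0 for _ in lp]
    return [(x - lo) / (hi - lo) for x in lp]
```

For example, [10.0, 20.0, 15.0] becomes [0.0, 1.0, 0.5], so curves of large and small consumers are compared by shape only.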

The clustering effectiveness index C_c defined above is combined with the k-means clustering algorithm to build an improved k-means clustering algorithm that gives the choice of k a sounder basis. The basic implementation steps are as follows, referring to FIG. 1:

(1) Determine a k value empirically;

(2) Select k of the n samples as initial cluster centers C_0, C_1, ..., C_{k-1};

(3) Calculate the distance between each sample and each cluster center. A sample is a daily load curve of a power consumer, i.e., a curve reflecting how the consumer's electricity consumption changes over time, consisting of one point per hour (24 points per day) or one point per 15 minutes (96 points per day); such curves can be obtained from electric power metering devices;

(4) Repartition the samples according to the minimum-distance principle (i.e., minimizing the sum of squared errors);

(5) Calculate the mean of each class of samples as the new cluster center;

(6) If the sum of the changes in the cluster centers between iterations is smaller than a threshold, end the iteration; otherwise, repeat steps (3), (4) and (5) until this condition is met;

(7) Calculate the clustering effectiveness index C_c;

(8) Return to step (1), select different k values and repeat the steps, calculate C_c for each, and select the k with the largest C_c; this number of clusters and its clustering result are considered optimal.
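The steps above can be sketched compactly in plain Python. Assumptions not stated in the source: the typical curve of a class is its centre, α and β are Pearson coefficients, C_c is a hypothetical global ratio (mean α over max β), initial centres are drawn at random with a fixed seed, and the tolerance and iteration cap are placeholder values. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Curve = List[float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a flat curve (a choice made here)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(curves: List[Curve], k: int, tol: float = 1e-6,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Curve], List[int]]:
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(curves, k)]  # step (2)
    labels = [0] * len(curves)
    for _ in range(max_iter):
        # Steps (3)-(4): assign every curve to its nearest centre (squared Euclidean).
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in curves]
        # Step (5): each new centre is the point-wise mean of its member curves.
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(curves, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the old centre of an empty cluster
        # Step (6): stop when the summed centre movement falls below tol.
        shift = sum(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels


def validity(curves: List[Curve], centers: List[Curve], labels: List[int]) -> float:
    """Step (7), hypothetical C_c: mean alpha over all n curves / max beta (c != j)."""
    k = len(centers)
    intra = sum(pearson(centers[lab], x) for x, lab in zip(curves, labels)) / len(curves)
    inter = max(pearson(centers[c], centers[j])
                for c in range(k) for j in range(k) if c != j)
    # Assumed guard: anti-correlated typical curves are well separated, so clamp the
    # denominator to a small positive value instead of letting the sign flip.
    return intra / max(inter, 1e-6)


def select_k(curves: List[Curve], k_values: Sequence[int], seed: int = 0):
    """Step (8): run k-means for each candidate k (each >= 2) and keep the best C_c."""
    best: Optional[Tuple[float, int, List[Curve], List[int]]] = None
    for k in k_values:
        centers, labels = kmeans(curves, k, seed=seed)
        score = validity(curves, centers, labels)
        if best is None or score > best[0]:
            best = (score, k, centers, labels)
    return best
```

`select_k` returns a tuple (C_c, k, centres, labels). On four hypothetical curves with either a morning peak or an evening peak, it selects k = 2 and groups the curves by peak; in practice the candidate k values would come from experience, as step (1) says.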

In a specific embodiment, referring to FIG. 2, the method for identifying bad power metering data based on similarity and self-smoothness specifically includes:

(1) Bad data discrimination

For the same user, electricity consumption in production and daily life follows a certain pattern on the same type of day (working day or holiday), i.e., the load curves of different days have similar shapes. Within a day, the load curve also changes regularly: even when an electrical load is suddenly switched on, the change relative to the adjacent moment is limited, i.e., the load curve itself has a certain smoothness. Accordingly, inaccurate data are first discriminated using the similarity feature, by examining the lateral similarity of the data in ratio form. Let LP_d be the typical daily load curve of a class and LP_c the daily load curve to be tested. An inter-class similarity index δ(i) is defined:

δ(i) represents the inter-class similarity of the i-th data point on the load curve to be tested, LP_c(i) represents the i-th data point on the load curve to be tested, and LP_d(i) represents the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: the data point is considered accurate data when δ(i) ∈ [-r, r]; otherwise, when δ(i) ∉ [-r, r], it is identified as suspicious data.

The smoothness feature can then be used to screen the suspicious data further. The smoothness index is measured by comparing the suspicious data point with the two points before and after it. Assuming that the i-th point on the load curve LP_c is considered suspicious data, a smoothness metric ε(i) is defined:

ε(i) represents the smoothness of the i-th data point on the load curve to be tested. A threshold u is set: the suspicious data point is considered accurate data when ε(i) ∈ [-u, u]; otherwise, when ε(i) ∉ [-u, u], it is identified as inaccurate data. In practical applications, the thresholds r and u can be determined empirically by the grid operator.
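The two-stage flow of FIG. 2 can be sketched as follows. Because the original δ(i) and ε(i) formulas are missing, this sketch assumes both are relative deviations: δ(i) of LP_c(i) from LP_d(i), and ε(i) of LP_c(i) from the mean of its neighbouring points (one neighbour at the curve ends). The function names and the example thresholds are hypothetical.

```python
from typing import List, Sequence, Tuple


def _relative_deviation(value: float, reference: float) -> float:
    if reference == 0:
        raise ValueError("reference value is zero; ratio undefined")
    return (value - reference) / reference


def identify_bad_points(lp_c: Sequence[float], lp_d: Sequence[float],
                        r: float, u: float) -> Tuple[List[int], List[int]]:
    """Two-stage screening of one daily load curve, using assumed delta/epsilon forms.

    Stage 1 (similarity): delta(i) outside [-r, r] marks point i as suspicious.
    Stage 2 (smoothness): a suspicious point with epsilon(i) outside [-u, u] is bad data.
    Returns (suspicious indices, bad indices).
    """
    n = len(lp_c)
    if n != len(lp_d) or n < 2:
        raise ValueError("curves must have the same length, at least 2 points")
    suspicious: List[int] = []
    bad: List[int] = []
    for i in range(n):
        if -r <= _relative_deviation(lp_c[i], lp_d[i]) <= r:
            continue  # similar to the typical curve: accurate
        suspicious.append(i)
        neighbours = [lp_c[j] for j in (i - 1, i + 1) if 0 <= j < n]
        eps = _relative_deviation(lp_c[i], sum(neighbours) / len(neighbours))
        if not -u <= eps <= u:
            bad.append(i)  # neither similar nor smooth: inaccurate (bad) data
    return suspicious, bad
```

With hypothetical thresholds r = 0.2 and u = 0.3, a typical curve [10, 12, 15, 14, 11] and a tested curve [10.2, 12.1, 30.0, 17.5, 17.0], points 2, 3 and 4 are suspicious, but only point 2 is also unsmooth, so it alone is reported as bad data. Points 3 and 4 differ from the typical curve yet change gradually, which is the case the smoothness stage is meant to rescue from misjudgment.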

(2) Evaluation of accuracy of metrology data

The metering data consist of the values measured at each sampling point. Because of metering device faults, interference during signal acquisition or transmission and the like, the values at some sampling points may deviate from the true values, making those data inaccurate. The accuracy of the metering data can therefore be measured by comparing the number of inaccurate data points with the number of all sampled data points. Accordingly, the invention defines a measurement accuracy index μ to measure the accuracy of the metering data collected each day, defined as follows:

Where N_b is the number of inaccurate data points in the daily load metering data and N is the number of all sampling points in the daily load.
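The definition of μ was lost in extraction; the text only says it compares N_b with N. This sketch assumes μ = 1 - N_b / N, so that a higher value means more accurate data; if the original defines μ as the inaccuracy ratio N_b / N instead, return that ratio. The function name is hypothetical.

```python
def accuracy_index(n_bad: int, n_total: int) -> float:
    """Hypothetical mu = 1 - N_b / N: share of a day's sampling points judged accurate."""
    if n_total <= 0:
        raise ValueError("N must be positive")
    if not 0 <= n_bad <= n_total:
        raise ValueError("N_b must lie in [0, N]")
    return 1.0 - n_bad / n_total
```

Under this assumption, a 96-point day (15-minute sampling) with 2 inaccurate points gives μ = 94/96 ≈ 0.979, and a day with no inaccurate points gives μ = 1.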

(3) Novel clustering effectiveness evaluation index

The clustering algorithm uses the Euclidean distance or the Mahalanobis distance to compute the distance between a sample and a cluster center, and clustering effectiveness is then evaluated with the same distance measure. This raises an applicability problem: if the chosen distance measure does not suit the objects being clustered, the resulting effectiveness evaluation is not credible either. The invention provides a clustering effectiveness evaluation index based on the correlation coefficient, which evaluates the clustering result from a statistical perspective, reduces the one-sidedness of clustering evaluation, and makes the evaluation result more reasonable.

The traditional k-means clustering algorithm requires the number of clusters k to be specified manually, and the value of k directly affects the clustering result; how to determine the optimal k has long been an important topic in clustering research. The proposed clustering effectiveness index takes different values for different numbers of clusters, and the clustering effect is best when C_c is maximal. The invention therefore provides an improved k-means clustering algorithm that combines the traditional k-means algorithm with the calculation of C_c, so that the optimal number of clusters can be determined iteratively, making the choice more scientific and objective.

The invention finds inaccurate data points in the daily load curve by measuring both the intra-class similarity and the self-smoothness of the curve. Judging by similarity alone or by self-smoothness alone can misjudge data points in the load curve as inaccurate; combining the two reduces the misjudgment rate.

Example 2

An object of this embodiment is to provide a computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.

Example 3

An object of the present embodiment is to provide a computer-readable storage medium.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.

Example 4

An object of this embodiment is to provide a power metering bad data identification system, including:

A power metering data acquisition module configured to obtain original electric power metering data and preprocess them; the main hardware equipment of this part comprises:

Electric energy meters installed on the user side: they directly record the user's electricity consumption and load every hour or over a specified period;

Electric power metering terminal: collects and uploads the metering data recorded by the meters and receives control commands from the upper-level management side;

Transmission network: the network that carries the acquired metering data, including optical private networks, wireless private networks and the like;

Data server: stores and analyzes historical metering data;

A power metering data clustering module configured to cluster the preprocessed electric power metering data; the main hardware equipment of this part comprises:

Application server: stores and executes the program of the clustering module.

A bad data determination module configured to judge whether the data to be tested exhibit inter-class similarity; if so, the data are accurate; if not, it is further judged whether the data have smoothness; if so, the data are accurate, and otherwise they are inaccurate, namely bad data.

The main hardware equipment of this part comprises:

Application server: stores and executes the program of the bad data determination module.

The steps involved in the devices of the second, third and fourth embodiments correspond to those of the first embodiment of the method, and the detailed description of the embodiments can be found in the related description section of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media including one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any one of the methods of the present disclosure.

While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims

1. A power metering bad data identification method, characterized by comprising the following steps:

obtaining original electric power metering data and preprocessing them;

clustering the preprocessed electric power metering data, wherein a clustering effectiveness index C_c is combined with a k-means clustering algorithm when the preprocessed electric power metering data are clustered; the clustering effectiveness index C_c is composed of an intra-class correlation coefficient and an inter-class correlation coefficient, reflects intra-class similarity and inter-class similarity simultaneously, and is specifically defined as:

wherein n is the total number of samples, k_c is the number of samples contained in the c-th cluster, α_ci is the intra-class correlation coefficient between the i-th load curve in class c and the typical load curve of that class, and β_cj is the inter-class correlation coefficient between the typical load curve of the c-th cluster and the typical load curve of the j-th cluster;

when the number of clusters k takes different values, the index C_c takes different values, and a larger C_c indicates a better clustering effect, so that the optimal number of clusters can be obtained;

judging whether the data to be tested exhibit inter-class similarity with the clustering result of the user to which they belong; if so, judging the data to be accurate data; if not, judging the data to be suspicious data and further judging whether the data to be tested have smoothness; if so, judging the data to be accurate data, and otherwise judging them to be inaccurate data, namely bad data;

specifically, when judging whether the data to be tested exhibit inter-class similarity, an inter-class similarity index δ(i) is defined:

δ(i) represents the inter-class similarity of the i-th data point on the load curve to be tested, LP_c(i) is the i-th data point on the load curve to be tested, and LP_d(i) is the i-th data point on the typical load curve to which the load belongs; a threshold r is set, the data point is considered accurate data when δ(i) ∈ [-r, r], and otherwise, when δ(i) ∉ [-r, r], it is identified as suspicious data;

the suspicious data are further screened using the smoothness feature, the smoothness index being measured by comparing the suspicious data point with the data points before and after it;

assuming that the i-th point on the load curve LP_c is deemed suspicious data, a smoothness metric ε(i) is defined:

ε(i) represents the smoothness of the i-th data point on the load curve to be tested; a threshold u is set, the suspicious data point is considered accurate data when ε(i) ∈ [-u, u], and otherwise, when ε(i) ∉ [-u, u], it is identified as inaccurate data;

the method further comprises a step of evaluating the accuracy of the metering data, which measures the accuracy of the metering data by comparing the number of inaccurate data points with the number of all sampled data points;

specifically, a measurement accuracy index μ is defined to measure the accuracy of the metering data collected each day, the index being defined as follows:

2. The method for identifying bad power metering data according to claim 1, wherein the clustering effectiveness index C_c is combined with the k-means clustering algorithm when the preprocessed power metering data are clustered, specifically:

determining an initial number of clusters k;

selecting k of the n samples as initial clustering centers;

calculating the distance between each sample and each clustering center;

repartitioning the samples according to the minimum-distance principle, i.e., minimizing the sum of squared errors;

calculating the mean of each class of samples as the new clustering center;

calculating the clustering effectiveness index C_c;

3. The method for identifying bad power metering data according to claim 1, wherein the threshold r is determined empirically;

the threshold u is determined empirically.

4. A power metering bad data identification system, characterized by comprising:

a power metering data clustering module configured to cluster the preprocessed electric power metering data, wherein the clustering effectiveness index C_c is combined with a k-means clustering algorithm when the preprocessed electric power metering data are clustered; the clustering effectiveness index C_c is composed of an intra-class correlation coefficient and an inter-class correlation coefficient, reflects intra-class similarity and inter-class similarity simultaneously, and is specifically defined as:

a bad data determination module configured to judge whether the data to be tested exhibit inter-class similarity; if so, the data are accurate; if not, it is further judged whether the data have smoothness; if so, the data are accurate, and otherwise they are inaccurate, namely bad data;

when judging whether the data to be tested exhibit inter-class similarity, an inter-class similarity index δ(i) is defined:

5. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of the preceding claims 1-3 when the program is executed.

6. A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims 1-3.