CN113673551B - Power metering bad data identification method and system - Google Patents


Info

Publication number
CN113673551B
Authority
CN
China
Prior art keywords
data
clustering
load curve
similarity
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110741482.6A
Other languages
Chinese (zh)
Other versions
CN113673551A (en)
Inventor
陈祉如
代燕杰
刘轶娟
郭亮
荆臻
杜艳
董贤光
张志�
赵曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
State Grid Shandong Electric Power Co Ltd
Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Original Assignee
Shandong University
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
State Grid Shandong Electric Power Co Ltd
Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, State Grid Shandong Electric Power Co Ltd, Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Priority to CN202110741482.6A
Publication of CN113673551A
Application granted
Publication of CN113673551B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a method and a system for identifying bad electric power metering data, comprising the following steps: obtaining original electric power metering data and preprocessing it; clustering the preprocessed electric power metering data; and judging whether the data under test have inter-class similarity with the clustering result of the user to which the data belong. If so, the data are accurate data. If not, the data are further judged for smoothness: if they are smooth, they are accurate data; otherwise they are inaccurate data, i.e. bad data. An accuracy quantization index is also provided, so that accuracy, one of the quality features of electric power metering data, is expressed more intuitively and quantitatively.

Description

Power metering bad data identification method and system
Technical Field
The disclosure belongs to the technical field of electric power metering data identification, and particularly relates to an electric power metering bad data identification method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, the electricity market, an important means of optimizing the allocation of power resources, has entered a stage of rapid development, and power metering has become a very important basic link in that development. Electric power metering data contain rich information and are of great significance to the developing electricity market. By processing and analyzing metering data, more can be learned about users' electricity consumption patterns, so that better fitted values and substitutes can be found when metering data are missing, and more valuable references can be provided for recommending retail packages to users once retail markets are developed.
Electric power metering technology is currently developing toward automation and intelligence, and the quality of metering data has improved greatly compared with the era of manual meter reading. However, as demand for electric power in production and daily life in China continues to grow, the quality of metering data remains unstable in actual operation because of meter faults, interference during data acquisition and transmission, and similar causes. The quality of power metering data is mainly measured by its integrity, timeliness and accuracy. Integrity and timeliness can already be assessed well, but for accuracy, the most important of the three, assessment methods are still immature.
Methods for identifying bad data have been proposed in "Liu Li, Wang Gang, Denghui. Application of K-means clustering algorithm in load curve classification [J]. Power system protection and control, 2011, 39(23): 65-68+73" and "Liu Huizhou, Zhou Kaile, Hu Xiaojian. Poor load data identification and correction based on fuzzy load clustering [J]. Chinese power, 2013, 46(10): 29-34", among others. The former determines inaccurate data by either lateral similarity or longitudinal smoothness; the latter empirically determines the allowable range of load values for each type of load curve and judges data beyond that range to be inaccurate. Both provide methods for judging bad data, but each relies on a single criterion, so misjudgment may occur in practical application. The present method combines the lateral similarity and the longitudinal smoothness of the load curve, which is more comprehensive than using either characteristic alone and can effectively reduce the misjudgment rate. It is suited to finding inaccurate data produced by abrupt changes of individual data points during metering data acquisition, caused for example by electromagnetic interference. Inaccurate data are identified in electric power metering data that have already been obtained, mainly by analyzing the daily load curve composed of the active power values in those data.
In summary, the technical problem with existing bad data identification is that most methods need information other than active power, for example the line structure of the system from which the metering data are acquired. Unlike those methods, the present method focuses on the electricity usage rules and habits of the user. It works on the load curve acquired by the metering equipment and is independent of the equipment and mode of data acquisition and of the line structure of the system, so it is quicker and more convenient, and inaccurate data can be identified wherever a load curve can be acquired.
Disclosure of Invention
In order to overcome the defects in the prior art, the disclosure provides a method for identifying bad power metering data that finds inaccurate data points while reducing the misjudgment rate.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
In a first aspect, a method for identifying bad power metering data is disclosed, including:
obtaining original electric power metering data and preprocessing it;
clustering the preprocessed electric power metering data;
judging whether the data under test have inter-class similarity with the clustering result of the user to which the data belong; if so, the data are accurate data; if not, the data are suspicious data and are further judged for smoothness; if the data are smooth, they are accurate data; otherwise they are inaccurate data, i.e. bad data.
In a further technical solution, when the preprocessed electric power metering data are clustered, a cluster effectiveness index C_c is combined with the k-means clustering algorithm, specifically:
determining an initial value of the cluster number k;
selecting k of the n samples as initial cluster centers;
calculating the distance between each sample and each cluster center;
repartitioning the samples according to the minimum-distance principle, i.e. so that the sum of squared errors is minimized;
calculating the mean of the samples in each class as the new cluster center;
ending the iteration if the sum of the changes in the cluster centers between iterations is smaller than a threshold;
calculating the cluster effectiveness index C_c;
repeating the above steps for different k values, calculating the cluster effectiveness index for each, and selecting the cluster number k with the largest index value, at which point the cluster number and the clustering result are considered optimal.
In a further technical solution, when judging whether the data under test have inter-class similarity, an inter-class similarity index δ(i) is defined:
δ(i) represents the inter-class similarity of the i-th data point on the load curve under test, LP_c(i) is the i-th data point on the load curve under test, and LP_d(i) is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when δ(i) ∈ [-r, r], the data point is considered accurate data; otherwise, when δ(i) ∉ [-r, r], it is identified as suspicious data.
In a further technical solution, the smoothness feature is used to further screen the suspicious data; the smoothness index is measured by comparing the suspicious data point with the data points immediately before and after it.
Further, assuming that the i-th point on the load curve LP_c is considered suspicious data, a smoothness index ε(i) is defined:
ε(i) represents the smoothness of the i-th data point on the load curve under test. As with the similarity index, a threshold u is set: when ε(i) ∈ [-u, u], the suspicious data point is considered accurate data; otherwise, when ε(i) ∉ [-u, u], it is identified as inaccurate data.
Further, the thresholds r and u may be determined based on operational experience.
A further technical solution also includes evaluating the accuracy of the metering data:
the accuracy of the metering data is measured by comparing the number of inaccurate data points with the number of all sampled data points.
In a second aspect, a power metering bad data identification system is disclosed, comprising:
A power metering data acquisition module configured to: obtaining original electric power metering data and preprocessing;
a power metering data clustering module configured to: clustering the preprocessed electric power metering data;
a bad data determination module configured to: judging whether the data under test have inter-class similarity with the clustering result of the user to which the data belong; if so, the data are accurate data; if not, further judging whether the data under test are smooth; if so, the data are accurate data; if not, the data are inaccurate data, i.e. bad data.
The one or more of the above technical solutions have the following beneficial effects:
The invention introduces the correlation coefficient into cluster effectiveness evaluation and uses it to characterize the distance between samples, thereby defining a cluster effectiveness index C_c. Compared with the earlier Xie-Beni index, which uses Euclidean distance to compute the intra-cluster and inter-cluster distances, the index C_c measures the effectiveness of the clustering from a different angle.
The cluster effectiveness index provided by the invention takes different values for different numbers of clusters, and the clustering effect is best when C_c is largest. The invention therefore combines the traditional k-means clustering algorithm with the calculation of C_c, so that the optimal number of clusters can be determined by iteration.
The invention provides a method for identifying bad power metering data based on similarity and self-smoothness. Combining the intra-class similarity check between the load curve under test and the typical curve with the self-smoothness check of the load curve under test reduces, to a certain extent, the misjudgments caused by a single criterion. An accuracy quantization index is also provided, so that accuracy, one of the quality features of electric power metering data, is expressed more intuitively and quantitatively.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flowchart of a k-means clustering algorithm modified in accordance with an embodiment of the present disclosure;
FIG. 2 is a flowchart of a bad data identification method based on intra-class similarity and self-smoothness according to an embodiment of the disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
Example 1
This embodiment discloses a method for identifying bad power metering data. It first introduces a clustering effect evaluation index based on the correlation coefficient:
Cluster effectiveness has various evaluation indexes; most quantify effectiveness by judging the intra-class distance and the inter-class distance of the clustering result. A good clustering result should make the distance between classes as large as possible and the distance from each sample to its cluster center as small as possible. The invention introduces the correlation coefficient into cluster effectiveness evaluation and uses it to represent the distance between samples, yielding a new cluster effectiveness index C_c. C_c is composed of the intra-class correlation coefficient and the inter-class correlation coefficient, so it reflects intra-class similarity and inter-class similarity at the same time. The index is defined as follows:
First, the intra-class correlation coefficient is defined as:
$$\alpha_{ci} = \frac{\sum_{b=1}^{m} (x_{cb} - \bar{x}_c)(x_{ib} - \bar{x}_i)}{\sqrt{\sum_{b=1}^{m} (x_{cb} - \bar{x}_c)^2 \, \sum_{b=1}^{m} (x_{ib} - \bar{x}_i)^2}}$$
where $\alpha_{ci}$ is the intra-class correlation coefficient of the i-th load curve of class c, $x_{cb}$ is the b-th data point on the typical load curve of the c-th clustering result, $\bar{x}_c$ is the mean of the points on the typical load curve of class c, $x_{ib}$ is the b-th data point on the i-th curve in class c, $\bar{x}_i$ is the mean of the i-th curve in class c, and m is the number of data points on each load curve.
The inter-class correlation coefficient is defined as:
$$\beta_{cj} = \frac{\sum_{b=1}^{m} (x_{cb} - \bar{x}_c)(x_{jb} - \bar{x}_j)}{\sqrt{\sum_{b=1}^{m} (x_{cb} - \bar{x}_c)^2 \, \sum_{b=1}^{m} (x_{jb} - \bar{x}_j)^2}}$$
where $\beta_{cj}$ is the inter-class correlation coefficient between the typical load curve of the c-th cluster and the typical load curve of the j-th cluster, $x_{cb}$ is the b-th data point on the typical load curve of the c-th cluster, $\bar{x}_c$ is the mean of the data on the typical load curve of class c, $x_{jb}$ is the b-th data point on the typical load curve of class j, and $\bar{x}_j$ is the mean of the data on the typical load curve of class j.
The cluster effectiveness index C_c is defined as:
where n is the total number of samples and k_c is the number of samples contained in the c-th cluster. A correlation coefficient is never greater than 1, and the closer it is to 1, the stronger the correlation between the two curves. The numerator of C_c is the sum of the intra-class correlation coefficients α_ci, the denominator is max(β_cj), the maximum of the inter-class correlation coefficients, and C_c is obtained by dividing the former by the latter. C_c takes different values for different cluster numbers k, and the largest C_c indicates the best clustering effect, which gives the optimal number of clusters.
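As an illustration of how such an index could be computed, the following is a minimal sketch rather than the patented formula. It assumes that the correlation coefficients are Pearson coefficients and that C_c has the form (1/n) Σ_c [Σ_i α_ci / max_{j≠c} β_cj]; the exact arrangement of the sums, the function names `pearson` and `validity_index`, and the small guard `eps` for non-positive β values are all assumptions introduced here.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length curves."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # A constant curve has no defined correlation; treat it as 0 here.
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

def validity_index(curves, labels, centers, eps=1e-6):
    """Assumed form: (1/n) * sum_c [ sum_i alpha_ci / max_{j != c} beta_cj ].

    Requires at least two clusters. `eps` is a hypothetical guard that keeps
    the denominator positive when typical curves are uncorrelated or
    negatively correlated.
    """
    curves = np.asarray(curves, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    n, k = len(curves), len(centers)
    total = 0.0
    for c in range(k):
        # alpha_ci: each member curve against its own typical curve
        alpha_sum = sum(pearson(centers[c], row) for row in curves[labels == c])
        # beta_cj: typical curve c against every other typical curve
        beta_max = max(pearson(centers[c], centers[j]) for j in range(k) if j != c)
        total += alpha_sum / max(beta_max, eps)
    return total / n
```

Under this assumed form the index grows when members correlate strongly with their own typical curve and the typical curves correlate weakly with one another, which matches the stated goal of rewarding compact, well-separated clusters.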
Improved k-means clustering algorithm
Data normalization
To make load curves with large numerical differences comparable during clustering, the data must first be normalized. Because the hourly measurement points are used directly as the clustering features, all features share the same dimension (unit), so extremum linear normalization is chosen, with the formula:
where LP(i) is the raw data of the i-th point on the daily load curve, and the result of the formula is the normalized data of the i-th point on the daily load curve.
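A minimal sketch of one common reading of extremum linear normalization is given below. It assumes the min-max form (LP(i) − LP_min) / (LP_max − LP_min); the patent's own formula is not reproduced in this text, so both that form and the function name `minmax_normalize` are assumptions.

```python
import numpy as np

def minmax_normalize(curve):
    """Scale one daily load curve to [0, 1] (assumed min-max form)."""
    curve = np.asarray(curve, dtype=float)
    lo, hi = curve.min(), curve.max()
    if hi == lo:
        # A flat curve carries no shape information; map it to zeros.
        return np.zeros_like(curve)
    return (curve - lo) / (hi - lo)
```

For example, `minmax_normalize([2, 4, 6])` maps the smallest reading to 0 and the largest to 1, so curves of very different magnitude can be compared by shape.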
The cluster effectiveness index C_c defined above is combined with the k-means clustering algorithm to build an improved k-means clustering algorithm in which the choice of k is better founded. The basic implementation steps, referring to FIG. 1, are as follows:
(1) Determine a k value empirically;
(2) Select k of the n samples as initial cluster centers: C_0, C_1, ..., C_{k-1};
(3) Calculate the distance between each sample and each cluster center. A sample here is a daily load curve of a power user, i.e. a curve reflecting how the user's power consumption changes over time, consisting of one point per hour (24 points per day) or one point per 15 minutes (96 points per day); it can be obtained from the electric power metering device;
(4) Repartition the samples according to the minimum-distance principle (i.e. minimum sum of squared errors);
(5) Calculate the mean of the samples in each class as the new cluster center;
(6) If the sum of the changes in the cluster centers between iterations is smaller than a threshold, end the iteration; otherwise repeat steps (3), (4) and (5) until the condition is met;
(7) Calculate the cluster effectiveness index C_c;
(8) Return to step (1), repeat the above steps for different k values and calculate C_c for each, then select the cluster number k with the largest C_c; the cluster number and the clustering result are then considered optimal.
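Steps (1) to (8) can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented implementation: the validity score uses Pearson correlation via `np.corrcoef` in an assumed form of C_c, and the names `assign`, `kmeans`, `validity` and `best_k`, the tolerance, the iteration cap and the `eps` guard are all hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Steps (3)-(4): label each sample with its nearest center (Euclidean)."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, rng, tol=1e-6, max_iter=100):
    centers = X[rng.choice(len(X), size=k, replace=False)]    # step (2)
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = np.array([                               # step (5)
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers, axis=1).sum()
        centers = new_centers
        if shift < tol:                                        # step (6)
            break
    return assign(X, centers), centers

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def validity(X, labels, centers, eps=1e-6):
    """Step (7): assumed form of C_c, see the lead-in above."""
    k, total = len(centers), 0.0
    for c in range(k):
        alpha_sum = sum(corr(centers[c], x) for x in X[labels == c])
        beta_max = max(corr(centers[c], centers[j]) for j in range(k) if j != c)
        total += alpha_sum / max(beta_max, eps)
    return total / len(X)

def best_k(X, candidates, seed=0):
    """Step (8): try each candidate k (each >= 2) and keep the largest score."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    scores = {}
    for k in candidates:
        labels, centers = kmeans(X, k, rng)
        scores[k] = validity(X, labels, centers)
    return max(scores, key=scores.get), scores
```

Because plain k-means depends on its random initial centers, a practical run would likely repeat each k several times and keep the best score; the sketch runs each k once to stay short.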
In a specific embodiment, referring to fig. 2, the method for identifying poor power metering data based on similarity and self-smoothness specifically includes:
(1) Bad data discrimination
For a given user, electricity consumption in production and daily life follows a certain pattern on working days or holidays of the same type; that is, the load curves of different days have similar shapes. The load curve also changes regularly within a day: even when an electrical load is suddenly switched on, the change relative to the adjacent moments is limited, so the load curve itself has a certain smoothness. On this basis, the similarity feature is used first when discriminating inaccurate data: inaccurate data are sought by examining the lateral similarity of the data in the form of a ratio. Let LP_d be the typical daily load curve of a class and LP_c a daily load curve to be checked. An inter-class similarity index δ(i) is defined:
δ(i) represents the inter-class similarity of the i-th data point on the load curve under test, LP_c(i) is the i-th data point on the load curve under test, and LP_d(i) is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when δ(i) ∈ [-r, r], the data point is considered accurate data; otherwise, when δ(i) ∉ [-r, r], it is identified as suspicious data.
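The similarity check can be sketched as below. Since the defining formula is not reproduced in this text, the sketch assumes the relative deviation δ(i) = (LP_c(i) − LP_d(i)) / LP_d(i), which is consistent with the ratio-based description and the symmetric interval [-r, r]; the function names and the requirement that the typical curve be non-zero are assumptions.

```python
import numpy as np

def similarity_index(lp_c, lp_d):
    """Assumed form: relative deviation of the measured curve from the typical curve."""
    lp_c = np.asarray(lp_c, dtype=float)
    lp_d = np.asarray(lp_d, dtype=float)
    # Assumes every point of the typical curve is non-zero.
    return (lp_c - lp_d) / lp_d

def suspicious_points(lp_c, lp_d, r):
    """Indices i where delta(i) falls outside [-r, r]."""
    delta = similarity_index(lp_c, lp_d)
    return np.flatnonzero(np.abs(delta) > r)
```

With `lp_c = [1.1, 2.0, 8.0, 1.9]`, `lp_d = [1.0, 2.0, 4.0, 2.0]` and `r = 0.2`, only the third point (index 2) deviates by more than 20 percent and is flagged as suspicious.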
The smoothness feature can then be used to further screen the suspicious data. Smoothness is measured by comparing the suspicious data point with the two points before and after it. Assuming that the i-th point on the load curve LP_c is considered suspicious data, a smoothness index ε(i) is defined:
ε(i) represents the smoothness of the i-th data point on the load curve under test. As with the similarity index, a threshold u is set: when ε(i) ∈ [-u, u], the suspicious data point is considered accurate data; otherwise, when ε(i) ∉ [-u, u], it is identified as inaccurate data.
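The two-stage decision (similarity first, smoothness second) could look like the following sketch. The smoothness formula is not reproduced in this text, so the sketch assumes ε(i) is the relative deviation of point i from the mean of its neighbours i−1 and i+1, using the single available neighbour at the curve ends; that form, the similarity form and the names `smoothness_index` and `find_bad_points` are assumptions.

```python
import numpy as np

def smoothness_index(lp_c, i):
    """Assumed form: relative deviation of point i from the mean of its neighbours."""
    lp_c = np.asarray(lp_c, dtype=float)
    # At either end of the curve only one neighbour exists (curve length >= 2).
    neighbours = [lp_c[j] for j in (i - 1, i + 1) if 0 <= j < len(lp_c)]
    ref = float(np.mean(neighbours))
    return (lp_c[i] - ref) / ref

def find_bad_points(lp_c, lp_d, r, u):
    """Stage 1: flag points dissimilar to the typical curve (|delta| > r).
    Stage 2: keep only flagged points that are also not smooth (|eps| > u)."""
    lp_c = np.asarray(lp_c, dtype=float)
    lp_d = np.asarray(lp_d, dtype=float)
    delta = (lp_c - lp_d) / lp_d          # assumed similarity form
    suspicious = np.flatnonzero(np.abs(delta) > r)
    return [int(i) for i in suspicious if abs(smoothness_index(lp_c, i)) > u]
```

For `lp_c = [1, 2, 9, 4, 4, 4]` against `lp_d = [1, 2, 3, 4, 3, 2]` with `r = 0.2` and `u = 0.3`, points 2, 4 and 5 are suspicious, but only the isolated spike at index 2 fails the smoothness check. The sustained shift at the end of the curve is smooth and is kept as accurate data, which is the kind of misjudgment the combined test is meant to avoid.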
In practical applications, the thresholds r and u mentioned above may be determined by the grid operator based on experience.
(2) Evaluation of accuracy of metrology data
The metering data consist of the values measured at each sampling point. The values at some sampling points may deviate from the true values because of metering device faults or interference during signal acquisition or transmission, producing inaccurate data. The accuracy of the metering data can therefore be measured by comparing the number of inaccurate data points with the number of all sampled data points. The invention defines a metering accuracy index μ to measure the accuracy of the metering data collected each day, defined as follows:
where N_b is the number of inaccurate data points in the daily load metering data and N is the number of all sampling points in the daily load.
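A small sketch of such an index follows. The defining formula is not reproduced in this text; the sketch assumes μ = 1 − N_b / N, i.e. the share of sampling points not identified as inaccurate, and the function name `accuracy_index` is hypothetical. If the original index is instead the bad-data ratio N_b / N, the return line would change accordingly.

```python
def accuracy_index(n_bad, n_total):
    """Assumed form: mu = 1 - N_b / N, the fraction of accurate sampling points."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_bad <= n_total:
        raise ValueError("n_bad must lie between 0 and n_total")
    return 1.0 - n_bad / n_total
```

For a 96-point daily curve with 3 points identified as inaccurate, `accuracy_index(3, 96)` evaluates to 0.96875 under this assumption.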
(3) Novel clustering effectiveness evaluation index
A clustering algorithm typically uses the Euclidean distance or the Mahalanobis distance to compute the distance between a sample and a cluster center, and the effectiveness of the clustering is then evaluated with the same distance measure. This raises a problem of applicability: if the chosen distance measure does not suit the objects being clustered, the resulting validity evaluation is not credible either. The invention provides a clustering effect evaluation index based on the correlation coefficient, which evaluates the clustering result from a statistical perspective, reduces the one-sidedness of the evaluation and makes the evaluation result more reasonable.
The traditional k-means clustering algorithm requires the cluster number k to be specified manually, and the value of k directly affects the clustering result. How to determine the optimal cluster number k has long been an important topic in clustering research. The cluster effectiveness index provided by the invention takes different values for different cluster numbers, and the clustering effect is best when C_c is largest. The invention therefore provides an improved k-means clustering algorithm that combines the traditional k-means algorithm with the calculation of C_c, so that the optimal cluster number can be determined by iteration and its determination is more scientific and objective.
The invention finds inaccurate data points in the daily load curve by measuring both intra-class similarity and the smoothness of the curve itself. Judging by similarity alone or by smoothness alone can misclassify points of the load curve as inaccurate data; combining the two reduces the misjudgment rate.
Example 2
It is an object of the present embodiment to provide a computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the steps of the method described above when executing the program.
Example 3
An object of the present embodiment is to provide a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
Example 4
An object of this embodiment is to provide a power metering bad data identification system, including:
A power metering data acquisition module configured to: obtain original electric power metering data and preprocess it. The main hardware of this part comprises:
Electric energy meters of various types installed on the user side: directly record the user's electricity consumption and load every hour or at a specified interval;
Electric power metering terminal: collects and uploads the metering data recorded by the meters and receives control commands from the upper-level management end;
Transmission network: the network that carries the collected metering data, including optical private networks, wireless private networks and the like;
Data server: stores and analyzes historical metering data;
A power metering data clustering module configured to: cluster the preprocessed electric power metering data.
The main hardware of this part comprises:
Application server: stores and executes the clustering module program.
A bad data determination module configured to: judge whether the data under test have inter-class similarity; if so, the data are accurate data; if not, further judge whether the data under test are smooth; if so, the data are accurate data; otherwise the data are inaccurate data, i.e. bad data.
The main hardware of this part comprises:
Application server: stores and executes the bad data determination module program.
The steps involved in the devices of the second, third and fourth embodiments correspond to those of the first embodiment of the method, and the detailed description of the embodiments can be found in the related description section of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media including one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any one of the methods of the present disclosure.
While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims (6)

1. The utility model provides a power metering bad data identification method which is characterized in that the method comprises the following steps:
Obtaining original electric power metering data and preprocessing;
Clustering the preprocessed electric power metering data; the clustering effectiveness index C c is combined with a k-means clustering algorithm when the preprocessed electric power metering data are clustered; the cluster effectiveness index C c is composed of a intra-class correlation coefficient and an inter-class correlation coefficient, and can reflect the similarity between the similarity and the inter-class similarity at the same time, and is specifically defined as:
Wherein n is the total number of samples, kc represents the number of samples contained in the c-th cluster, α ci represents the inter-class correlation coefficient between the c-th load curve and the typical load curve of the clustering result in the class, and β cj represents the inter-class correlation coefficient between the typical load curve in the c-th cluster and the typical load curve of the j-th cluster;
When the number k of clusters takes different values, the indexes C c are different, and when the value C c is maximum, the clustering effect is better, so that the optimal number of clusters can be obtained;
Judging whether the data to be tested has inter-class similarity with the clustering result of the user to which the data belongs; if so, judging the data to be accurate data; if not, judging the data to be suspicious data and further judging whether the suspicious data has smoothness; if so, judging the data to be accurate data; otherwise, judging the data to be inaccurate data, i.e. bad data;
Specifically, when judging whether the data to be tested has inter-class similarity, an inter-class similarity index δ(i) is defined:
δ(i) represents the inter-class similarity of the i-th data point on the load curve to be tested, LP_c(i) is the i-th data point on the load curve to be tested, and LP_d(i) is the i-th data point on the typical load curve of the class to which the load belongs; a threshold r is set, and the data is considered accurate data when δ(i) ∈ [-r, r]; otherwise, when δ(i) ∉ [-r, r], the data is identified as suspicious data;
Further screening the suspicious data by using the characteristic of smoothness, wherein the smoothness index is measured by comparing the suspicious data with the data points immediately before and after it;
Assuming that the i-th point on the load curve LP_c is identified as suspicious data, a smoothness index ε(i) is defined:
ε(i) represents the smoothness of the i-th data point on the load curve to be tested; a threshold u is set, and the data is considered accurate data when ε(i) ∈ [-u, u]; otherwise, when ε(i) ∉ [-u, u], the suspicious data is identified as inaccurate data;
Further comprising: a step of evaluating the accuracy of the measurement data, in which the accuracy is measured by comparing the number of inaccurate data with the number of all sampled data;
Specifically, a measurement accuracy index μ is defined to measure the accuracy of the measurement data collected each day, and the index is defined as follows:
Where N_b represents the number of inaccurate data in the daily load measurement data, and N represents the number of all sampling points in the daily load.
2. The power metering bad data identification method according to claim 1, wherein combining the cluster effectiveness index C_c with the k-means clustering algorithm when the preprocessed electric power metering data is clustered specifically comprises:
Determining an initial value of the cluster number k;
Selecting k samples from the n samples as initial cluster centers;
Calculating the distance between each sample and each cluster center;
Repartitioning the samples according to the minimum-distance principle, i.e. minimizing the sum of squared errors;
Calculating the mean of the samples in each class as the new cluster center;
Ending the iteration if the sum of the changes in the cluster-center positions between iterations is smaller than a threshold;
Calculating the cluster effectiveness index C_c;
Repeating the above steps for different values of k, calculating the cluster effectiveness index for each, and selecting the cluster number k with the largest index value, at which the cluster number and the clustering result are optimal.
3. The power metering bad data identification method according to claim 1, wherein the threshold r is determined empirically;
The threshold u is determined empirically.
4. A power metering bad data identification system, characterized by comprising:
A power metering data acquisition module configured to: obtaining original electric power metering data and preprocessing;
A power metering data clustering module configured to: cluster the preprocessed electric power metering data, wherein a cluster effectiveness index C_c is combined with the k-means clustering algorithm when the preprocessed electric power metering data is clustered; the cluster effectiveness index C_c is composed of an intra-class correlation coefficient and an inter-class correlation coefficient, so that it reflects intra-class similarity and inter-class similarity at the same time, and is specifically defined as:
Wherein n is the total number of samples, k_c represents the number of samples contained in the c-th cluster, α_ci represents the intra-class correlation coefficient between the i-th load curve in the c-th cluster and the typical load curve of that cluster, and β_cj represents the inter-class correlation coefficient between the typical load curve of the c-th cluster and the typical load curve of the j-th cluster;
When the number of clusters k takes different values, the index C_c differs; the clustering effect is best when C_c is at its maximum, so that the optimal number of clusters can be obtained;
A bad data determination module configured to: judge whether the data to be tested has inter-class similarity; if so, judge the data to be accurate data; if not, judge the data to be suspicious data and further judge whether the suspicious data has smoothness; if so, judge the data to be accurate data; otherwise, judge the data to be inaccurate data, i.e. bad data;
When judging whether the data to be tested has inter-class similarity, an inter-class similarity index δ(i) is defined:
δ(i) represents the inter-class similarity of the i-th data point on the load curve to be tested, LP_c(i) is the i-th data point on the load curve to be tested, and LP_d(i) is the i-th data point on the typical load curve of the class to which the load belongs; a threshold r is set, and the data is considered accurate data when δ(i) ∈ [-r, r]; otherwise, when δ(i) ∉ [-r, r], the data is identified as suspicious data;
Further screening the suspicious data by using the characteristic of smoothness, wherein the smoothness index is measured by comparing the suspicious data with the data points immediately before and after it;
Assuming that the i-th point on the load curve LP_c is identified as suspicious data, a smoothness index ε(i) is defined:
ε(i) represents the smoothness of the i-th data point on the load curve to be tested; a threshold u is set, and the data is considered accurate data when ε(i) ∈ [-u, u]; otherwise, when ε(i) ∉ [-u, u], the suspicious data is identified as inaccurate data;
Further comprising: a step of evaluating the accuracy of the measurement data, in which the accuracy is measured by comparing the number of inaccurate data with the number of all sampled data;
Specifically, a measurement accuracy index μ is defined to measure the accuracy of the measurement data collected each day, and the index is defined as follows:
Where N_b represents the number of inaccurate data in the daily load measurement data, and N represents the number of all sampling points in the daily load.
5. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of the preceding claims 1-3 when the program is executed.
6. A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims 1-3.
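The two-stage screening and the accuracy index recited in claim 1 can be illustrated with a minimal sketch. The formulas for δ(i), ε(i) and μ are not reproduced in this text, so the forms used below are assumptions chosen only for illustration: δ(i) is taken as the relative deviation of LP_c(i) from LP_d(i), ε(i) as the relative deviation of a point from the mean of its adjacent points, and μ as 1 - N_b/N. All function names, the sample curves and the threshold values are hypothetical, and the sketch assumes strictly positive load values and at least two points per curve.

```python
from typing import List, Sequence


def similarity_index(lp_c: Sequence[float], lp_d: Sequence[float], i: int) -> float:
    # ASSUMED form of delta(i): relative deviation of the measured point
    # from the typical load curve of the cluster the user belongs to.
    return (lp_c[i] - lp_d[i]) / lp_d[i]


def smoothness_index(lp_c: Sequence[float], i: int) -> float:
    # ASSUMED form of epsilon(i): relative deviation of point i from the
    # mean of the points immediately before and after it (end points use
    # the single neighbour that exists).
    neighbours = [lp_c[j] for j in (i - 1, i + 1) if 0 <= j < len(lp_c)]
    mean = sum(neighbours) / len(neighbours)
    return (lp_c[i] - mean) / mean


def find_bad_points(lp_c: Sequence[float], lp_d: Sequence[float],
                    r: float, u: float) -> List[int]:
    """Return the indices judged to be bad data by the two-stage screening."""
    bad = []
    for i in range(len(lp_c)):
        # Stage 1: inter-class similarity; delta(i) in [-r, r] means accurate.
        if abs(similarity_index(lp_c, lp_d, i)) <= r:
            continue
        # Stage 2: the point is suspicious; epsilon(i) in [-u, u] means accurate.
        if abs(smoothness_index(lp_c, i)) <= u:
            continue
        bad.append(i)
    return bad


def accuracy_index(n_bad: int, n_total: int) -> float:
    # ASSUMED form of mu: share of sampling points not flagged as bad.
    return 1.0 - n_bad / n_total


# Hypothetical daily curve with 8 sampling points and empirical thresholds.
typical = [1.0] * 8
measured = [1.0, 1.05, 1.3, 1.5, 1.3, 1.0, 3.0, 1.0]
bad = find_bad_points(measured, typical, r=0.1, u=0.2)
mu = accuracy_index(len(bad), len(measured))
print(bad, mu)  # [6] 0.875
```

In this example the points at indices 2 to 4 deviate from the typical curve and are therefore suspicious, but they form a smooth rise and fall and are kept as accurate data; only the isolated spike at index 6 fails both checks and is counted as bad data.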
CN202110741482.6A 2021-06-30 2021-06-30 Power metering bad data identification method and system Active CN113673551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110741482.6A CN113673551B (en) 2021-06-30 2021-06-30 Power metering bad data identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110741482.6A CN113673551B (en) 2021-06-30 2021-06-30 Power metering bad data identification method and system

Publications (2)

Publication Number Publication Date
CN113673551A CN113673551A (en) 2021-11-19
CN113673551B true CN113673551B (en) 2024-05-28

Family

ID=78538543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110741482.6A Active CN113673551B (en) 2021-06-30 2021-06-30 Power metering bad data identification method and system

Country Status (1)

Country Link
CN (1) CN113673551B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169604A (en) * 2021-12-06 2022-03-11 北京达佳互联信息技术有限公司 Performance index abnormality detection method, abnormality detection device, electronic apparatus, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106055918A (en) * 2016-07-26 2016-10-26 天津大学 Power system load data identification and recovery method
CN107528722A (en) * 2017-07-06 2017-12-29 阿里巴巴集团控股有限公司 Abnormal point detecting method and device in a kind of time series
CN109766950A (en) * 2019-01-18 2019-05-17 东北大学 A kind of industrial user's short-term load forecasting method based on form cluster and LightGBM
CN110544047A (en) * 2019-09-10 2019-12-06 东北电力大学 Bad data identification method
CN110796173A (en) * 2019-09-27 2020-02-14 昆明电力交易中心有限责任公司 Load curve form clustering algorithm based on improved kmeans
WO2021073462A1 (en) * 2019-10-15 2021-04-22 国网浙江省电力有限公司台州供电公司 10 kv static load model parameter identification method based on similar daily load curves

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829561B2 (en) * 2002-03-16 2004-12-07 International Business Machines Corporation Method for determining a quality for a data clustering and data processing system
GB201515615D0 (en) * 2015-09-03 2015-10-21 Functional Technologies Ltd Clustering images based on camera fingerprints
US10319574B2 (en) * 2016-08-22 2019-06-11 Highland Innovations Inc. Categorization data manipulation using a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
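Application of the k-means clustering algorithm in load curve classification; Liu Li et al.; Power System Protection and Control; page 1, column 2, paragraph 2 to page 3, column 2, paragraph 3 *
Directional similarity clustering method DSCM; Xiu Yu; Wang Shitong; Wu Xisheng; Hu Dewen; Journal of Computer Research and Development; 2006-08-28 (No. 08); full text *
Validity analysis of flood clustering; Lu Zhengbo; Hou Zhaocheng; South-to-North Water Transfers and Water Science & Technology; 2007-04-25 (No. 02); full text *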

Also Published As

Publication number Publication date
CN113673551A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN106055918B (en) Method for identifying and correcting load data of power system
CN116502112B (en) New energy power supply test data management method and system
CN109409628A (en) Acquisition terminal production firm evaluation method based on metering big data Clustering Model
CN111949939B (en) Method for evaluating running state of intelligent electric meter based on improved TOPSIS and cluster analysis
CN110264107B (en) Large data technology-based abnormal diagnosis method for line loss rate of transformer area
CN117313016B (en) New energy power transaction spot electricity price difference data processing method
CN111967717A (en) Data quality evaluation method based on information entropy
CN111210170A (en) Environment-friendly management and control monitoring and evaluation method based on 90% electricity distribution characteristic index
CN112287980B (en) Power battery screening method based on typical feature vector
CN113673551B (en) Power metering bad data identification method and system
CN117273489A (en) Photovoltaic state evaluation method and device
CN116148753A (en) Intelligent electric energy meter operation error monitoring system
CN111709668A (en) Power grid equipment parameter risk identification method and device based on data mining technology
CN111797887A (en) Anti-electricity-stealing early warning method and system based on density screening and K-means clustering
CN117805649A (en) Method for identifying abnormal battery cells based on SOH quantized battery capacity attenuation
CN114331238B (en) Intelligent model algorithm optimization method, system, storage medium and computer equipment
CN112305441A (en) Power battery health state assessment method under integrated clustering
CN117786445B (en) Intelligent processing method for operation data of automatic yarn reeling machine
CN111027841A (en) Low-voltage transformer area line loss calculation method based on gradient lifting decision tree
CN110781959A (en) Power customer clustering method based on BIRCH algorithm and random forest algorithm
CN114266457A (en) Method for detecting different loss inducement of distribution line
CN111553434A (en) Power system load classification method and system
CN116910655A (en) Intelligent ammeter fault prediction method based on device measurement data
CN116561569A (en) Industrial power load identification method based on EO feature selection and AdaBoost algorithm
CN118503894B (en) Lithium battery quality detection system based on process index data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant