CN113673551A - Method and system for identifying bad data of electric power metering - Google Patents

Method and system for identifying bad data of electric power metering

Info

Publication number
CN113673551A
Authority
CN
China
Prior art keywords
data
clustering
power metering
judging
electric power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110741482.6A
Other languages
Chinese (zh)
Other versions
CN113673551B (en)
Inventor
陈祉如
代燕杰
刘轶娟
郭亮
荆臻
杜艳
董贤光
张志�
赵曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
State Grid Shandong Electric Power Co Ltd
Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Original Assignee
Shandong University
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
State Grid Shandong Electric Power Co Ltd
Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, State Grid Shandong Electric Power Co Ltd, and Marketing Service Center of State Grid Shandong Electric Power Co Ltd
Priority to CN202110741482.6A
Publication of CN113673551A
Application granted
Publication of CN113673551B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a method and system for identifying bad data in electric power metering, comprising: obtaining original power metering data and preprocessing it; clustering the preprocessed power metering data; and judging whether the data to be detected has inter-class similarity with the clustering result of the user to which the data belongs. If so, the data is judged to be accurate; if not, it is further judged whether the data has smoothness. If so, the data is judged to be accurate; otherwise, the data is judged to be inaccurate, that is, bad data. By providing a quantitative accuracy index, accuracy, one of the quality characteristics of power metering data, is quantified and expressed more intuitively.

Description

Method and system for identifying bad data of electric power metering
Technical Field
The disclosure belongs to the technical field of electric power metering data identification, and particularly relates to a method and a system for identifying bad data of electric power metering.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, the electricity market, as an important means of optimizing the allocation of power resources, has entered a stage of rapid development, and power metering has become a fundamental link in that development. Power metering data contains rich information and is of great significance to the developing electricity market. By processing and analyzing power metering data, more information about users' electricity consumption patterns can be obtained, so that better fitting and substitution schemes can be found when metering data is missing, and more valuable references can be provided for recommending retail packages to users once the retail market develops.
At present, power metering technology is gradually moving towards automation and intelligence, and the quality of power metering data has improved greatly compared with the era of manual meter reading. However, as demand for electricity in production and daily life in China keeps growing, the quality of power metering data remains unstable in actual operation because of meter faults, interference during data acquisition and transmission, and similar causes. The main indicators of power metering data quality are integrity, timeliness, and accuracy. Mature methods exist for evaluating integrity and timeliness, but accuracy, the most important indicator, still lacks a mature evaluation method.
Documents such as "Liuli, Wanggang, Dian-Jian. Application of the k-means clustering algorithm in load curve classification [J]. Power System Protection and Control, 2011, 39(23): 65-68+73" and "Liu Hui boat, Zhou Kao le, Hu Xiao Jian. Bad load data identification and correction based on fuzzy load clustering [J]. Chinese Electric Power, 2013, 46(10): 29-34" provide bad data identification methods. The former determines inaccurate data through transverse similarity or longitudinal smoothness; the latter empirically determines the allowable variation range of load values for each class of load curve and judges data outside that range to be inaccurate. Both give ways of judging bad data, but each relies on a single criterion, so misjudgment may occur in practical application. The present method combines the transverse similarity and longitudinal smoothness of the load curve, considers the problem more comprehensively than the two methods above, and can effectively reduce the misjudgment rate. It is suited to finding inaccurate data caused by sudden changes in individual data points during metering data acquisition, for example due to electromagnetic interference. It works on acquired power metering data and identifies inaccurate data by analyzing the daily load curve formed by the active power in that data.
In summary, the technical problem with bad data identification for power metering in the prior art is as follows: most existing methods need to combine information other than active power, for example the line structure of the system from which the metering data is obtained. Unlike those methods, the present method focuses on users' electricity consumption patterns and habits. The load curve obtained through the power metering equipment is independent of the equipment and manner of data acquisition and of the system's line structure, so the method is faster and more convenient, and inaccurate data can be identified wherever a load curve is available.
Disclosure of Invention
In order to overcome the defects of the prior art, the present disclosure provides a method for identifying bad data of electric power metering, which finds inaccurate data points while effectively reducing the misjudgment rate.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
In a first aspect, a method for identifying bad data of power metering is disclosed, which includes:
obtaining original electric power metering data and preprocessing the data;
clustering the preprocessed electric power metering data;
judging whether the data to be detected has inter-class similarity with the clustering result of the user to which the data belongs; if so, judging the data to be accurate data; if not, judging the data to be suspicious and further judging whether the data to be detected has smoothness; if so, judging the data to be accurate data; otherwise, judging the data to be inaccurate data, namely bad data.
According to a further technical solution, when the preprocessed electric power metering data are clustered, the clustering validity index $C_c$ is combined with the k-means clustering algorithm, specifically:
determining an initial value of the cluster number k;
selecting k samples from the n samples as initial cluster centers;
calculating the distance between each sample and each cluster center;
reassigning the samples according to the minimum-distance principle, namely the minimum sum of squared errors;
calculating the mean of the samples in each class as the new cluster center;
ending the iteration if the sum of the changes in the cluster-center positions between two successive iterations is smaller than a threshold;
calculating the clustering validity index $C_c$;
selecting different values of k, repeating the above steps and calculating the clustering validity index each time, and selecting the cluster number k with the largest validity index value, at which the cluster number and the clustering result are optimal.
According to a further technical solution, when judging whether the data to be detected has inter-class similarity, an inter-class similarity index $\delta(i)$ is defined:
Figure BDA0003141538290000031
where $\delta(i)$ represents the inter-class similarity of the i-th data point on the load curve to be measured, $LP_c(i)$ is the i-th data point on the load curve to be measured, and $LP_d(i)$ is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when $\delta(i) \in [-r, r]$ the data is considered accurate; otherwise, when $\delta(i) \notin [-r, r]$, the data is considered suspicious.
According to a further technical solution, the smoothness characteristic is used to further screen the suspicious data; the smoothness index can be measured by comparing the data at the two points before and after the suspicious data.
According to a further technical solution, assuming that the i-th point on the load curve $LP_c$ is considered suspicious data, a smoothness index $\varepsilon(i)$ is defined:
Figure BDA0003141538290000033
where $\varepsilon(i)$ represents the smoothness of the i-th data point on the load curve to be measured. As with the similarity index, a threshold u is set: when $\varepsilon(i) \in [-u, u]$ the data is regarded as accurate; otherwise, when $\varepsilon(i) \notin [-u, u]$, the suspicious data is identified as inaccurate data.
In a further embodiment, the thresholds r and u may be determined based on operational experience.
A further technical solution also includes evaluating the accuracy of the metering data:
the accuracy of the metering data can be measured by comparing the number of inaccurate data points with the total number of sampled data points.
In a second aspect, a system for identifying bad data in power metering is disclosed, which comprises:
a power metering data acquisition module configured to: obtaining original electric power metering data and preprocessing the data;
a power metering data clustering module configured to: clustering the preprocessed electric power metering data;
a bad data determination module configured to: judge whether the data to be detected has inter-class similarity with the clustering result of the user to which the data belongs; if so, judge the data to be accurate; if not, further judge whether the data to be detected has smoothness; if so, judge the data to be accurate; otherwise, judge the data to be inaccurate, namely bad data.
The above one or more technical solutions have the following beneficial effects:
The invention introduces the correlation coefficient into the evaluation of clustering validity, represents the distance between samples by the correlation coefficient, and defines the clustering validity index $C_c$. Compared with the commonly used Xie-Beni index, which uses Euclidean distance to calculate the intra-cluster and inter-cluster distances, the validity index $C_c$ measures the validity of the clustering algorithm from a different perspective.
The clustering validity index proposed by the invention yields different values for different clustering samples and different cluster numbers. The clustering effect is best when $C_c$ is largest. Therefore, by combining the traditional k-means clustering algorithm with the calculation of the validity index $C_c$, the invention can determine the optimal cluster number through iteration.
The invention provides a method for identifying bad power metering data based on intra-class similarity and self-smoothness. By combining the intra-class similarity judgment between the load curve to be detected and the typical curve with the smoothness judgment of the load curve to be detected, misjudgment caused by a single criterion can be reduced to a certain extent. By providing a quantitative accuracy index, accuracy, one of the quality characteristics of power metering data, is quantified and expressed more intuitively.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flow chart of an improved k-means clustering algorithm according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a bad data identification method based on intra-class similarity and self-smoothness according to an embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example one
This embodiment discloses a method for identifying bad data of electric power metering. It first introduces a clustering validity evaluation index based on the correlation coefficient:
Clustering validity has various evaluation indexes; fundamentally, they all judge validity quantitatively from the intra-class distance and inter-class distance of the clustering result. A good clustering result should make the distance between classes as large as possible and the distance from each sample to its cluster center as small as possible. The invention introduces the correlation coefficient into the evaluation of clustering validity and uses it to represent the distance between samples, thereby constructing a new clustering validity index $C_c$. $C_c$ uses the intra-class correlation coefficient and the inter-class correlation coefficient to reflect intra-class similarity and inter-class similarity simultaneously. The index is defined as follows:
First, the intra-class correlation coefficient is defined as:
Figure BDA0003141538290000061
where $\alpha_{ci}$ represents the intra-class correlation coefficient of the i-th load curve in class c, $x_{cb}$ represents the b-th data point on the typical load curve of the c-th clustering result, $\bar{x}_c$ represents the mean of the points on the class-c typical load curve, $x_{ib}$ represents the b-th data point on the i-th curve in class c, $\bar{x}_i$ represents the mean of the i-th curve in class c, and m is the number of data points on each load curve.
The inter-class correlation coefficient is defined as:
Figure BDA0003141538290000064
where $\beta_{cj}$ represents the inter-class correlation coefficient between the typical load curve of the c-th cluster and the typical load curve of the j-th cluster, $x_{cb}$ represents the b-th data point on the typical load curve of the c-th cluster, $\bar{x}_c$ represents the mean of the data on the class-c typical load curve, $x_{jb}$ represents the b-th data point on the class-j typical load curve, and $\bar{x}_j$ represents the mean of the data on the class-j typical load curve.
The clustering validity index $C_c$ is defined as:
Figure BDA0003141538290000067
where n is the total number of samples and $k_c$ represents the number of samples contained in the c-th cluster. A correlation coefficient is never greater than 1, and the closer it is to 1, the stronger the correlation between the two curves. In the definition,
Figure BDA0003141538290000068
is the sum of the intra-class correlation coefficients and $\max(\beta_{cj})$ is the maximum inter-class correlation coefficient; the clustering validity index $C_c$ is obtained by dividing the former by the latter. When the cluster number k takes different values, the index $C_c$ takes different values; the larger $C_c$ is, the better the clustering effect, so the optimal cluster number can be obtained.
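The formula images for $\alpha_{ci}$, $\beta_{cj}$ and $C_c$ are not reproduced in this text. One reading consistent with the symbol definitions above is sketched here. It assumes the standard Pearson correlation coefficient for both $\alpha_{ci}$ and $\beta_{cj}$, and it assumes that the sum of intra-class coefficients is averaged over the n samples; the $1/n$ factor and the index ranges are assumptions and are not confirmed by the source.

```latex
\alpha_{ci} = \frac{\sum_{b=1}^{m} (x_{ib}-\bar{x}_i)(x_{cb}-\bar{x}_c)}
                   {\sqrt{\sum_{b=1}^{m} (x_{ib}-\bar{x}_i)^2}\,\sqrt{\sum_{b=1}^{m} (x_{cb}-\bar{x}_c)^2}}
\qquad
\beta_{cj} = \frac{\sum_{b=1}^{m} (x_{cb}-\bar{x}_c)(x_{jb}-\bar{x}_j)}
                  {\sqrt{\sum_{b=1}^{m} (x_{cb}-\bar{x}_c)^2}\,\sqrt{\sum_{b=1}^{m} (x_{jb}-\bar{x}_j)^2}}

% Assumed form of the validity index: averaged intra-class correlation
% divided by the largest inter-class correlation.
C_c = \frac{\dfrac{1}{n}\sum_{c=1}^{k}\sum_{i=1}^{k_c} \alpha_{ci}}
           {\max_{c \neq j} \beta_{cj}}
```

Under this reading, tight clusters push the numerator towards 1 and well-separated typical curves make the denominator small, so a larger $C_c$ corresponds to a better partition, which matches the selection rule stated in the text.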
Improved k-means clustering algorithm
Data normalization
To make load curves with large numerical differences comparable during clustering, the data must first be normalized. Because the hourly metering data points are used directly as features during clustering, there is no problem of inconsistent dimensions, so the range (min-max) linear normalization formula is used:
Figure BDA0003141538290000071
where LP(i) represents the raw data of the i-th point on the daily load curve, and
Figure BDA0003141538290000072
represents the normalized data of the i-th point on the daily load curve.
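The normalization formula itself is an image that is not reproduced here. The sketch below assumes the usual min-max form, (LP(i) - min) / (max - min), applied to one daily load curve at a time; the function name `min_max_normalize` and the handling of a flat curve are illustrative choices, not taken from the source.

```python
def min_max_normalize(curve):
    """Scale one daily load curve to [0, 1] (assumed min-max form)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        # A flat curve has no range; return zeros instead of dividing by zero.
        return [0.0 for _ in curve]
    return [(x - lo) / (hi - lo) for x in curve]


# Hypothetical four-point curve, just to show the call.
normalized = min_max_normalize([2.0, 4.0, 6.0, 10.0])
```

Normalizing each curve separately keeps only its shape, so a large industrial user and a small household with the same daily pattern can land in the same cluster.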
The clustering validity index $C_c$ defined above is combined with the k-means clustering algorithm to build an improved k-means clustering algorithm in which the value of k is easier to determine. Referring to FIG. 1, the basic steps are as follows:
(1) empirically determining a value of k;
(2) selecting k samples from the n samples as initial cluster centers: $C_0, C_1, \ldots, C_{k-1}$;
(3) calculating the distance between each sample and each cluster center. A sample here is a daily load curve of a power consumer: a curve made up of one point per hour (24 points per day) or one point per 15 minutes (96 points per day) that reflects how the consumer's electricity use changes over time. It can be obtained from the power metering device;
(4) reassigning the samples according to the minimum-distance principle (namely the minimum sum of squared errors);
(5) calculating the mean of the samples in each class as the new cluster center;
(6) ending the iteration if the sum of the changes in the cluster-center positions between two successive iterations is smaller than a threshold; otherwise repeating steps (3), (4) and (5) until the condition is met;
(7) calculating the clustering validity index $C_c$;
(8) returning to step (1), selecting different values of k, repeating the above steps and calculating the clustering validity index $C_c$ each time, and selecting the cluster number k with the largest $C_c$ value, which is taken as the optimal cluster number and clustering result.
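Steps (1) to (8) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the validity index is assumed to be the mean intra-class Pearson correlation divided by the largest inter-class Pearson correlation, the initial centers are taken at evenly spaced indices for reproducibility, and a non-positive inter-class correlation is clamped to a small positive number. The names `pearson`, `kmeans`, `validity_index` and `select_k` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat curve: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_a * var_b)


def sq_dist(a, b):
    """Squared Euclidean distance between two curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, k, tol=1e-6, max_iter=100):
    """Steps (2)-(6): plain k-means on daily load curves."""
    n = len(samples)
    # Step (2): evenly spaced samples as initial centers (illustrative choice).
    centers = [list(samples[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Steps (3)-(4): assign every sample to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(s, centers[c]))
                  for s in samples]
        # Step (5): each new center is the mean of its members.
        new_centers = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        # Step (6): stop once the total center movement is below the threshold.
        shift = sum(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return labels, centers


def validity_index(samples, labels, centers):
    """Step (7): assumed C_c = mean intra-class / max inter-class correlation."""
    k = len(centers)
    intra = sum(pearson(s, centers[lab]) for s, lab in zip(samples, labels)) / len(samples)
    inter = max(pearson(centers[c], centers[j])
                for c in range(k) for j in range(c + 1, k))
    return intra / max(inter, 1e-9)  # clamp is an assumption, avoids division by <= 0


def select_k(samples, k_values):
    """Step (8): try each k (k >= 2) and keep the one with the largest index."""
    best = None
    for k in k_values:
        labels, centers = kmeans(samples, k)
        score = validity_index(samples, labels, centers)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


# Hypothetical curves: two rising and two falling load shapes.
curves = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1],
          [4.0, 3.0, 2.0, 1.0], [4.1, 3.1, 2.1, 1.1]]
best_score, best_k, best_labels = select_k(curves, k_values=[2, 3])
```

Assignment still uses Euclidean distance while the partition is scored with correlation, which mirrors the text's point that the validity check should not simply reuse the clustering distance. In practice the curves would be normalized first and the initial centers would usually be chosen at random.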
In an embodiment, referring to FIG. 2, the method for identifying bad power metering data based on similarity and smoothness includes:
(1) Bad data discrimination
For the same user, electricity consumption in production and daily life follows a certain pattern on the same type of working day or holiday, that is, the load curves on different dates have similar shapes. The load curve also changes regularly within a day: although an electrical load may start suddenly, the change relative to adjacent times is limited, that is, the load curve has a certain smoothness. On this basis, the similarity characteristic is used first when discriminating inaccurate data: inaccurate data is sought by examining the transverse similarity of the data in ratio form. Suppose $LP_d$ is the typical daily load curve of a certain class and $LP_c$ is a daily load curve to be detected. An inter-class similarity index $\delta(i)$ is defined:
Figure BDA0003141538290000081
where $\delta(i)$ represents the inter-class similarity of the i-th data point on the load curve to be measured, $LP_c(i)$ is the i-th data point on the load curve to be measured, and $LP_d(i)$ is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when $\delta(i) \in [-r, r]$ the data is considered accurate; otherwise, when $\delta(i) \notin [-r, r]$, the data is considered suspicious.
The smoothness characteristic can then be applied to further screen the suspicious data. The smoothness index can be measured by comparing the data at the points immediately before and after the suspicious data. Assuming that the i-th point on the load curve $LP_c$ is considered suspicious data, a smoothness index $\varepsilon(i)$ is defined:
Figure BDA0003141538290000082
where $\varepsilon(i)$ represents the smoothness of the i-th data point on the load curve to be measured. As with the similarity index, a threshold u is set: when $\varepsilon(i) \in [-u, u]$ the data is regarded as accurate; otherwise, when $\varepsilon(i) \notin [-u, u]$, the suspicious data is identified as inaccurate data.
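The two-stage test can be illustrated with a short sketch. Because the formula images for $\delta(i)$ and $\varepsilon(i)$ are not reproduced in this text, the code assumes relative-deviation forms: $\delta(i)$ as the deviation of the measured point from the typical curve divided by the typical value, and $\varepsilon(i)$ as the deviation of the measured point from the mean of its neighbours divided by that mean. Both forms, the end-point handling, and the names `relative_deviation` and `find_bad_points` are assumptions for illustration only.

```python
def relative_deviation(value, reference):
    """(value - reference) / reference, with a guard for a zero reference."""
    if reference == 0:
        return 0.0 if value == 0 else float("inf")
    return (value - reference) / reference


def find_bad_points(curve, typical, r, u):
    """Return indices judged inaccurate by the similarity-then-smoothness test.

    curve   -- daily load curve to be detected (at least two points)
    typical -- typical load curve of the cluster the user belongs to
    r, u    -- similarity and smoothness thresholds, set from experience
    """
    bad = []
    for i, (x, ref) in enumerate(zip(curve, typical)):
        # Stage 1: assumed delta(i); inside [-r, r] means accurate.
        if abs(relative_deviation(x, ref)) <= r:
            continue
        # Stage 2: the point is suspicious; compare it with its neighbours.
        # End points only have one neighbour (illustrative choice).
        neighbours = [curve[j] for j in (i - 1, i + 1) if 0 <= j < len(curve)]
        baseline = sum(neighbours) / len(neighbours)
        # Assumed epsilon(i); outside [-u, u] means inaccurate.
        if abs(relative_deviation(x, baseline)) > u:
            bad.append(i)
    return bad


# Hypothetical data: one spike at index 2 on an otherwise typical day.
typical = [1.0, 1.2, 1.5, 1.4, 1.1, 1.0]
measured = [1.0, 1.2, 4.5, 1.4, 1.1, 1.0]
bad = find_bad_points(measured, typical, r=0.2, u=0.3)

# A day that is uniformly 50% above the typical curve fails stage 1 everywhere
# but stays smooth, so stage 2 clears every point.
scaled_bad = find_bad_points([1.5 * t for t in typical], typical, r=0.2, u=0.3)
```

The second call shows why the combination lowers misjudgment: a genuine but unusually high-consumption day is flagged by the similarity test alone, while the smoothness test recognizes that no individual point jumps relative to its neighbours.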
In practical applications, the threshold values r and u may be determined empirically by grid operators.
(2) Assessment of accuracy of metrology data
Metering data consists of the values measured at each sampling point. The values at some sampling points may deviate from the true values because of metering device faults, interference with signal acquisition or transmission, and similar causes, which produces inaccurate data. The accuracy of the metering data can therefore be measured by comparing the number of inaccurate data points with the total number of sampled data points. The invention accordingly defines a metering accuracy index $\mu$ to measure the accuracy of the metering data collected each day, defined as follows:
Figure BDA0003141538290000091
In the formula, $n_b$ represents the number of inaccurate data points in one day's load metering data, and N represents the total number of sampling points in that day's load curve.
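The formula image for $\mu$ is not reproduced in this text. As a hedged illustration, the sketch below assumes $\mu = 1 - n_b / N$, the share of sampling points not flagged as inaccurate; the source only states that inaccurate points are compared with all sampling points, so the exact form (for example whether it is expressed as a percentage) is an assumption, as is the function name `metering_accuracy`.

```python
def metering_accuracy(n_bad, n_total):
    """Assumed accuracy index: fraction of sampling points not flagged as bad."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_bad <= n_total:
        raise ValueError("n_bad must lie between 0 and n_total")
    return 1.0 - n_bad / n_total


# Hypothetical day sampled every 15 minutes (96 points) with 3 flagged points.
mu = metering_accuracy(n_bad=3, n_total=96)
```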
(3) Novel clustering effectiveness evaluation index
Clustering algorithms mostly use Euclidean distance or Mahalanobis distance when calculating the distance between a sample and a cluster center, and then evaluate clustering validity with the same distance measure. This raises a problem of applicability: if the chosen distance measure does not suit the objects being clustered, the resulting validity evaluation is not credible. The invention provides a clustering validity index based on the correlation coefficient, which evaluates the clustering result from a statistical point of view and reduces the one-sidedness of the evaluation, making the evaluation result more reasonable.
The traditional k-means clustering algorithm requires the cluster number k to be specified manually, and the value of k directly affects the clustering effect. How to determine the optimal cluster number k has long been an important part of clustering research. The clustering validity index proposed by the invention yields different values for different clustering samples and different cluster numbers, and the clustering effect is best when $C_c$ is largest. The improved k-means clustering algorithm proposed by the invention combines the traditional k-means clustering algorithm with the calculation of the validity index $C_c$, so the optimal cluster number can be determined through iteration, which makes the choice of cluster number more scientific and objective.
The invention finds inaccurate data points in a user's daily load curve by measuring both the intra-class similarity of the daily load curve and its own smoothness. Searching for inaccurate data points using only intra-class similarity or only self-smoothness can misjudge accurate data as inaccurate; combining the two criteria reduces the misjudgment rate.
Example two
It is an object of this embodiment to provide a computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
Example three
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Example four
The present embodiment aims to provide a system for identifying bad data in power metering, which includes:
a power metering data acquisition module configured to: obtain original electric power metering data and preprocess the data; the main hardware of this part comprises:
electric energy meters of various types installed at the user side: these directly record the user's electricity consumption and load for each hour or other specified time interval;
power metering terminals: these collect and upload the metering data recorded by the meters and receive control commands from the upper-level management end;
a transmission network: the network that carries the metering and acquisition data, including optical fiber private networks, wireless private networks and the like;
a data server: for storage and analysis of historical metering data;
a power metering data clustering module configured to: cluster the preprocessed electric power metering data;
the main hardware of this part comprises:
an application server: for storing and executing the clustering module program;
a bad data determination module configured to: judge whether the data to be detected has inter-class similarity; if so, judge the data to be accurate; if not, further judge whether the data to be detected has smoothness; if so, judge the data to be accurate; otherwise, judge the data to be inaccurate, namely bad data;
the main hardware of this part comprises:
an application server: for storing and executing the bad data determination module program.
The steps involved in the apparatuses of the second, third and fourth embodiments above correspond to the method of the first embodiment, and the detailed description can be found in the relevant part of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that causes the processor to perform any of the methods of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A method for identifying bad data of electric power metering is characterized by comprising the following steps:
obtaining original electric power metering data and preprocessing the data;
clustering the preprocessed electric power metering data;
judging whether the clustering result of the data to be detected and the user to which the data belongs has inter-class similarity, if so, judging the data to be accurate data, if not, judging the data to be suspicious, and continuously judging whether the data to be detected has smoothness, if so, judging the data to be accurate data, otherwise, judging the data to be inaccurate data to be bad data.
2. The method as claimed in claim 1, wherein the clustering validity index C is determined when the preprocessed power metering data are clusteredcThe method is combined with a k-means clustering algorithm, and specifically comprises the following steps:
determining an initial clustering number k value;
selecting k samples from the n samples as initial clustering centers;
calculating the distance between each sample and the clustering center;
reassigning the samples according to the minimum-distance principle, i.e. minimizing the sum of squared errors;
calculating the mean value of each type of sample as a new clustering center;
ending the iteration if the sum of the displacements of the cluster centers between two successive iterations is smaller than a threshold value;
calculating the clustering validity index C_c;
repeating the above steps for different k values, calculating the clustering validity index for each, and selecting the clustering number k with the maximum validity index value, at which the clustering number and the clustering result are optimal.
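The steps of claim 2 can be sketched in plain Python as below. Two points are assumptions rather than part of the claim: the definition of C_c is not given in this section, so a Calinski-Harabasz style ratio (between-cluster over within-cluster dispersion) stands in for it, and the initial centers are picked by farthest-first seeding so that the run is deterministic. All function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of samples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def init_centers(samples, k):
    # Assumption: farthest-first seeding; the claim only says k samples are selected.
    centers = [samples[0]]
    while len(centers) < k:
        centers.append(max(samples, key=lambda s: min(sq_dist(s, c) for c in centers)))
    return centers


def kmeans(samples, k, tol=1e-9, max_iter=100):
    centers = init_centers(samples, k)
    labels = []
    for _ in range(max_iter):
        # Assign every sample to its nearest center (minimum-distance principle).
        labels = [min(range(k), key=lambda j: sq_dist(s, centers[j])) for s in samples]
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        # Stop when the summed displacement of the centers falls below the threshold.
        shift = sum(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return labels, centers


def validity_index(samples, labels, centers):
    # Stand-in for C_c (assumption): Calinski-Harabasz ratio, valid for 2 <= k < n.
    n, k = len(samples), len(centers)
    overall = mean(samples)
    within = sum(sq_dist(s, centers[lab]) for s, lab in zip(samples, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def select_k(samples, candidates):
    """Run k-means for every candidate k and keep the one with the largest index."""
    best = None
    for k in candidates:
        labels, centers = kmeans(samples, k)
        score = validity_index(samples, labels, centers)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

In practice each sample would be one user's load curve (a vector of readings) rather than a 2-D point, and any other validity index could be swapped in for `validity_index` without changing `select_k`.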
3. The method as claimed in claim 1, wherein, when determining whether the data to be measured has inter-class similarity, an inter-class similarity index δ(i) is defined:
Figure FDA0003141538280000011
δ(i) represents the inter-class similarity of the i-th data point on the load curve to be measured, LP_c(i) is the i-th data point on the load curve to be measured, and LP_d(i) is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when δ(i) ∈ [−r, r], the data is considered accurate data; otherwise, when δ(i) ∉ [−r, r], the data is considered suspect data.
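A minimal sketch of the threshold test in claim 3 follows. The formula for δ(i) is only available as a figure in this text, so the relative deviation (LP_c(i) − LP_d(i)) / LP_d(i) used here is an assumption, as are the function names.

```python
def similarity_index(lp_c, lp_d, i):
    """Assumed form of delta(i): relative deviation from the typical curve.

    lp_d[i] must be non-zero for this assumed form to be defined.
    """
    return (lp_c[i] - lp_d[i]) / lp_d[i]


def suspect_points(lp_c, lp_d, r):
    """Indices whose delta(i) falls outside [-r, r]."""
    return [i for i in range(len(lp_c))
            if not -r <= similarity_index(lp_c, lp_d, i) <= r]
```

For example, with a measured curve `[1.0, 2.0, 9.0, 2.0]`, a typical curve `[1.0, 2.0, 3.0, 2.0]` and r = 0.5, only index 2 is flagged as suspect.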
4. The method as claimed in claim 3, wherein the suspected data is further screened by using the smoothness, and the smoothness index can be measured by comparing the data of two points before and after the suspected data.
5. The method as claimed in claim 1, wherein, assuming that the i-th point on the load curve LP_c is considered suspect data, a smoothness index ε(i) is defined:
Figure FDA0003141538280000021
ε(i) represents the smoothness of the i-th data point on the load curve to be measured. Similarly to the similarity index, a threshold u is set: when ε(i) ∈ [−u, u], the data is regarded as accurate data; otherwise, when ε(i) ∉ [−u, u], the suspect data is identified as inaccurate data.
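The smoothness check can be sketched in the same way. The formula for ε(i) is likewise only given as a figure, so this sketch assumes it is the relative deviation of the suspect point from the mean of its immediate neighbours; that form and the function names are hypothetical.

```python
def smoothness_index(lp_c, i):
    """Assumed form of epsilon(i): deviation from the mean of the two neighbours."""
    if i <= 0 or i >= len(lp_c) - 1:
        raise ValueError("smoothness needs a neighbour on both sides of point i")
    neighbour_mean = (lp_c[i - 1] + lp_c[i + 1]) / 2
    return (lp_c[i] - neighbour_mean) / neighbour_mean


def is_inaccurate(lp_c, i, u):
    """True when epsilon(i) falls outside [-u, u], i.e. the suspect point is bad data."""
    return not -u <= smoothness_index(lp_c, i) <= u
```

How the first and last points of a curve are treated is not stated in the claims; this sketch simply rejects them.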
6. The method as claimed in claim 5, wherein the threshold r is determined empirically;
the threshold u is determined empirically.
7. The method as claimed in claim 1, further comprising a step of evaluating the accuracy of the metering data:
the accuracy of the metering data is measured by comparing the number of inaccurate data with the number of all sampled data.
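One way to read this evaluation step is as the share of sampled points that were not identified as inaccurate. The exact formula is not given in the claim, so the expression below, 1 minus the ratio of inaccurate to total samples, is an assumption, and the function name is hypothetical.

```python
def metering_accuracy(n_inaccurate: int, n_total: int) -> float:
    """Assumed accuracy measure: fraction of samples not flagged as inaccurate."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_inaccurate <= n_total:
        raise ValueError("n_inaccurate must lie between 0 and n_total")
    return 1 - n_inaccurate / n_total
```

For instance, 2 inaccurate points out of 8 samples give an accuracy of 0.75 under this reading.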
8. A bad data identification system for electric power metering is characterized by comprising:
a power metering data acquisition module configured to: obtaining original electric power metering data and preprocessing the data;
a power metering data clustering module configured to: clustering the preprocessed electric power metering data;
a bad data determination module configured to: judge whether the data to be detected has inter-class similarity; if so, judge the data to be accurate; if not, further judge whether the data to be detected has smoothness; if so, judge the data to be accurate; otherwise, judge the data to be inaccurate, i.e. bad data.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of the preceding claims 1 to 7.
CN202110741482.6A 2021-06-30 2021-06-30 Power metering bad data identification method and system Active CN113673551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110741482.6A CN113673551B (en) 2021-06-30 2021-06-30 Power metering bad data identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110741482.6A CN113673551B (en) 2021-06-30 2021-06-30 Power metering bad data identification method and system

Publications (2)

Publication Number Publication Date
CN113673551A true CN113673551A (en) 2021-11-19
CN113673551B CN113673551B (en) 2024-05-28

Family

ID=78538543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110741482.6A Active CN113673551B (en) 2021-06-30 2021-06-30 Power metering bad data identification method and system

Country Status (1)

Country Link
CN (1) CN113673551B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030182082A1 (en) * 2002-03-16 2003-09-25 International Business Machines Corporation Method for determining a quality for a data clustering and data processing system
CN106055918A (en) * 2016-07-26 2016-10-26 天津大学 Power system load data identification and recovery method
CN107528722A (en) * 2017-07-06 2017-12-29 阿里巴巴集团控股有限公司 Abnormal point detecting method and device in a kind of time series
US20180053642A1 (en) * 2016-08-22 2018-02-22 Eung Joon JO Categorization data manipulation using a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer
US20180253626A1 (en) * 2015-09-03 2018-09-06 Functional Technologies Ltd. Clustering images based on camera fingerprints
CN109766950A (en) * 2019-01-18 2019-05-17 东北大学 A kind of industrial user's short-term load forecasting method based on form cluster and LightGBM
CN110544047A (en) * 2019-09-10 2019-12-06 东北电力大学 Bad data identification method
CN110796173A (en) * 2019-09-27 2020-02-14 昆明电力交易中心有限责任公司 Load curve form clustering algorithm based on improved kmeans
WO2021073462A1 (en) * 2019-10-15 2021-04-22 国网浙江省电力有限公司台州供电公司 10 kv static load model parameter identification method based on similar daily load curves

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
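修宇;王士同;吴锡生;胡德文: "Directional similarity clustering method DSCM", 计算机研究与发展 (Journal of Computer Research and Development), no. 08, 28 August 2006 (2006-08-28) *
刘莉 ET AL.: "Application of the k-means clustering algorithm in load curve classification", 《电力系统保护与控制》 (Power System Protection and Control), pages 1 *
卢正波;侯召成: "Validity analysis of flood clustering", 南水北调与水利科技 (South-to-North Water Transfers and Water Science & Technology), no. 02, 25 April 2007 (2007-04-25) *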

Also Published As

Publication number Publication date
CN113673551B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN106055918B (en) Method for identifying and correcting load data of power system
CN111324642A (en) Model algorithm type selection and evaluation method for power grid big data analysis
CN110796173B (en) Load curve morphology clustering algorithm based on improved kmeans
CN105243068A (en) Database system query method, server and energy consumption test system
CN111291822B (en) Equipment running state judging method based on fuzzy clustering optimal k value selection algorithm
CN111340065B (en) User load electricity stealing model mining system and method based on complex user behavior analysis
CN112819299A (en) Differential K-means load clustering method based on center optimization
CN111210170A (en) Environment-friendly management and control monitoring and evaluation method based on 90% electricity distribution characteristic index
CN111967717A (en) Data quality evaluation method based on information entropy
CN115081795A (en) Enterprise energy consumption abnormity cause analysis method and system under multidimensional scene
CN111949939A (en) Intelligent electric meter running state evaluation method based on improved TOPSIS and cluster analysis
CN113987033A (en) Main transformer online monitoring data group deviation identification and calibration method
CN111046977A (en) Data preprocessing method based on EM algorithm and KNN algorithm
CN114331238B (en) Intelligent model algorithm optimization method, system, storage medium and computer equipment
CN112305441A (en) Power battery health state assessment method under integrated clustering
CN116821832A (en) Abnormal data identification and correction method for high-voltage industrial and commercial user power load
CN116148753A (en) Intelligent electric energy meter operation error monitoring system
CN112836920A (en) Coal electric unit energy efficiency state evaluation method and device and coal electric unit system
CN114266457A (en) Method for detecting different loss inducement of distribution line
CN112950048A (en) National higher education system health evaluation based on fuzzy comprehensive evaluation
CN113673551A (en) Method and system for identifying bad data of electric power metering
Liu et al. Unsupervised pool-based active learning for linear regression
CN114205247B (en) Access method and device of power distribution Internet of things, computer equipment and storage medium
Jiang et al. SRGM decision model considering cost-reliability
CN116304295A (en) User energy consumption portrait analysis method based on multivariate data driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant