CN113673551A - Method and system for identifying bad data of electric power metering - Google Patents
- Publication number
- CN113673551A (application number CN202110741482.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- clustering
- power metering
- judging
- electric power
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a method and system for identifying bad data in electric power metering, comprising: obtaining raw power metering data and preprocessing the data; clustering the preprocessed power metering data; and judging whether the data to be checked are similar to the clustering result of the class to which the user belongs (intra-class similarity). If so, the data are judged accurate; if not, it is further judged whether the data are smooth. If they are, the data are judged accurate; otherwise, they are judged inaccurate, i.e. bad data. By providing an accuracy quantization index, accuracy, one of the quality characteristics of power metering data, is quantified and expressed more intuitively.
Description
Technical Field
The disclosure belongs to the technical field of electric power metering data identification, and particularly relates to a method and a system for identifying bad data of electric power metering.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, the power market, an important means of achieving the optimal allocation of power resources, has entered a stage of rapid development, and power metering has become a fundamental link in that development. Power metering data contain rich information and are of great significance to the developing power market. Processing and analyzing metering data reveals more about users' electricity consumption patterns, so that better fitting and substitute schemes can be found when metering data are lost, and more valuable references can be provided for recommending retail packages to users as the retail market develops.
At present, power metering technology is developing toward automation and intelligence, and the quality of power metering data has improved greatly compared with the era of manual metering. However, as electricity demand in production and daily life in China keeps growing, metering data quality is still unstable in actual operation because of meter faults and interference during data acquisition and transmission. Integrity, timeliness and accuracy are the main indicators of metering data quality. Mature methods exist for evaluating integrity and timeliness, but accuracy, the most important indicator, still lacks a mature evaluation method.
Existing studies, such as "Liu Li, Wang Gang, Dian-Jian. Application of the k-means clustering algorithm in load curve classification [J]. Power System Protection and Control, 2011, 39(23): 65-68+73" and "Liu Huizhou, Zhou Kaile, Hu Xiaojian. Identification and correction of bad load data based on fuzzy load clustering [J]. Electric Power, 2013, 46(10): 29-34", provide bad data identification methods. The former determines inaccurate data through either horizontal similarity or longitudinal smoothness; the latter determines, from experience, the allowable variation range of the load values of each type of load curve and judges data outside that range to be inaccurate. Both give methods for judging bad data, but each relies on a single criterion, so misjudgment may occur in practice. The present method combines the horizontal similarity and the longitudinal smoothness of the load curve, which is more comprehensive than relying on either characteristic alone and can effectively reduce the misjudgment rate. It is suited to finding inaccurate data caused by abrupt changes of individual values during metering data acquisition, for example due to electromagnetic interference. The method works on the acquired power metering data and identifies inaccurate data by analyzing the daily load curve formed by the active power in the metering data.
In summary, the technical problem of prior-art bad data identification for power metering is as follows: most existing methods need information other than active power, for example the structure of the system lines from which the metering data are obtained. Unlike those methods, the present method focuses on users' electricity consumption patterns and habits. The load curve obtained from the power metering equipment is independent of the equipment and means of data acquisition and of the system line structure, so the method is faster and more convenient, and inaccurate data can be identified in any situation where a load curve is available.
Disclosure of Invention
To overcome the shortcomings of the prior art, the present disclosure provides a method for identifying bad data of electric power metering that identifies inaccurate data points with a lower misjudgment rate.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
in a first aspect, a method for identifying bad data of power metering is disclosed, which includes:
obtaining original electric power metering data and preprocessing the data;
clustering the preprocessed electric power metering data;
judging whether the data to be checked are similar to the clustering result of the class to which the user belongs (intra-class similarity); if so, judging the data to be accurate; if not, judging the data to be suspect and further judging whether the data to be checked are smooth; if so, judging the data to be accurate; otherwise, judging the data to be inaccurate, i.e. bad data.
In a further technical solution, when the preprocessed power metering data are clustered, the clustering validity index $C_c$ is combined with the k-means clustering algorithm, specifically comprising the following steps:
determining an initial cluster number k;
selecting k of the n samples as initial cluster centers;
calculating the distance between each sample and each cluster center;
reassigning the samples according to the minimum-distance principle, i.e. minimizing the sum of squared errors;
calculating the mean of each class of samples as the new cluster center;
if the sum of the changes in cluster-center positions between two successive iterations is smaller than a threshold, ending the iteration;
calculating the clustering validity index $C_c$;
selecting different values of k, repeating the above steps and calculating the validity index for each, and taking the cluster number k with the largest validity index as the optimal cluster number and clustering result.
In a further technical solution, when judging whether the data to be checked have intra-class similarity, an intra-class similarity index $\delta(i)$ is defined:
where $\delta(i)$ denotes the intra-class similarity of the i-th data point on the load curve under test, $LP_c(i)$ is the i-th data point on the load curve under test, and $LP_d(i)$ is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when $\delta(i) \in [-r, r]$ the data point is regarded as accurate; otherwise, when $\delta(i) \notin [-r, r]$, it is regarded as suspect.
In a further technical solution, the smoothness characteristic is used to further screen the suspect data; the smoothness index can be measured by comparing the suspect data point with the points immediately before and after it.
In a further technical solution, assuming that the i-th point on load curve $LP_c$ is regarded as suspect, a smoothness index $\varepsilon(i)$ is defined:
where $\varepsilon(i)$ denotes the smoothness of the i-th data point on the load curve under test. As with the similarity index, a threshold u is set: when $\varepsilon(i) \in [-u, u]$ the data point is regarded as accurate; otherwise, when $\varepsilon(i) \notin [-u, u]$, the suspect data point is identified as inaccurate.
In a further embodiment, the thresholds r and u may be determined based on operational experience.
A further technical solution also comprises evaluating the accuracy of the metering data:
the accuracy of the metering data can be measured by comparing the number of inaccurate data points with the total number of sampled data points.
In a second aspect, a system for identifying bad data in power metering is disclosed, which comprises:
a power metering data acquisition module configured to: obtaining original electric power metering data and preprocessing the data;
a power metering data clustering module configured to: clustering the preprocessed electric power metering data;
a bad data determination module configured to: judge whether the data to be checked are similar to the clustering result of the class to which the user belongs (intra-class similarity); if so, judge the data accurate; if not, further judge whether the data are smooth; if so, judge the data accurate; otherwise, judge the data inaccurate, i.e. bad data.
The above one or more technical solutions have the following beneficial effects:
The invention introduces the correlation coefficient into clustering validity evaluation, uses it to represent the distance between samples, and defines the clustering validity index $C_c$. Whereas the commonly used Xie-Beni index computes intra-cluster and inter-cluster distances with the Euclidean distance, $C_c$ measures the validity of the clustering algorithm from a different, correlation-based perspective.
The proposed validity index yields different values for different clustering samples and cluster numbers, and the clustering effect is best when $C_c$ is maximal. Therefore, by combining the traditional k-means clustering algorithm with the calculation of $C_c$, the invention can determine the optimal cluster number iteratively.
The invention provides a method for identifying bad power metering data based on intra-class similarity and self-smoothness. Combining the intra-class similarity check between the load curve under test and the typical curve with the smoothness check of the load curve under test reduces, to some extent, the misjudgments caused by a single criterion. By providing an accuracy quantization index, accuracy, one of the quality characteristics of power metering data, is quantified and expressed more intuitively.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flow chart of an improved k-means clustering algorithm according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a bad data identification method based on intra-class similarity and self-smoothness according to an embodiment of the disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example one
This embodiment discloses a method for identifying bad data of electric power metering, and first introduces a clustering validity evaluation index based on the correlation coefficient:
There are many evaluation indices of clustering validity; fundamentally, they judge validity quantitatively from the intra-class and inter-class distances of the clustering result. A good clustering result should have inter-class distances as large as possible and distances from samples to their cluster centers as small as possible. The invention introduces the correlation coefficient into clustering validity evaluation and uses it to represent the distance between samples, thereby constructing a new clustering validity index $C_c$. $C_c$ uses the intra-class and inter-class correlation coefficients to reflect intra-class similarity and inter-class similarity simultaneously. The index is defined as follows:
First, the intra-class correlation coefficient is defined as:
$$\alpha_{ci} = \frac{\sum_{b=1}^{m}(x_{cb}-\bar{x}_c)(x_{ib}-\bar{x}_i)}{\sqrt{\sum_{b=1}^{m}(x_{cb}-\bar{x}_c)^2}\,\sqrt{\sum_{b=1}^{m}(x_{ib}-\bar{x}_i)^2}}$$
where $\alpha_{ci}$ is the intra-class correlation coefficient of the i-th load curve in class c, $x_{cb}$ is the b-th data point on the typical load curve of the c-th cluster, $\bar{x}_c$ is the mean of the points on the class-c typical load curve, $x_{ib}$ is the b-th data point on the i-th curve in class c, $\bar{x}_i$ is the mean of the i-th curve in class c, and m is the number of data points on each load curve.
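As a minimal sketch, assuming (this text does not say so explicitly) that the typical load curve of a class is the mean of its member curves, the intra-class correlation coefficient $\alpha_{ci}$ described above is the Pearson coefficient between that typical curve and each member curve. The function name and sample data are hypothetical.

```python
import numpy as np

def intra_class_correlation(typical, curve):
    """alpha_ci: Pearson correlation between class c's typical curve and its i-th member curve."""
    typical = np.asarray(typical, dtype=float)
    curve = np.asarray(curve, dtype=float)
    dt = typical - typical.mean()   # x_cb minus the typical-curve mean
    di = curve - curve.mean()       # x_ib minus the mean of curve i
    return float((dt @ di) / np.sqrt((dt @ dt) * (di @ di)))

# Hypothetical class with three member curves (m = 4 points each).
members = np.array([
    [1.0, 2.0, 3.0, 2.0],
    [1.2, 2.1, 2.9, 2.2],
    [0.9, 1.8, 3.2, 1.9],
])
typical = members.mean(axis=0)      # assumed: typical curve = cluster mean
alphas = [intra_class_correlation(typical, row) for row in members]
```

Because the members share one daily shape, every $\alpha_{ci}$ here is close to 1.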
The inter-class correlation coefficient is defined as:
$$\beta_{cj} = \frac{\sum_{b=1}^{m}(x_{cb}-\bar{x}_c)(x_{jb}-\bar{x}_j)}{\sqrt{\sum_{b=1}^{m}(x_{cb}-\bar{x}_c)^2}\,\sqrt{\sum_{b=1}^{m}(x_{jb}-\bar{x}_j)^2}}$$
where $\beta_{cj}$ is the inter-class correlation coefficient between the typical load curves of the c-th and j-th clusters, $x_{cb}$ is the b-th data point on the typical load curve of the c-th cluster, $\bar{x}_c$ is the mean of the data on the class-c typical load curve, $x_{jb}$ is the b-th data point on the class-j typical load curve, and $\bar{x}_j$ is the mean of the data on the class-j typical load curve.
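A similar sketch, again with hypothetical names and data, computes the inter-class coefficients $\beta_{cj}$ for all pairs of typical curves at once with `numpy.corrcoef` (which returns the Pearson correlation matrix of its rows) and extracts the largest off-diagonal value, i.e. $\max(\beta_{cj})$.

```python
import numpy as np

def max_inter_class_correlation(typical_curves):
    """Largest beta_cj over all pairs c != j of class typical curves.

    typical_curves has shape (k, m): one typical load curve per class, k >= 2.
    A flat (constant) curve has no defined correlation and yields NaN.
    """
    beta = np.corrcoef(np.asarray(typical_curves, dtype=float))   # beta[c, j] = beta_cj
    off_diagonal = ~np.eye(len(beta), dtype=bool)                 # drop beta_cc, which is always 1
    return float(beta[off_diagonal].max())

# Hypothetical typical curves of three classes (m = 5 points each).
typicals = np.array([
    [1.0, 2.0, 4.0, 3.0, 1.0],
    [1.0, 3.0, 4.0, 2.0, 1.0],
    [4.0, 3.0, 1.0, 2.0, 4.0],
])
max_beta = max_inter_class_correlation(typicals)   # about 0.853 (classes 0 and 1)
```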
The clustering validity index $C_c$ is defined:
where n is the total number of samples and $k_c$ is the number of samples contained in the c-th cluster. A correlation coefficient is at most 1, and the closer it is to 1, the stronger the correlation between the two curves. The summation term in the numerator is the sum of the intra-class correlation coefficients and the denominator $\max(\beta_{cj})$ is the largest inter-class correlation coefficient; dividing the former by the latter gives the clustering validity index $C_c$. $C_c$ takes different values for different cluster numbers k, and a larger $C_c$ indicates a better clustering effect, so the optimal cluster number is the one that maximizes $C_c$.
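The exact $C_c$ formula is not reproduced in this text, so the sketch below assumes one plausible form consistent with its description: $C_c = \big(\tfrac{1}{n}\sum_{c}\sum_{i=1}^{k_c}\alpha_{ci}\big) / \max_{c \ne j}\beta_{cj}$, with cluster centers used as typical curves. Both the form and the names are assumptions. Note that under this form a negative $\max(\beta_{cj})$ (only anti-correlated classes) would flip the sign of the index, a case the text does not address.

```python
import numpy as np

def pearson_rows(curves, reference):
    """Pearson correlation of each row of `curves` (r, m) with one reference curve (m,)."""
    dc = curves - curves.mean(axis=1, keepdims=True)
    dr = reference - reference.mean()
    return (dc @ dr) / np.sqrt((dc ** 2).sum(axis=1) * (dr ** 2).sum())

def validity_index(curves, labels, centers):
    """Assumed form: C_c = (sum of alpha_ci / n) / max over c != j of beta_cj."""
    n = len(curves)
    # Numerator: alpha_ci summed over every class c and its k_c member curves.
    alpha_sum = sum(pearson_rows(curves[labels == c], centers[c]).sum()
                    for c in range(len(centers)))
    # Denominator: largest correlation between typical curves of two different classes.
    beta = np.corrcoef(centers)
    max_beta = beta[~np.eye(len(centers), dtype=bool)].max()
    return float((alpha_sum / n) / max_beta)

# Hypothetical curves: two shapes, two noisy copies of each.
x = np.array([
    [1.0, 2.0, 4.0, 3.0, 1.0],
    [1.1, 2.0, 4.0, 3.0, 0.9],
    [1.0, 3.0, 4.0, 2.0, 1.0],
    [0.9, 3.0, 4.0, 2.0, 1.1],
])

def centers_for(labels):
    return np.array([x[labels == c].mean(axis=0) for c in range(labels.max() + 1)])

good = np.array([0, 0, 1, 1])
bad = np.array([0, 1, 0, 1])
c_good = validity_index(x, good, centers_for(good))   # about 1.17
c_bad = validity_index(x, bad, centers_for(bad))      # about 0.96
```

The partition that matches the two shapes scores higher, because its centers are less correlated with each other while members stay highly correlated with their own center.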
Improved k-means clustering algorithm
Data normalization
To make load curves with large numerical differences comparable during clustering, the data are first normalized. Because the hourly metering points are used directly as clustering features and there is no problem of inconsistent dimensions, min-max (extreme-value) linear normalization is used:
$$LP^{*}(i) = \frac{LP(i) - \min_{j} LP(j)}{\max_{j} LP(j) - \min_{j} LP(j)}$$
where $LP(i)$ is the raw data of the i-th point on the daily load curve and $LP^{*}(i)$ is the normalized data of the i-th point on the daily load curve.
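A minimal sketch of the min-max normalization, assuming the minimum and maximum are taken over each daily load curve separately; the function name is hypothetical. Rows of a 2-D input are treated as separate daily curves, and a flat curve is mapped to zeros rather than dividing by zero (a case the text does not cover).

```python
import numpy as np

def minmax_normalize(lp):
    """Min-max normalize each daily load curve (last axis) to the range [0, 1]."""
    lp = np.asarray(lp, dtype=float)
    lo = lp.min(axis=-1, keepdims=True)
    hi = lp.max(axis=-1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)   # flat curve -> all zeros instead of 0/0
    return (lp - lo) / span

single = minmax_normalize([200.0, 400.0, 300.0])            # [0.0, 1.0, 0.5]
batch = minmax_normalize([[200, 400, 300], [10, 20, 30]])   # one row per daily curve
```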
Combining the clustering validity index $C_c$ defined above with the k-means clustering algorithm yields an improved k-means algorithm in which k can be determined more conveniently. Referring to FIG. 1, the basic implementation steps are as follows:
(1) Determine an initial value of k empirically;
(2) Select k of the n samples as initial cluster centers $C_0, C_1, \dots, C_{k-1}$;
(3) Calculate the distance between each sample and each cluster center. A sample is a power consumer's daily load curve, consisting of one point per hour (24 points per day) or one point every 15 minutes (96 points per day), which reflects how the consumer's electricity consumption changes over time; it can be obtained from the power metering device;
(4) Reassign the samples according to the minimum-distance principle (i.e. minimizing the sum of squared errors);
(5) Calculate the mean of each class of samples as the new cluster center;
(6) If the sum of the changes in cluster-center positions between two successive iterations is smaller than a threshold, end the iteration; otherwise repeat steps (3), (4) and (5) until the condition is met;
(7) Calculate the clustering validity index $C_c$;
(8) Return to step (1), select different values of k and repeat the above steps, calculating $C_c$ for each; the cluster number k with the largest $C_c$ is taken as the optimal cluster number and clustering result.
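Steps (1) to (8) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: distances are Euclidean, the initial centers are drawn with a seeded random generator, $C_c$ uses the same assumed form as elsewhere (mean intra-class Pearson coefficient divided by the largest inter-class one), and all function names are hypothetical.

```python
import numpy as np

def kmeans(x, k, tol=1e-6, max_iter=100, seed=0):
    """Plain Lloyd k-means, steps (2)-(6); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)   # (2)
    for _ in range(max_iter):
        # (3) Euclidean distance from every sample to every center, shape (n, k)
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                                       # (4)
        # (5) class means become the new centers; an empty class keeps its old center
        new_centers = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        shift = np.linalg.norm(new_centers - centers, axis=1).sum()        # (6)
        centers = new_centers
        if shift < tol:
            break
    return labels, centers

def validity_index(x, labels, centers):
    """(7) Assumed C_c: mean intra-class Pearson coefficient / max inter-class one."""
    alpha_sum = 0.0
    for c in range(len(centers)):
        members = x[labels == c]
        if len(members) == 0:
            continue
        dm = members - members.mean(axis=1, keepdims=True)
        dc = centers[c] - centers[c].mean()
        alpha_sum += ((dm @ dc) / np.sqrt((dm ** 2).sum(axis=1) * (dc ** 2).sum())).sum()
    beta = np.corrcoef(centers)
    return float((alpha_sum / len(x)) / beta[~np.eye(len(centers), dtype=bool)].max())

def select_k(x, k_values, **kmeans_kwargs):
    """(1) and (8): run k-means for each candidate k and keep the largest finite C_c."""
    best = None
    for k in k_values:
        labels, centers = kmeans(x, k, **kmeans_kwargs)
        score = validity_index(x, labels, centers)
        if np.isfinite(score) and (best is None or score > best[0]):
            best = (score, k, labels, centers)
    return best

# Hypothetical data: five small variations of each of two daily shapes.
base_a = np.array([1.0, 2.0, 4.0, 3.0, 1.0])
base_b = np.array([1.0, 3.0, 4.0, 2.0, 1.0])
wiggle = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
offsets = np.linspace(-0.05, 0.05, 5)
x = np.vstack([base_a + o * wiggle for o in offsets] + [base_b + o * wiggle for o in offsets])
score, k, labels, centers = select_k(x, k_values=[2, 3, 4])
```

In practice the candidate k values would start at 2, since $\max(\beta_{cj})$ needs at least two classes.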
In an embodiment, referring to FIG. 2, the method for identifying bad power metering data based on intra-class similarity and self-smoothness includes:
(1) Bad data discrimination
For the same user, electricity consumption on days of the same type (working days or holidays) follows a regular pattern, so the shapes of the load curves on different dates are similar. The load curve also changes regularly within a day: even when a load is switched on suddenly, the change relative to adjacent time points is limited, i.e. the load curve has a certain smoothness. Therefore, similarity is used first when discriminating inaccurate data: inaccurate data are searched for by examining the horizontal similarity of the data in ratio form. Suppose $LP_d$ is the typical daily load curve of a class and $LP_c$ is a daily load curve to be checked. An intra-class similarity index $\delta(i)$ is defined:
where $\delta(i)$ denotes the intra-class similarity of the i-th data point on the load curve under test, $LP_c(i)$ is the i-th data point on the load curve under test, and $LP_d(i)$ is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when $\delta(i) \in [-r, r]$ the data point is regarded as accurate; otherwise, when $\delta(i) \notin [-r, r]$, it is regarded as suspect.
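The $\delta(i)$ formula itself is not reproduced in this text. A minimal sketch, assuming $\delta(i) = (LP_c(i) - LP_d(i)) / LP_d(i)$, a relative deviation chosen to match the "ratio form" description and the symmetric threshold $[-r, r]$; the function name is hypothetical. The sketch works on raw (un-normalized) values and assumes the typical curve has no zero entries, since a min-max-normalized curve is zero at its minimum.

```python
import numpy as np

def similarity_screen(lp_c, lp_d, r):
    """Stage 1: flag points of the curve under test that deviate too far from the typical curve.

    Returns (delta, suspect), where suspect[i] is True when delta(i) lies outside [-r, r].
    """
    lp_c = np.asarray(lp_c, dtype=float)   # daily load curve under test
    lp_d = np.asarray(lp_d, dtype=float)   # typical curve of its class (no zeros)
    delta = (lp_c - lp_d) / lp_d           # assumed ratio form of delta(i)
    suspect = np.abs(delta) > r            # |delta| <= r  <=>  delta in [-r, r]
    return delta, suspect

# Hypothetical 4-point example (raw kW values) with threshold r = 0.1.
delta, suspect = similarity_screen([102, 118, 210, 128], [100, 120, 150, 130], r=0.1)
# suspect -> [False, False, True, False]; delta[2] = 60 / 150 = 0.4
```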
The smoothness characteristic can then be applied to further screen the suspect data. The smoothness index is measured by comparing the suspect point with the points immediately before and after it. Assuming that the i-th point on load curve $LP_c$ is regarded as suspect, a smoothness index $\varepsilon(i)$ is defined:
where $\varepsilon(i)$ denotes the smoothness of the i-th data point on the load curve under test. As with the similarity index, a threshold u is set: when $\varepsilon(i) \in [-u, u]$ the data point is regarded as accurate; otherwise, when $\varepsilon(i) \notin [-u, u]$, the suspect data point is identified as inaccurate.
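The second stage can be sketched as follows. The $\varepsilon(i)$ formula is likewise not reproduced here, so this assumes $\varepsilon(i)$ is the relative deviation of point i from the mean of its neighbours i-1 and i+1 (only one neighbour exists at the ends of the curve). The form and the function name are assumptions; only points already flagged as suspect by the similarity stage are examined.

```python
import numpy as np

def smoothness_screen(lp_c, suspect, u):
    """Stage 2: decide which suspect points are inaccurate (bad data)."""
    lp_c = np.asarray(lp_c, dtype=float)
    bad = np.zeros(len(lp_c), dtype=bool)
    for i in np.flatnonzero(suspect):
        neighbours = [lp_c[j] for j in (i - 1, i + 1) if 0 <= j < len(lp_c)]
        ref = np.mean(neighbours)            # assumed reference: mean of adjacent points
        eps = (lp_c[i] - ref) / ref          # assumed ratio form of epsilon(i)
        bad[i] = abs(eps) > u                # outside [-u, u] -> inaccurate
    return bad

# The spike at index 2 also breaks smoothness, so it is reported as bad data.
bad = smoothness_screen([102, 118, 210, 128], [False, False, True, False], u=0.2)
# A suspect point lying on a smooth ramp passes the smoothness test and stays accurate.
ramp_bad = smoothness_screen([100, 150, 200], [False, True, False], u=0.2)
```

The ramp case shows why the second stage lowers the misjudgment rate: a point that disagrees with the typical curve but is consistent with its own neighbours is not discarded.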
In practical applications, the threshold values r and u may be determined empirically by grid operators.
(2) Assessment of accuracy of metrology data
The metering data consist of the values measured at each sampling point. Some of these values may deviate from the true value because of metering device faults or interference in signal acquisition or transmission, which produces inaccurate data. The accuracy of metering data can therefore be measured by comparing the number of inaccurate data points with the total number of sampled points. To this end, the invention defines a metering accuracy index μ to measure the accuracy of the metering data collected each day, defined as follows:
where $n_b$ is the number of inaccurate data points in one day's load metering data and N is the total number of sampling points of that day's load.
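The μ formula is not reproduced in this text. A minimal sketch, assuming μ is the share of accurate points, $\mu = 1 - n_b / N$; if the index is instead meant as the inaccuracy ratio, it would simply be $n_b / N$. The function name is hypothetical.

```python
def accuracy_index(bad_mask):
    """Assumed mu = 1 - n_b / N for one day of metering data."""
    n_b = sum(1 for b in bad_mask if b)   # number of inaccurate points in the day
    n_total = len(bad_mask)               # N: number of sampling points in the day
    return 1.0 - n_b / n_total

# 96-point day (15-minute sampling) with 2 inaccurate points.
mu = accuracy_index([True] * 2 + [False] * 94)   # 1 - 2/96, about 0.979
```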
(3) Novel clustering effectiveness evaluation index
Clustering algorithms mostly use the Euclidean or Mahalanobis distance to compute the distance between samples and cluster centers, and then evaluate clustering validity with the same distance measure. This raises an applicability problem: if the chosen distance measure is unsuitable for the clustering objects, the validity evaluation is also unreliable. The invention proposes a clustering validity evaluation index based on the correlation coefficient, which evaluates the clustering result from a statistical perspective and reduces the one-sided influence of the distance measure on the evaluation, making the result more reasonable.
The traditional k-means clustering algorithm requires the number of clusters k to be specified manually, and the value of k directly affects the clustering effect. How to determine the optimal cluster number k has long been an important topic in clustering research. The proposed validity index yields different values for different clustering samples and cluster numbers, and the clustering effect is best when $C_c$ is maximal. The invention therefore provides an improved k-means clustering algorithm that combines the traditional k-means algorithm with the calculation of $C_c$, so that the optimal cluster number can be determined iteratively and the choice of cluster number becomes more scientific and objective.
The invention finds inaccurate data points in a user's daily load curve by measuring both the intra-class similarity of the daily load curve and its own smoothness. Searching for inaccurate points using only intra-class similarity or only self-smoothness can misjudge accurate data as inaccurate; combining the two criteria markedly reduces the misjudgment rate.
Example two
It is an object of this embodiment to provide a computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
EXAMPLE III
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Example four
The present embodiment aims to provide a system for identifying bad data in power metering, which includes:
a power metering data acquisition module configured to: obtain original power metering data and preprocess the data; its main hardware comprises:
electric energy meters installed on the user side: directly record each user's electricity consumption and load for every hour or specified time interval;
a power metering terminal: collects and uploads the metering data recorded by the meters and receives control commands from the upper-level management side;
a transmission network: the network that carries the metering and acquisition data, including an optical fiber private network, a wireless private network and the like;
a data server: stores and analyzes historical metering data;
a power metering data clustering module configured to: clustering the preprocessed electric power metering data;
its main hardware comprises:
an application server: stores and executes the clustering module program.
A bad data determination module configured to: judge whether the data to be checked have intra-class similarity; if so, judge the data accurate; if not, further judge whether the data are smooth; if so, judge the data accurate; otherwise, judge the data inaccurate, i.e. bad data.
Its main hardware comprises:
an application server: stores and executes the bad data determination module program.
The steps involved in the apparatuses of the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A method for identifying bad data of electric power metering is characterized by comprising the following steps:
obtaining original electric power metering data and preprocessing the data;
clustering the preprocessed electric power metering data;
judging whether the data to be checked are similar to the clustering result of the class to which the user belongs (intra-class similarity); if so, judging the data to be accurate; if not, judging the data to be suspect and further judging whether the data to be checked are smooth; if so, judging the data to be accurate; otherwise, judging the data to be inaccurate, i.e. bad data.
2. The method as claimed in claim 1, wherein the clustering validity index C is determined when the preprocessed power metering data are clusteredcThe method is combined with a k-means clustering algorithm, and specifically comprises the following steps:
determining an initial clustering number k value;
selecting k samples from the n samples as initial clustering centers;
calculating the distance between each sample and each cluster center;
reassigning the samples according to the minimum-distance principle, i.e., assigning each sample to its nearest cluster center so as to minimize the sum of squared errors;
calculating the mean of the samples in each class as the new cluster center;
if the sum of the changes in the cluster-center positions between two successive iterations is smaller than a threshold, ending the iteration; otherwise, returning to the distance-calculation step;
calculating a clustering validity index $C_c$;
repeating the above steps for different values of k, calculating the clustering validity index for each, and selecting the cluster number k that gives the largest validity index value; this cluster number and the corresponding clustering result are taken as optimal.
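The loop of claim 2 (run k-means for several candidate k, score each result, keep the best) can be sketched in plain Python. The definition of $C_c$ is not reproduced in this section, so the sketch swaps in the Calinski-Harabasz index, a standard validity index that is likewise maximized; treat that substitution, the function names and the parameters `tol`, `max_iter` and `seed` as assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of samples."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data: List[Point], k: int, tol: float = 1e-6,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    rng = random.Random(seed)
    # Pick k of the n samples as the initial cluster centers
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assign every sample to its nearest center (minimum distance)
        labels = [min(range(k), key=lambda j: _dist(p, centers[j])) for p in data]
        # New center = mean of each cluster (an empty cluster keeps its old center)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new_centers.append(_mean(members) if members else centers[j])
        # Stop when the summed center movement falls below the threshold
        shift = sum(_dist(c, m) for c, m in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels


def calinski_harabasz(data: List[Point], centers, labels) -> float:
    """Stand-in validity index (larger is better); needs 2 <= k < n."""
    n, k = len(data), len(centers)
    overall = _mean(data)
    between = sum(labels.count(j) * _dist(centers[j], overall) ** 2 for j in range(k))
    within = sum(_dist(p, centers[lab]) ** 2 for p, lab in zip(data, labels))
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def select_k(data: List[Point], candidate_ks, seed: int = 0):
    """Run k-means for each candidate k and keep the highest-scoring result."""
    best = None
    for k in candidate_ks:
        centers, labels = kmeans(data, k, seed=seed)
        score = calinski_harabasz(data, centers, labels)
        if best is None or score > best[1]:
            best = (k, score, centers, labels)
    return best


if __name__ == "__main__":
    # Two well-separated groups of toy samples (illustrative data only)
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    k, score, centers, labels = select_k(data, [2, 3])
    print(k, labels)
```

In practice each sample would be a (normalized) daily load curve rather than a 2-D point; the distance and mean helpers work for vectors of any length.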
3. The method as claimed in claim 1, wherein, when determining whether the data to be measured have inter-class similarity, an inter-class similarity index δ(i) is defined:
Here δ(i) represents the inter-class similarity of the i-th data point on the load curve to be measured, $LP_c(i)$ is the i-th data point on the load curve to be measured, and $LP_d(i)$ is the i-th data point on the typical load curve of the class to which the load belongs. A threshold r is set: when $\delta(i) \in [-r, r]$, the data are considered accurate data; otherwise, when $\delta(i) \notin [-r, r]$, the data are considered suspicious data.
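The threshold test of claim 3 can be sketched as below. The formula for δ(i) is not reproduced in this section, so the sketch assumes a relative deviation, $(LP_c(i) - LP_d(i)) / LP_d(i)$, purely for illustration; `similarity_index` and `suspicious_points` are hypothetical names, and the typical curve is assumed to contain no zero values.

```python
from typing import List, Sequence


def similarity_index(lp_c: Sequence[float], lp_d: Sequence[float], i: int) -> float:
    # ASSUMPTION: delta(i) taken as the relative deviation of the measured
    # point LP_c(i) from the typical-curve point LP_d(i); not the patent's formula.
    return (lp_c[i] - lp_d[i]) / lp_d[i]


def suspicious_points(lp_c: Sequence[float], lp_d: Sequence[float], r: float) -> List[int]:
    """Indices whose delta(i) falls outside [-r, r]; all other points are accurate."""
    if len(lp_c) != len(lp_d):
        raise ValueError("curves must have the same number of points")
    return [i for i in range(len(lp_c))
            if not -r <= similarity_index(lp_c, lp_d, i) <= r]


if __name__ == "__main__":
    typical = [10.0, 20.0, 30.0, 40.0]   # illustrative typical curve of the class
    measured = [10.5, 20.0, 45.0, 38.0]  # illustrative curve to be measured
    print(suspicious_points(measured, typical, r=0.1))  # [2]
```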
4. The method as claimed in claim 3, wherein the suspicious data are further screened using smoothness, and the smoothness index is measured by comparing the suspicious data with the data at the two points before and after it.
5. The method as claimed in claim 1, wherein, assuming that the i-th point on the load curve $LP_c$ is considered suspicious data, a smoothness index ε(i) is defined:
Here ε(i) represents the smoothness of the i-th data point on the load curve to be measured. As with the similarity index, a threshold u is set: when $\varepsilon(i) \in [-u, u]$, the data are regarded as accurate data; otherwise, when $\varepsilon(i) \notin [-u, u]$, the suspicious data are identified as inaccurate data.
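Claims 4 and 5 screen only the suspicious points a second time. Because the formula for ε(i) is not reproduced in this section, the sketch assumes ε(i) is the relative deviation of $LP_c(i)$ from the mean of its immediate neighbours $LP_c(i-1)$ and $LP_c(i+1)$ (only one neighbour at either end of the curve); `smoothness_index` and `confirm_bad` are hypothetical names.

```python
from typing import List, Sequence


def smoothness_index(lp_c: Sequence[float], i: int) -> float:
    # ASSUMPTION: epsilon(i) = relative deviation of LP_c(i) from the mean of
    # its immediate neighbours; only one neighbour exists at either end.
    neighbours = [lp_c[j] for j in (i - 1, i + 1) if 0 <= j < len(lp_c)]
    if not neighbours:
        raise ValueError("a curve with a single point has no neighbours")
    ref = sum(neighbours) / len(neighbours)
    return (lp_c[i] - ref) / ref


def confirm_bad(lp_c: Sequence[float], suspicious: Sequence[int], u: float) -> List[int]:
    """Suspicious indices whose epsilon(i) lies outside [-u, u] (inaccurate data)."""
    return [i for i in suspicious if not -u <= smoothness_index(lp_c, i) <= u]


if __name__ == "__main__":
    curve = [10.0, 11.0, 30.0, 12.0, 13.0]  # illustrative load curve with a spike
    print(confirm_bad(curve, suspicious=[2, 4], u=0.2))  # [2]
```

Under this assumed index, point 1 (next to the spike) would itself score ε ≈ -0.45, which illustrates why it makes sense to apply the neighbour-based test only to points already flagged by the similarity test rather than to every point.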
6. The method as claimed in claim 5, wherein the threshold r is determined empirically;
the threshold u is determined empirically.
7. The method as claimed in claim 1, further comprising evaluating the accuracy of the metering data, wherein:
the accuracy of the metering data is measured by comparing the number of inaccurate data points with the total number of sampled data points.
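Claim 7 only says that accuracy is measured by comparing the count of inaccurate points with the total count of sampled points. A minimal sketch, assuming the accuracy is reported as the share of points that are not bad, i.e. $1 - n_{bad}/n_{total}$ (the function name `metering_accuracy` is hypothetical):

```python
def metering_accuracy(n_bad: int, n_total: int) -> float:
    """Share of sampled points not identified as bad data (assumed definition)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_bad <= n_total:
        raise ValueError("n_bad must lie in [0, n_total]")
    # ASSUMPTION: accuracy = 1 - (inaccurate points / all sampled points)
    return 1.0 - n_bad / n_total


if __name__ == "__main__":
    # e.g. 3 bad points among 96 readings (one day at 15-minute intervals, illustrative)
    print(metering_accuracy(3, 96))  # 0.96875
```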
8. A bad data identification system for electric power metering is characterized by comprising:
a power metering data acquisition module configured to obtain original electric power metering data and preprocess the data;
a power metering data clustering module configured to cluster the preprocessed electric power metering data;
a bad data determination module configured to judge whether the data to be detected have inter-class similarity; if so, to judge the data to be accurate data; if not, to further judge whether the data to be detected have smoothness; if so, to judge the data to be accurate data; otherwise, to judge the data to be inaccurate data, i.e., bad data.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of the preceding claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110741482.6A CN113673551B (en) | 2021-06-30 | 2021-06-30 | Power metering bad data identification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113673551A true CN113673551A (en) | 2021-11-19 |
CN113673551B CN113673551B (en) | 2024-05-28 |
Family ID: 78538543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110741482.6A Active CN113673551B (en) | 2021-06-30 | 2021-06-30 | Power metering bad data identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113673551B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114169604A (en) * | 2021-12-06 | 2022-03-11 | 北京达佳互联信息技术有限公司 | Performance index abnormality detection method, abnormality detection device, electronic apparatus, and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030182082A1 (en) * | 2002-03-16 | 2003-09-25 | International Business Machines Corporation | Method for determining a quality for a data clustering and data processing system |
CN106055918A (en) * | 2016-07-26 | 2016-10-26 | 天津大学 | Power system load data identification and recovery method |
CN107528722A (en) * | 2017-07-06 | 2017-12-29 | 阿里巴巴集团控股有限公司 | Abnormal point detecting method and device in a kind of time series |
US20180053642A1 (en) * | 2016-08-22 | 2018-02-22 | Eung Joon JO | Categorization data manipulation using a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer |
US20180253626A1 (en) * | 2015-09-03 | 2018-09-06 | Functional Technologies Ltd. | Clustering images based on camera fingerprints |
CN109766950A (en) * | 2019-01-18 | 2019-05-17 | 东北大学 | A kind of industrial user's short-term load forecasting method based on form cluster and LightGBM |
CN110544047A (en) * | 2019-09-10 | 2019-12-06 | 东北电力大学 | Bad data identification method |
CN110796173A (en) * | 2019-09-27 | 2020-02-14 | 昆明电力交易中心有限责任公司 | Load curve form clustering algorithm based on improved kmeans |
WO2021073462A1 (en) * | 2019-10-15 | 2021-04-22 | 国网浙江省电力有限公司台州供电公司 | 10 kv static load model parameter identification method based on similar daily load curves |
- 2021-06-30: application CN202110741482.6A filed in CN (CN113673551B, status: Active)
Non-Patent Citations (3)
Title |
---|
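修宇; 王士同; 吴锡生; 胡德文: "方向相似性聚类方法DSCM" [Direction similarity clustering method DSCM], 计算机研究与发展 [Journal of Computer Research and Development], no. 08, 28 August 2006 (2006-08-28) * |
刘莉 et al.: "k-means 聚类算法在负荷曲线分类中的应用" [Application of the k-means clustering algorithm in load curve classification], 《电力系统保护与控制》 [Power System Protection and Control], pages 1 * |
卢正波; 侯召成: "洪水聚类有效性分析" [Validity analysis of flood clustering], 南水北调与水利科技 [South-to-North Water Transfers and Water Science & Technology], no. 02, 25 April 2007 (2007-04-25) * |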
Also Published As
Publication number | Publication date |
---|---|
CN113673551B (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106055918B (en) | Method for identifying and correcting load data of power system | |
CN111324642A (en) | Model algorithm type selection and evaluation method for power grid big data analysis | |
CN110796173B (en) | Load curve morphology clustering algorithm based on improved kmeans | |
CN111626821B (en) | Product recommendation method and system for realizing customer classification based on integrated feature selection | |
CN108415910B (en) | Topic development clustering analysis system and method based on time series | |
CN111079941B (en) | Credit information processing method, credit information processing system, terminal and storage medium | |
CN105243068A (en) | Database system query method, server and energy consumption test system | |
CN110428270A (en) | The potential preference client recognition methods of the channel of logic-based regression algorithm | |
CN112819299A (en) | Differential K-means load clustering method based on center optimization | |
CN111210170A (en) | Environment-friendly management and control monitoring and evaluation method based on 90% electricity distribution characteristic index | |
CN111967717A (en) | Data quality evaluation method based on information entropy | |
CN111340065B (en) | User load electricity stealing model mining system and method based on complex user behavior analysis | |
CN115081795A (en) | Enterprise energy consumption abnormity cause analysis method and system under multidimensional scene | |
CN112305441A (en) | Power battery health state assessment method under integrated clustering | |
CN111949939A (en) | Intelligent electric meter running state evaluation method based on improved TOPSIS and cluster analysis | |
CN116148753A (en) | Intelligent electric energy meter operation error monitoring system | |
CN114331238B (en) | Intelligent model algorithm optimization method, system, storage medium and computer equipment | |
CN113673551A (en) | Method and system for identifying bad data of electric power metering | |
CN116821832A (en) | Abnormal data identification and correction method for high-voltage industrial and commercial user power load | |
CN114266457A (en) | Method for detecting different loss inducement of distribution line | |
CN117786445A (en) | Intelligent processing method for operation data of automatic yarn reeling machine | |
CN109214468A (en) | It is a kind of based on can open up away from optimization cluster centre data clustering method | |
Wu et al. | Optimization and improvement based on K-Means Cluster algorithm | |
Jiang et al. | SRGM decision model considering cost-reliability | |
CN116304295A (en) | User energy consumption portrait analysis method based on multivariate data driving |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |