CN112506908A - Electric energy metering data cleaning method and system
- Publication number: CN112506908A
- Application number: CN202011458258.8A
- Authority: CN (China)
- Prior art keywords: data, value, electric energy, threshold value, energy metering
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Testing Electric Properties And Detecting Electric Faults (AREA)
Abstract
The invention relates to a method and a system for cleaning electric energy metering data. The method uses the Bhattacharyya distance to evaluate the similarity between standard values and measured values. When the Bhattacharyya coefficient is greater than a threshold, the data are judged correct and need no cleaning. When the Bhattacharyya coefficient is smaller than the threshold, the slope of each group of data is calculated, and any group whose slope is not approximately equal to the others is judged abnormal and eliminated. Applying the Bhattacharyya coefficient to electric energy metering data gives an overall judgment of whether abnormal values may be present; when they are, comparing whether the change slopes of the groups are approximately equal captures the abnormal data quickly. The method is well suited to electric energy metering detection with small data sets, and can accurately eliminate abnormal data and reduce errors.
Description
Technical Field
The invention relates to a data cleaning method, in particular to an electric energy metering data cleaning method and a data cleaning system.
Background
Electric energy metering is an important technical support for electric power marketing, and the accuracy of electric energy metering equipment directly affects the vital interests of electric power enterprises and the large population of electric power users. According to the technical management regulations for electric energy metering devices (DL/T448-2016), indexes such as secondary circuit voltage drop are tested at least once every two years. During testing, however, manual operating errors and similar influences produce abnormal data, and if these data are not cleaned they distort the test results and conclusions.
However, the prior art cannot quickly and accurately determine whether a circuit has failed.
Disclosure of Invention
In order to solve the problems, the invention provides a method and a system for cleaning electric energy metering data, which can accurately eliminate abnormal data.
The technical scheme of the invention is as follows:
An electric energy metering data cleaning method is carried out as follows:
The similarity between the standard value and the measured value is evaluated using the Bhattacharyya distance. When the Bhattacharyya coefficient is greater than a threshold, the data are judged correct and need no cleaning. When the Bhattacharyya coefficient is smaller than the threshold, the slope of each group of data is calculated, and when a slope appears that is not approximately equal to the others, the corresponding data are judged abnormal and rejected.
Further, the Bhattacharyya distance is defined as follows: on the same domain $X$, the Bhattacharyya distance of two discrete probability distributions $p$ and $q$ is
$D_B(p,q) = -\ln(BC(p,q))$ (1)
where $BC$ is the Bhattacharyya coefficient, with range $0 \le BC \le 1$, and $D_B$ has range $0 \le D_B \le \infty$;
first, normalization is required, carried out according to formula (3):
where $v_{sta}$ is the standard value, $v_{tes}$ is the measured value, $V$ takes the values $v_{sta}$ and $v_{tes}$, and $P_V$ is the normalized value;
each group of data is substituted into formula (2) to obtain the $BC$ value; when $BC$ is greater than the threshold, the data are judged correct and need no cleaning, and when $BC$ is smaller than the threshold, further processing is performed as follows:
first, the slope of each group of data is calculated using formula (4):
$K = [k_1, k_2, \ldots, k_i, \ldots, k_n]$, where $k_i$ is the slope of the $i$-th group of test data and $n$ is the number of test groups;
it is then judged whether the $k$ values satisfy formula (5):
$k_1 \approx \cdots \approx k_i \approx \cdots \approx k_n$ (5)
When some $k_i$ is not approximately equal to the other $k$ values, the corresponding data are judged abnormal and removed.
Further, the threshold value is 0.95.
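The two-stage procedure above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: formulas (2), (3) and (4) are not reproduced in this text, so the code assumes the standard discrete Bhattacharyya coefficient for formula (2), normalization of each series by its own sum for formula (3), and a slope equal to measured value divided by standard value for formula (4). The function names, the `rel_tol` parameter and the median-based test for "approximately equal" are likewise hypothetical.

```python
import math
from statistics import median


def normalize(values):
    # ASSUMPTION: formula (3) is not shown in the text; each series is
    # divided by its own sum so that it behaves like a discrete distribution.
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def bhattacharyya_coefficient(p, q):
    # Standard discrete form: BC(p, q) = sum over x of sqrt(p(x) * q(x)).
    # Inputs are assumed to be non-negative.
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def clean_metering_data(standard, measured, threshold=0.95, rel_tol=0.10):
    """Return the indices of the groups judged abnormal (empty list if none)."""
    if len(standard) != len(measured) or not standard:
        raise ValueError("need two equally long, non-empty series")
    bc = bhattacharyya_coefficient(normalize(standard), normalize(measured))
    if bc > threshold:
        return []  # data judged correct, no cleaning needed
    # ASSUMPTION: formula (4) is not shown; slope taken as measured / standard,
    # which requires non-zero standard values.
    slopes = [m / s for s, m in zip(standard, measured)]
    typical = median(slopes)
    # ASSUMPTION: "approximately equal" means within rel_tol of the median slope.
    return [i for i, k in enumerate(slopes)
            if abs(k - typical) > rel_tol * abs(typical)]


# One grossly wrong reading in the last group: BC is about 0.86, below 0.95.
print(clean_metering_data([1, 2, 3, 4, 5], [1, 2, 3, 4, 50]))  # [4]
```

The median is used as the reference slope because a single abnormal group barely moves it; any other robust reference would serve equally well in such a sketch.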
The invention also relates to an electric energy metering data cleaning system, which comprises a data acquisition unit, a processor and a display;
the data acquisition unit acquires electric energy metering data; the processor evaluates the similarity between the standard value and the measured value using the Bhattacharyya distance, judges the data correct and in no need of cleaning when the Bhattacharyya coefficient is greater than a threshold, calculates the slope of each group of data when the Bhattacharyya coefficient is smaller than the threshold, and, when a slope appears that is not approximately equal to the others, judges the corresponding data abnormal and rejects them;
the display displays the final result.
Further, the processing procedure of the processor is as follows. The Bhattacharyya distance is defined as follows: on the same domain $X$, the Bhattacharyya distance of two discrete probability distributions $p$ and $q$ is
$D_B(p,q) = -\ln(BC(p,q))$ (1)
where $BC$ is the Bhattacharyya coefficient, with range $0 \le BC \le 1$, and $D_B$ has range $0 \le D_B \le \infty$;
first, normalization is required, carried out according to formula (3):
where $v_{sta}$ is the standard value, $v_{tes}$ is the measured value, $V$ takes the values $v_{sta}$ and $v_{tes}$, and $P_V$ is the normalized value;
each group of data is substituted into formula (2) to obtain the $BC$ value; when $BC$ is greater than the threshold, the data are judged correct and need no cleaning, and when $BC$ is smaller than the threshold, further processing is performed as follows:
first, the slope of each group of data is calculated using formula (4):
$K = [k_1, k_2, \ldots, k_i, \ldots, k_n]$, where $k_i$ is the slope of the $i$-th group of test data and $n$ is the number of test groups;
it is then judged whether the $k$ values satisfy formula (5):
$k_1 \approx \cdots \approx k_i \approx \cdots \approx k_n$ (5)
When some $k_i$ is not approximately equal to the other $k$ values, the corresponding data are judged abnormal and removed.
Further, the threshold value is 0.95.
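The division of the system into a data acquisition unit, a processor and a display can be pictured with the following minimal Python sketch. The class name `CleaningSystem`, the `Reading` type and the three callables are hypothetical and serve only to show how the units hand data to one another; the source does not specify any software interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Reading = Tuple[float, float]  # (standard value, measured value) of one group


@dataclass
class CleaningSystem:
    acquire: Callable[[], List[Reading]]               # data acquisition unit
    process: Callable[[Sequence[Reading]], List[int]]  # processor: abnormal indices
    display: Callable[[str], None]                     # display

    def run(self) -> List[Reading]:
        readings = self.acquire()
        abnormal = set(self.process(readings))
        kept = [r for i, r in enumerate(readings) if i not in abnormal]
        # The display shows the final result of the cleaning.
        self.display(f"kept {len(kept)} of {len(readings)} groups; "
                     f"rejected indices: {sorted(abnormal)}")
        return kept


# Stub units for illustration only: the processor here rejects the last group.
system = CleaningSystem(
    acquire=lambda: [(1.0, 1.02), (2.0, 2.05), (3.0, 9.0)],
    process=lambda readings: [len(readings) - 1],
    display=print,
)
cleaned = system.run()
```

Passing the three units in as callables keeps the sketch independent of any particular acquisition hardware or display device.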
Compared with the prior art, the invention has the following beneficial effects:
The method applies the Bhattacharyya coefficient to electric energy metering data to judge whether the data are abnormal, and identifies and extracts abnormal data based on slope similarity. Specifically, measurement data obtained under different requirements are converted by normalization, and the similarity between the standard source and the measured values is calculated using the Bhattacharyya distance, giving an overall judgment of whether abnormal values may be present. When abnormal values are present, comparing whether the change slopes of the groups are approximately equal captures the abnormal data quickly. The method is well suited to electric energy metering detection with small data sets, and can accurately eliminate abnormal data and reduce errors.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of examples of the present invention, and not all examples. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
The electric energy metering data cleaning system comprises a data acquisition unit, a processor and a display;
the data acquisition unit acquires electric energy metering data;
the processor evaluates the similarity between the standard value and the measured value using the Bhattacharyya distance, judges the data correct and in no need of cleaning when the Bhattacharyya coefficient is greater than a threshold, calculates the slope of each group of data when the Bhattacharyya coefficient is smaller than the threshold, and, when a slope appears that is not approximately equal to the others, judges the corresponding data abnormal and rejects them;
the display displays the final result.
The data cleaning method of the embodiment is performed as follows:
First, the similarity between the standard value and the measured value is evaluated using the Bhattacharyya distance, which is defined as follows: on the same domain $X$, the Bhattacharyya distance of two discrete probability distributions $p$ and $q$ is
$D_B(p,q) = -\ln(BC(p,q))$ (1)
where $BC$ is the Bhattacharyya coefficient, with range $0 \le BC \le 1$, and $D_B$ has range $0 \le D_B \le \infty$.
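Formula (2) is not reproduced in this text. For reference, the standard definition of the Bhattacharyya coefficient for two discrete distributions on $X$, which formula (2) presumably states, is given below together with the reason for the ranges quoted above. The last line translates the 0.95 threshold used in this embodiment into the equivalent bound on the distance.

```latex
% Standard discrete Bhattacharyya coefficient (assumed to be formula (2))
BC(p,q) = \sum_{x \in X} \sqrt{p(x)\,q(x)},
\qquad \sum_{x \in X} p(x) = \sum_{x \in X} q(x) = 1

% Non-negativity of each term and the Cauchy-Schwarz inequality give the range
0 \le BC(p,q) \le \sqrt{\sum_{x \in X} p(x)}\,\sqrt{\sum_{x \in X} q(x)} = 1

% Hence, with D_B = -\ln BC from formula (1):
BC = 1 \iff p = q \iff D_B = 0, \qquad BC \to 0 \implies D_B \to \infty

% A coefficient threshold of 0.95 corresponds to a distance bound
BC > 0.95 \iff D_B < -\ln 0.95 \approx 0.0513
```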
Normalization is required first, carried out according to formula (3):
where $v_{sta}$ is the standard value, $v_{tes}$ is the measured value, $V$ takes the values $v_{sta}$ and $v_{tes}$, and $P_V$ is the normalized value. Each group of data is substituted into formula (2) to obtain the $BC$ value. When $BC$ is above 0.95, the data are considered correct and need no cleaning; when $BC$ is smaller than the threshold, further processing is needed, as follows:
First, the slope of each group of data is calculated using formula (4).
$K = [k_1, k_2, \ldots, k_i, \ldots, k_n]$, where $k_i$ is the slope of the $i$-th group of test data and $n$ is the number of test groups.
It is then judged whether the $k$ values satisfy formula (5):
$k_1 \approx \cdots \approx k_i \approx \cdots \approx k_n$ (5)
When some $k_i$ is not approximately equal to the other $k$ values, the corresponding data are considered abnormal and removed.
In this embodiment, five groups of data are processed, and the calculated slopes are 1.02, 1.07, 1.05, 1.09 and 1.21. The value 1.21 differs most from the others, so the data of the corresponding group are judged abnormal and eliminated.
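The selection made in this embodiment can be reproduced with a short Python sketch. The source does not quantify "approximately equal", so the scoring rule used here (mean absolute difference between one slope and all the others) and the function name `most_deviant_slope` are assumptions chosen only for illustration.

```python
def most_deviant_slope(slopes):
    """Return (index, score) of the slope that differs most from the others.

    ASSUMPTION: "differs most" is scored as the mean absolute difference
    between one slope and every other slope; the source gives no formula.
    """
    if len(slopes) < 3:
        raise ValueError("need at least three slopes to single one out")
    scores = []
    for i, k in enumerate(slopes):
        others = slopes[:i] + slopes[i + 1:]
        scores.append(sum(abs(k - o) for o in others) / len(others))
    worst = max(range(len(slopes)), key=scores.__getitem__)
    return worst, scores[worst]


slopes = [1.02, 1.07, 1.05, 1.09, 1.21]  # slope values from the embodiment
index, score = most_deviant_slope(slopes)
cleaned = [k for i, k in enumerate(slopes) if i != index]
print(index, cleaned)  # 4 [1.02, 1.07, 1.05, 1.09]
```

Under this scoring the fifth group has a mean difference of about 0.15 from the others, while the remaining groups all score below 0.09, which matches the judgment stated in the embodiment.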
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (6)
1. A method for cleaning electric energy metering data, characterized by comprising the following steps:
evaluating the similarity between the standard value and the measured value using the Bhattacharyya distance; judging the data correct and in no need of cleaning when the Bhattacharyya coefficient is greater than a threshold; calculating the slope of each group of data when the Bhattacharyya coefficient is smaller than the threshold; and, when a slope appears that is not approximately equal to the others, judging the corresponding data abnormal and rejecting them.
2. The electric energy metering data cleaning method according to claim 1, characterized in that: the Bhattacharyya distance is defined as follows: on the same domain $X$, the Bhattacharyya distance of two discrete probability distributions $p$ and $q$ is
$D_B(p,q) = -\ln(BC(p,q))$ (1)
where $BC$ is the Bhattacharyya coefficient, with range $0 \le BC \le 1$, and $D_B$ has range $0 \le D_B \le \infty$;
first, normalization is required, carried out according to formula (3):
where $v_{sta}$ is the standard value, $v_{tes}$ is the measured value, $V$ takes the values $v_{sta}$ and $v_{tes}$, and $P_V$ is the normalized value;
each group of data is substituted into formula (2) to obtain the $BC$ value; when $BC$ is greater than the threshold, the data are judged correct and need no cleaning, and when $BC$ is smaller than the threshold, further processing is performed as follows:
first, the slope of each group of data is calculated using formula (4):
$K = [k_1, k_2, \ldots, k_i, \ldots, k_n]$, where $k_i$ is the slope of the $i$-th group of test data and $n$ is the number of test groups;
it is then judged whether the $k$ values satisfy formula (5):
$k_1 \approx \cdots \approx k_i \approx \cdots \approx k_n$ (5)
When some $k_i$ is not approximately equal to the other $k$ values, the corresponding data are judged abnormal and removed.
3. The electric energy metering data cleaning method according to claim 1, characterized in that: the threshold is 0.95.
4. An electric energy metering data cleaning system, characterized in that: the system comprises a data acquisition unit, a processor and a display;
the data acquisition unit acquires electric energy metering data;
the processor evaluates the similarity between the standard value and the measured value using the Bhattacharyya distance, judges the data correct and in no need of cleaning when the Bhattacharyya coefficient is greater than a threshold, calculates the slope of each group of data when the Bhattacharyya coefficient is smaller than the threshold, and, when a slope appears that is not approximately equal to the others, judges the corresponding data abnormal and rejects them;
the display displays the final result.
5. The electric energy metering data cleaning system of claim 4, wherein the processing procedure of the processor is as follows: the Bhattacharyya distance is defined as follows: on the same domain $X$, the Bhattacharyya distance of two discrete probability distributions $p$ and $q$ is
$D_B(p,q) = -\ln(BC(p,q))$ (1)
where $BC$ is the Bhattacharyya coefficient, with range $0 \le BC \le 1$, and $D_B$ has range $0 \le D_B \le \infty$;
first, normalization is required, carried out according to formula (3):
where $v_{sta}$ is the standard value, $v_{tes}$ is the measured value, $V$ takes the values $v_{sta}$ and $v_{tes}$, and $P_V$ is the normalized value;
each group of data is substituted into formula (2) to obtain the $BC$ value; when $BC$ is greater than the threshold, the data are judged correct and need no cleaning, and when $BC$ is smaller than the threshold, further processing is performed as follows:
first, the slope of each group of data is calculated using formula (4):
$K = [k_1, k_2, \ldots, k_i, \ldots, k_n]$, where $k_i$ is the slope of the $i$-th group of test data and $n$ is the number of test groups;
it is then judged whether the $k$ values satisfy formula (5):
$k_1 \approx \cdots \approx k_i \approx \cdots \approx k_n$ (5)
When some $k_i$ is not approximately equal to the other $k$ values, the corresponding data are judged abnormal and removed.
6. The electric energy metering data cleaning system of claim 4, wherein: the threshold is 0.95.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011458258.8A CN112506908B (en) | 2020-12-10 | 2020-12-10 | Electric energy metering data cleaning method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011458258.8A CN112506908B (en) | 2020-12-10 | 2020-12-10 | Electric energy metering data cleaning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112506908A (en) | 2021-03-16 |
CN112506908B (en) | 2024-07-02 |
Family
ID=74973716
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011458258.8A Active CN112506908B (en) | 2020-12-10 | 2020-12-10 | Electric energy metering data cleaning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112506908B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070192342A1 (en) * | 2006-02-10 | 2007-08-16 | Microsoft Corporation | Primitive operator for similarity joins in data cleaning |
CN103605362A (en) * | 2013-09-11 | 2014-02-26 | 天津工业大学 | Learning and anomaly detection method based on multi-feature motion modes of vehicle traces |
CN106446922A (en) * | 2015-07-31 | 2017-02-22 | 中国科学院大学 | Crowd abnormal behavior analysis method |
CN108667684A (en) * | 2018-03-30 | 2018-10-16 | 桂林电子科技大学 | A kind of data flow anomaly detection method based on partial vector dot product density |
CN109063885A (en) * | 2018-05-29 | 2018-12-21 | 国网天津市电力公司 | A kind of substation's exception metric data prediction technique |
CN109787197A (en) * | 2019-01-15 | 2019-05-21 | 三峡大学 | Method for pilot protection of circuit based on Bhattacharyya distance algorithm |
WO2019239542A1 (en) * | 2018-06-14 | 2019-12-19 | 三菱電機株式会社 | Abnormality sensing apparatus, abnormality sensing method, and abnormality sensing program |
US20200019890A1 (en) * | 2018-07-11 | 2020-01-16 | Palo Alto Research Center Incorporated | System and method for one-class similarity machines for anomaly detection |
CN111241158A (en) * | 2020-01-07 | 2020-06-05 | 清华大学 | Anomaly detection method and device for aircraft telemetry data |
CN111460917A (en) * | 2020-03-13 | 2020-07-28 | 温州大学大数据与信息技术研究院 | Airport abnormal behavior detection system and method based on multi-mode information fusion |
CN111795482A (en) * | 2019-04-03 | 2020-10-20 | 群光电能科技股份有限公司 | Air conditioning box with element efficiency decline early warning function and early warning method thereof |
CN111985383A (en) * | 2020-08-14 | 2020-11-24 | 太原理工大学 | Transient electromagnetic signal noise separation and identification method based on improved variational modal decomposition |
Non-Patent Citations (1)
Title |
---|
唐亚: "一种PMU数据清洗方法", 《电力系统装备》, no. 4, 17 May 2021 (2021-05-17), pages 79 - 80 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113988695A (en) * | 2021-11-11 | 2022-01-28 | 国网江苏省电力有限公司扬州供电分公司 | Power grid layered line loss analysis method based on semantic model and multi-source data |
CN113988695B (en) * | 2021-11-11 | 2023-11-28 | 国网江苏省电力有限公司扬州供电分公司 | Semantic model and multi-source data-based power grid layered line loss analysis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |