CN106021452A - Electromagnetic environment measurement data cleaning method - Google Patents

Electromagnetic environment measurement data cleaning method

Info

Publication number
CN106021452A
Authority
CN
China
Prior art keywords
data
electromagnetic environment
subset
parameter
designated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610325629.2A
Other languages
Chinese (zh)
Inventor
余占清
刘磊
付殷
李敏
曾嵘
田丰
罗兵
高超
杨芸
张波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
CSG Electric Power Research Institute
Research Institute of Southern Power Grid Co Ltd
Original Assignee
Tsinghua University
Research Institute of Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Research Institute of Southern Power Grid Co Ltd
Priority to CN201610325629.2A
Publication of CN106021452A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Testing Electric Properties And Detecting Electric Faults (AREA)

Abstract

The invention relates to an electromagnetic environment measurement data cleaning method and belongs to the technical field of electromagnetic environment protection of electric power systems. The method operates on a raw data set formed by an electromagnetic environment parameter from electromagnetic environment measurement and the meteorological parameters of the environment in which the transmission line is located. Based on the correlation between the electromagnetic environment and the meteorological parameters, it cleans the measurement data using a clustering method. The method can identify and handle bad data produced in electromagnetic environment measurement as well as erroneous data caused by environmental interference and equipment faults, so that more valid electromagnetic environment measurement data are finally formed. It allows electromagnetic environment measurement data to be cleaned more conveniently, effectively, and reliably, and provides better raw data for the subsequent electromagnetic environment assessment and optimization of power transmission and transformation projects.

Description

Electromagnetic environment measurement data cleaning method
Technical field
The invention belongs to the technical field of electromagnetic environment protection for power systems, and in particular relates to a processing method for cleaning electromagnetic environment measurement data.
Background
The electromagnetic environment is a major consideration in the design of high-voltage power transmission and transformation systems, and the root cause of a transmission line's electromagnetic environment effects is corona discharge on the line. On the one hand, corona discharge causes energy loss and increases transmission cost; on the other hand, it affects the electromagnetic environment around the conductors and in turn disturbs people's normal lives. With economic development and growing public environmental awareness, electromagnetic environment problems are attracting increasing attention, and for high-voltage transmission lines they have become a main constraint on system design and operation. The electromagnetic environment parameters of a high-voltage transmission line mainly include audible noise, radio interference, ground-level total electric field strength, ground-level ion current density, and corona loss. Measuring the electromagnetic environment of transmission lines is important for the subsequent analysis of the factors that influence the electromagnetic environment, for its protection and control, and for the optimized design of line parameters.
Electromagnetic environment data involve many parameters, and both the electromagnetic environment parameters and the meteorological parameters are random, so data cleaning plays an indispensable role for electromagnetic environment measurement data. The purpose of data cleaning is to find the noise produced by the environment and the bad data caused by equipment and other measurement problems, and to handle them accordingly. Cleaned data are better suited to subsequent data analysis. At present, however, there is no well-defined cleaning method for electromagnetic environment measurement data. For data obtained by point-by-point testing, the only available methods reject values based on factors such as equipment thresholds and the customary value ranges of the parameters. For data obtained by simultaneous multi-point testing, cleaning can additionally make use of the lateral distribution characteristics of the electromagnetic environment data. Overall, current methods for cleaning electromagnetic environment databases still rely largely on manual judgment and lack an accurate data cleaning principle.
Cluster analysis is an effective method for partitioning data during data processing and analysis. Its main purpose is to divide a large data set into groups according to the similarity of the data, so that data within the same subset are highly similar and data in different subsets are highly dissimilar. A large number of clustering algorithms have been proposed; they can generally be divided into four classes: partitioning methods, hierarchical methods, density-based methods, and grid-based methods. The most classical partitioning method is the k-means algorithm, which takes k as an input parameter and divides a set of n objects into k subsets such that similarity within a subset is high and similarity between different subsets is low.
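The "high similarity within a subset, low similarity between subsets" property that k-means aims for can be made concrete through its objective, the within-cluster sum of squared distances to each cluster centre. The following is a minimal sketch for illustration only; the function name `within_cluster_sse` and the four sample points are assumptions introduced here and do not come from the patent.

```python
import math
import statistics


def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    k-means searches for the labelling that makes this value small.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        # The cluster centre is the coordinate-wise mean of its members.
        centre = [statistics.fmean(column) for column in zip(*members)]
        total += sum(math.dist(p, centre) ** 2 for p in members)
    return total


# Hypothetical toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

good = within_cluster_sse(points, [0, 0, 1, 1])  # groups neighbouring points
poor = within_cluster_sse(points, [0, 1, 0, 1])  # mixes the two groups
print(good, poor)
```

A labelling that keeps neighbouring points together gives a much smaller value than one that mixes the two groups, which is the sense in which the subsets are internally similar.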
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art by proposing an electromagnetic environment measurement data cleaning method. The method can clean electromagnetic environment measurement data conveniently, effectively, and reliably, improves the validity of the raw database, and provides more reasonable data support for subsequent data analysis.
The invention proposes an electromagnetic environment measurement data cleaning method that operates on a raw data set composed of one electromagnetic environment parameter from electromagnetic environment measurement and the meteorological parameters of the environment in which the line is located. The method comprises the following steps:
1) Form the raw data set from any one measured electromagnetic environment parameter together with the meteorological parameters at the transmission line measurement site. Let the raw data set have n rows and m columns. Each column represents one parameter and is denoted $a_j$, j = 1, 2, 3, …, m; each row holds the values of all parameters at the same instant, is one data point, and is denoted $x_i$, i = 1, 2, 3, …, n. Here $a_j$ and $x_i$ are one-dimensional vectors, and m, n, i, j are positive integers;
2) Normalize each column $a_j$ of the raw data set: with μ the mean and σ the standard deviation of $a_j$, the normalized column is $a_{jj} = (a_j - \mu)/\sigma$, jj = 1, 2, 3, …, m. Replace each column $a_j$ of the raw data set with $a_{jj}$; each row $x_i$ thereby becomes a new data point, denoted $x_{ii}$, ii = 1, 2, 3, …, n. Here $a_{jj}$ and $x_{ii}$ are one-dimensional vectors, and ii, jj are positive integers;
3) Cluster the new data set obtained by normalizing the raw data set in step 2), with the number of clusters set to K = round(…) [the expression inside round is not preserved in this text], to obtain K subsets, denoted $D_1, D_2, …, D_K$, and the cluster centre of each subset; here round(x) denotes x rounded to the nearest integer;
4) For each subset $D_i$ obtained in step 3), compute the distance between every data point $x_{ii}$ in the subset and the cluster centre of $D_i$, giving the distance data set $Dis_i$ of all data points in the subset to the cluster centre;
5) Fit a normal distribution to each distance data set $Dis_i$ obtained in step 4) and, following the 3σ criterion, identify the data points outside the 3σ interval as bad data points; replace these bad data points with the cluster centre of their subset $D_i$;
6) Reduce the number of clusters K by 1 and return to step 3);
7) Repeat steps 3) to 6) five times, or until no new bad data points appear, at which point data cleaning is complete.
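Steps 2), 4) and 5) can be written compactly as follows. This is one reading of the text rather than a formula set given in the patent: the subset index k, the centre $c_k$, and the sample mean $\bar{d}_k$ and standard deviation $s_k$ of the distances are notation introduced here, and the Euclidean norm and the symmetric interval around the mean are assumptions.

```latex
% Step 2: column-wise normalisation, with \mu_j and \sigma_j the mean and
% standard deviation of column a_j
a_{jj} = \frac{a_j - \mu_j}{\sigma_j}

% Step 4: distance of each normalised point in subset D_k to its centre c_k
d_{ii} = \lVert x_{ii} - c_k \rVert_2 , \qquad x_{ii} \in D_k

% Step 5: 3-sigma criterion on the distances collected in Dis_k
x_{ii} \ \text{is a bad data point} \iff \lvert d_{ii} - \bar{d}_k \rvert > 3\, s_k
```

Because distances are non-negative and cluster members sit near their centre, in practice the criterion mainly flags points whose distance is far above the mean of their subset.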
Features and beneficial effects of the invention:
Based on the correlation between the electromagnetic environment and the meteorological parameters, the method uses clustering to clean the measurement data. It can identify and handle bad data produced in electromagnetic environment measurement, as well as erroneous data caused by environmental interference, equipment faults, and the like, ultimately yielding more valid electromagnetic environment measurement data. The invention can clean electromagnetic environment measurement data conveniently, effectively, and reliably, and provides better raw data for subsequent processes such as electromagnetic environment assessment and optimization.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description of the invention
The electromagnetic environment measurement data cleaning method proposed by the invention is described in detail below with reference to the accompanying drawing and a specific embodiment.
The method operates on a raw data set composed of one electromagnetic environment parameter from electromagnetic environment measurement and the meteorological parameters of the environment in which the transmission line is located. As shown in Fig. 1, it comprises the following steps:
1) Form the raw data set from any one measured electromagnetic environment parameter (one of audible noise, radio interference, and similar parameters) together with the meteorological parameters (including wind speed, temperature, and atmospheric pressure) at the transmission line measurement site. Let the raw data set have n rows and m columns. Each column represents one parameter and is denoted $a_j$, j = 1, 2, 3, …, m; each row holds the values of all parameters at the same instant, is one data point, and is denoted $x_i$, i = 1, 2, 3, …, n. Here $a_j$ and $x_i$ are one-dimensional vectors, and m, n, i, j are positive integers;
2) Normalize each column $a_j$ of the raw data set: with μ the mean and σ the standard deviation of $a_j$, the normalized column is $a_{jj} = (a_j - \mu)/\sigma$, jj = 1, 2, 3, …, m. Replace each column $a_j$ of the raw data set with $a_{jj}$; each row $x_i$ thereby becomes a new data point, denoted $x_{ii}$, ii = 1, 2, 3, …, n. Here $a_{jj}$ and $x_{ii}$ are one-dimensional vectors, and ii, jj are positive integers;
3) Cluster the new data set obtained by normalizing the raw data set in step 2), with the number of clusters set to K = round(…) [the expression inside round is not preserved in this text], to obtain K subsets, denoted $D_1, D_2, …, D_K$, and the cluster centre of each subset; here round(x) denotes x rounded to the nearest integer. This embodiment uses the k-means method for clustering;
4) For each subset $D_i$ obtained in step 3), compute the distance between every data point $x_{ii}$ in the subset and the cluster centre of $D_i$; the distance metric is usually the Euclidean distance. This gives the distance data set $Dis_i$ of all data points in the subset to the cluster centre;
5) Fit a normal distribution to each distance data set $Dis_i$ obtained in step 4) and, following the 3σ criterion, identify the data points outside the 3σ interval as bad data points; replace these bad data points with the cluster centre of their subset $D_i$;
6) Reduce the number of clusters K by 1 and return to step 3);
7) Repeat steps 3) to 6) five times, or until no new bad data points appear, at which point data cleaning is complete.
The calculations in steps 3) to 6) above can be carried out in any general-purpose data processing software; this embodiment uses Matlab as the computing software.
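The whole procedure can be sketched in a few dozen lines. The sketch below is written in Python rather than the Matlab used in the embodiment, and it is an illustration under stated assumptions, not the patented implementation: the initial cluster count `k_init` is supplied by the caller because the expression for K is not available here, the function names are hypothetical, the 3σ interval is taken as symmetric around the mean distance, and bad points are replaced in the normalized space.

```python
import math
import random
import statistics


def zscore_columns(rows):
    """Step 2: normalise every column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Step 3: plain Lloyd's k-means; returns (labels, centres)."""
    centres = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centre if a cluster ends up empty.
            new_centres.append(
                [statistics.fmean(col) for col in zip(*members)]
                if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


def clean(rows, k_init, max_rounds=5):
    """Steps 2 to 7; returns (cleaned normalised rows, indices flagged as bad)."""
    data = zscore_columns(rows)
    bad = set()
    k = min(k_init, len(data))  # k_init is an assumption supplied by the caller
    for _ in range(max_rounds):
        if k < 1:
            break
        labels, centres = kmeans(data, k)
        new_bad = set()
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            if len(idx) < 2:
                continue
            dists = [math.dist(data[i], centres[c]) for i in idx]  # step 4
            mu, sigma = statistics.fmean(dists), statistics.pstdev(dists)
            for i, d in zip(idx, dists):  # step 5: 3-sigma criterion
                if sigma > 0 and abs(d - mu) > 3 * sigma:
                    data[i] = list(centres[c])  # replace with the cluster centre
                    new_bad.add(i)
        if not (new_bad - bad):  # step 7: stop when nothing new is flagged
            break
        bad |= new_bad
        k -= 1  # step 6
    return data, bad


# Hypothetical data: a 5 x 4 grid of readings plus one gross outlier (row 20).
rows = [[float(i % 5), float(i // 5)] for i in range(20)] + [[100.0, 100.0]]
cleaned, bad = clean(rows, k_init=1)
print(sorted(bad))
```

One consequence of the stopping rule as read here is worth noting: if an isolated outlier is assigned a cluster of its own, its distance to the centre is zero and it is not flagged in that round, so the choice of the initial K matters.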

Claims (1)

1. An electromagnetic environment measurement data cleaning method, characterized in that the method operates on a raw data set composed of an electromagnetic environment parameter from electromagnetic environment measurement and the meteorological parameters of the environment in which the transmission line is located, and comprises the following steps:
1) forming the raw data set from any one measured electromagnetic environment parameter together with the meteorological parameters at the transmission line measurement site, the raw data set having n rows and m columns; each column of the raw data set represents one parameter and is denoted $a_j$, j = 1, 2, 3, …, m; each row holds the values of all parameters at the same instant, is one data point, and is denoted $x_i$, i = 1, 2, 3, …, n; wherein $a_j$ and $x_i$ are one-dimensional vectors, and m, n, i, j are positive integers;
2) normalizing each column $a_j$ of the raw data set: with μ the mean and σ the standard deviation of $a_j$, the normalized column is $a_{jj} = (a_j - \mu)/\sigma$, jj = 1, 2, 3, …, m; replacing each column $a_j$ of the raw data set with $a_{jj}$, whereby each row $x_i$ of the raw data set becomes a new data point, denoted $x_{ii}$, ii = 1, 2, 3, …, n; wherein $a_{jj}$ and $x_{ii}$ are one-dimensional vectors, and ii, jj are positive integers;
3) clustering the new data set obtained by normalizing the raw data set in step 2), with the number of clusters set to K = round(…) [the expression inside round is not preserved in this text], to obtain K subsets, denoted $D_1, D_2, …, D_K$, and the cluster centre of each subset; wherein round(x) denotes x rounded to the nearest integer;
4) for each subset $D_i$ obtained in step 3), computing the distance between every data point $x_{ii}$ in the subset and the cluster centre of $D_i$, to obtain the distance data set $Dis_i$ of all data points in the subset to the cluster centre;
5) fitting a normal distribution to each distance data set $Dis_i$ obtained in step 4) and, following the 3σ criterion, identifying the data points outside the 3σ interval as bad data points, and replacing these bad data points with the cluster centre of their subset $D_i$;
6) reducing the number of clusters K by 1 and returning to step 3);
7) repeating steps 3) to 6) five times, or until no new bad data points appear, whereupon data cleaning is complete.
CN201610325629.2A 2016-05-16 2016-05-16 Electromagnetic environment measurement data cleaning method Pending CN106021452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610325629.2A CN106021452A (en) 2016-05-16 2016-05-16 Electromagnetic environment measurement data cleaning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610325629.2A CN106021452A (en) 2016-05-16 2016-05-16 Electromagnetic environment measurement data cleaning method

Publications (1)

Publication Number Publication Date
CN106021452A (en) 2016-10-12

Family

ID=57098388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610325629.2A Pending CN106021452A (en) 2016-05-16 2016-05-16 Electromagnetic environment measurement data cleaning method

Country Status (1)

Country Link
CN (1) CN106021452A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684320A (en) * 2018-12-25 2019-04-26 清华大学 The method and apparatus of monitoring data on-line cleaning
CN109684308A (en) * 2017-10-18 2019-04-26 南方电网科学研究院有限责任公司 A kind of electromagnetic environment parameter consistency method for cleaning and device based on pattern search
CN110472801A (en) * 2019-08-26 2019-11-19 南方电网科学研究院有限责任公司 DC power transmission line electromagnetic environment appraisal procedure and system
CN110866074A (en) * 2019-07-02 2020-03-06 黑龙江省电工仪器仪表工程技术研究中心有限公司 Electric energy meter improved K-means classification method based on regional characteristics
CN112783883A (en) * 2021-01-22 2021-05-11 广东电网有限责任公司东莞供电局 Power data standardized cleaning method and device under multi-source data access

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831431A (en) * 2012-02-05 2012-12-19 四川大学 Detector training method based on hierarchical clustering
CN104462819A (en) * 2014-12-09 2015-03-25 国网四川省电力公司信息通信公司 Local outlier detection method based on density clustering

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684308A (en) * 2017-10-18 2019-04-26 南方电网科学研究院有限责任公司 A kind of electromagnetic environment parameter consistency method for cleaning and device based on pattern search
CN109684308B (en) * 2017-10-18 2020-11-17 南方电网科学研究院有限责任公司 Electromagnetic environment parameter consistency cleaning method and device based on pattern search
CN109684320A (en) * 2018-12-25 2019-04-26 清华大学 The method and apparatus of monitoring data on-line cleaning
CN109684320B (en) * 2018-12-25 2020-09-15 清华大学 Method and equipment for online cleaning of monitoring data
CN110866074A (en) * 2019-07-02 2020-03-06 黑龙江省电工仪器仪表工程技术研究中心有限公司 Electric energy meter improved K-means classification method based on regional characteristics
CN110866074B (en) * 2019-07-02 2022-11-04 黑龙江省电工仪器仪表工程技术研究中心有限公司 Electric energy meter improved K-means classification method based on regional characteristics
CN110472801A (en) * 2019-08-26 2019-11-19 南方电网科学研究院有限责任公司 DC power transmission line electromagnetic environment appraisal procedure and system
CN112783883A (en) * 2021-01-22 2021-05-11 广东电网有限责任公司东莞供电局 Power data standardized cleaning method and device under multi-source data access

Similar Documents

Publication Publication Date Title
Cai et al. Classification of power quality disturbances using Wigner-Ville distribution and deep convolutional neural networks
CN106021452A (en) Electromagnetic environment measurement data cleaning method
CN103076547B (en) Method for identifying GIS (Gas Insulated Switchgear) local discharge fault type mode based on support vector machines
Mahela et al. Recognition of power quality disturbances using S-transform based ruled decision tree and fuzzy C-means clustering classifiers
Lazzaretti et al. Novelty detection and multi-class classification in power distribution voltage waveforms
Liu et al. High-precision identification of power quality disturbances under strong noise environment based on FastICA and random forest
Wang et al. Fractal complexity-based feature extraction algorithm of communication signals
CN107589341B (en) Single-phase grounding online fault positioning method based on distribution automation main station
CN104198138A (en) Early warning method and system for abnormal vibration of wind driven generator
CN105447502A (en) Transient power disturbance identification method based on S conversion and improved SVM algorithm
Zaro et al. Power quality detection and classification using S-transform and rule-based decision tree
CN109470985A (en) A kind of voltage sag source identification methods based on more resolution singular value decompositions
CN103337248B (en) A kind of airport noise event recognition based on time series kernel clustering
CN104966161A (en) Electric energy quality recording data calculating analysis method based on Gaussian mixture model
CN108335010A (en) A kind of wind power output time series modeling method and system
CN106651031A (en) Lightning stroke flashover early warning method and system based on historical information
CN109061774A (en) A kind of thunderstorm core relevance processing method
CN102982347A (en) Method for electric energy quality disturbance classification based on KL distance
Mahela et al. Recognition of power quality disturbances using discrete wavelet transform and fuzzy C-means clustering
Mora-Florez et al. K-means algorithm and mixture distributions for locating faults in power systems
CN108008187B (en) Power grid harmonic detection method based on variational modal decomposition
CN109214402A (en) A kind of people having the same aspiration and interest unit grouping method of combination WAVELET FUZZY entropy and GG fuzzy clustering
Alshahrani et al. Detection and classification of power quality disturbances based on Hilbert-Huang transform and feed forward neural networks
Klinginsmith et al. Unsupervised clustering on pmu data for event characterization on smart grid
Dos Santos et al. Preprocessing in fuzzy time series to improve the forecasting accuracy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161012