CN110543903A - Data cleaning method and system for GIS partial discharge big data system - Google Patents

Data cleaning method and system for GIS partial discharge big data system

Info

Publication number
CN110543903A
Authority
CN
China
Prior art keywords
data
cleaning
partial discharge
gis
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910783712.8A
Other languages
Chinese (zh)
Other versions
CN110543903B (en)
Inventor
杨景刚
贾骏
胡成博
刘洋
徐阳
张照辉
路永玲
黄成军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
State Grid Corp of China SGCC
Southeast University
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Shanghai Jiaotong University
State Grid Corp of China SGCC
Southeast University
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University, State Grid Corp of China SGCC, Southeast University, Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd filed Critical Shanghai Jiaotong University
Priority to CN201910783712.8A priority Critical patent/CN110543903B/en
Publication of CN110543903A publication Critical patent/CN110543903A/en
Application granted granted Critical
Publication of CN110543903B publication Critical patent/CN110543903B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/12 Testing dielectric strength or breakdown voltage; Testing or monitoring effectiveness or level of insulation, e.g. of a cable or of an apparatus, for example using partial discharge measurements; Electrostatic testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention provides a data cleaning method and system for a GIS (gas insulated switchgear) partial discharge big data system. First, the GIS partial discharge big data system is established, data are collected, a unified data format is formulated, and a conversion module is developed to normalize the formats of heterogeneous data sources. Then, abnormal data are processed through exploratory analysis, corresponding cleaning rules are established for different data types to form a data cleaning rule base, the data are organized into a data set with a uniform and accurate format, and the data cleaning rule base is continuously optimized. Finally, the data are mined and analyzed. The invention can convert the format of partial discharge detection data and clean the data, and uses advanced algorithms for analysis and mining, thereby improving the diagnostic accuracy for partial discharge detection data.

Description

Data cleaning method and system for GIS partial discharge big data system
Technical Field
The invention relates to the technical field of big data processing for partial discharge in Gas Insulated Switchgear (GIS), and in particular to a data cleaning method and system for a GIS partial discharge big data system.
Background
With the rapid expansion of the power grid and the continuous growth of the power load, the requirements on the reliability and safety of electrical equipment keep rising, and condition detection of the insulation performance of electrical equipment becomes increasingly important. Effective and accurate detection and evaluation of electrical equipment is a prerequisite for condition-based maintenance and full life cycle management, and the basis for ensuring safe and reliable operation of the equipment.
After power equipment is put into operation, design defects, surface contamination, poor contact and the like can cause corona and surface partial discharge. When discharge occurs, the fault point produces physical signals such as magnetic, electrical, optical, acoustic and thermal signals, as well as chemical signals such as changes in gas concentration; these phenomena are the basis of partial discharge detection. Power companies have carried out a large amount of partial discharge detection work on Gas Insulated Switchgear (GIS), in forms such as live-line detection, intensive care, on-line monitoring and off-line testing, and the partial discharge detection instruments used are of many kinds and produce different data formats. Part of the partial discharge detection data is stored in the power production management system, and part is stored in discrete form. Analysis of partial discharge data reflects the insulation performance of GIS equipment well, so partial discharge detection is important to the stable operation of GIS equipment. In recent years, big data, cloud computing and artificial intelligence technologies have developed rapidly and been applied at scale in various industries, providing key decision support for enterprises through distributed computing, data mining and similar means. The quality, authenticity and usability of the data are the key to guaranteeing the analysis results.
At present, when big data is applied in the power industry, data are mainly cleaned by deletion and classified manually, which achieves only limited results. The following problems restrict the further development of data applications:
1. For missing values, abnormal values and repeated values, simple deletion discards a large amount of data that could have been used after processing;
2. Data formats are not unified: the formats generated by different detection instruments differ, so a separate data mining method has to be developed for each kind of data, which is inefficient and prevents comprehensive use of the data;
3. Data are classified manually, which is error-prone and inefficient, and no classification algorithm is used.
Disclosure of Invention
Purpose of the invention: the invention provides a data cleaning method and system for a GIS (gas insulated switchgear) partial discharge big data system, which create the conditions for data mining and enable intelligent diagnosis of the partial discharge detection data of GIS equipment.
Technical scheme: the data cleaning method for a GIS partial discharge big data system according to the invention comprises the following steps:
(1) A system established based on cosine acquires data and performs normalization conversion on the acquired data;
(2) Processing abnormal data in the data normalized and converted in step (1) through exploratory analysis, establishing corresponding cleaning rules for different data types to form a data cleaning rule base, organizing the data into a data set with a uniform format, and continuously optimizing the data cleaning rule base; after the data are cleaned, classifying and learning the data using an SVM (support vector machine) and the GoogLeNet deep learning algorithm;
(3) Classifying and labeling the data with the SVM (support vector machine), and exporting the sample and label data to form a training set and a verification set; training and verifying the GoogLeNet deep learning algorithm, optimizing the parameters of each layer and continuing training, and finishing the training when a set threshold is reached.
The abnormal data in the step (2) comprises missing values, abnormal values and repeated values.
The step (2) is realized by the following steps:
Preliminary exploration of the data is carried out using a Python scientific computing library, including analysis of data types, missing values, data set size and the data distribution under each feature. Plotting methods are used for visual inspection to obtain the basic attributes and distribution of the data. The relationships among the features in the data set are preliminarily explored through univariate and multivariate analysis, the importance of each data attribute is determined from these relationships, the missing rate is calculated according to attribute importance, and cleaning rules are formulated;
The importance of data attributes is divided into 'important' and 'general'. The missing values of important attributes are obtained with the methods provided by Pandas and the missing rate is calculated. When the missing rate is greater than or equal to 60%, the attribute is deleted; when the missing rate is less than 60% and greater than or equal to 30%, simple filling is performed according to the data distribution; when the missing rate is less than 30%, the values are supplemented by interpolation and modeling methods;
Using the 3σ criterion for abnormal values, measured values that deviate from the mean by more than 3 standard deviations are searched for, and when the data follow a normal distribution, the data far from the mean are deleted, as follows:
$$|x_i - \mu| > 3\sigma$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the measured values;
Deduplication sorts the data set records according to a given rule, calculates their similarity, identifies duplicates with the Pandas duplicated method, and deletes the repeated data;
After the data cleaning rules are formulated, they are digitized, stored in a cleaning library and applied by software; a data cleaning rule base is established in combination with the business requirements of GIS partial discharge detection data, the data cleaning module filters the data at the business level according to the rules, and the data cleaning rule base is continuously optimized according to the filtering results.
The step (3) comprises the following steps:
(31) Classifying and labeling the data with an SVM (support vector machine): the data are mapped from the input space to a feature space through a kernel mapping $\varphi: x \to H$ and classified with the optimal hyperplane $w^{T}\varphi(x) + b = 0$, as follows:
$$f(x) = \operatorname{sgn}\left(\sum_{i} a_i y_i K(x_i, x) + b\right)$$
where w is the weight vector, a and b are the parameters to be solved, y is the true label, $x_i$ and x are input samples, and $K(x_i, x)$ is the kernel function that replaces the inner product of $x_i$ and x. Based on this classification method, the basic types of partial discharge fault are defined to form a tag library, and the data are labeled one by one according to the classification results of the SVM (support vector machine);
(32) Learning the sample data labeled in step (31) with a deep learning artificial intelligence algorithm: a training library and a verification library are exported from the GIS partial discharge data in the form of spectrum plus label; the GoogLeNet deep learning algorithm is used for training, and the verification library is used to verify the accuracy of the diagnosis model; if the accuracy is lower than a set threshold, the parameters of each layer are adjusted and multiple rounds of optimization training are performed; the diagnosis model is integrated into the GIS partial discharge big data system and diagnoses partial discharge detection spectra; the value of the data is mined using big data mining technology, and the association between the detection data and GIS equipment defects is explored;
(33) Mining the value of the data through big data mining technology: big data mining is carried out using Orange, a component-based data mining and machine learning software, and the association between GIS partial discharge multi-source data and GIS faults is explored.
The invention also discloses a data cleaning system for a GIS partial discharge big data system, comprising the GIS partial discharge big data system, a data cleaning module and a mining analysis module. The GIS partial discharge big data system acquires data based on a cosine-built system and performs normalization conversion on the acquired data. The data cleaning module processes abnormal data in the normalized and converted data through exploratory analysis, establishes corresponding cleaning rules for different data types to form a data cleaning rule base, organizes the data into a data set with a uniform format, and continuously optimizes the data cleaning rule base; after the data are cleaned, the data are classified and learned using an SVM (support vector machine) and the GoogLeNet deep learning algorithm. The mining analysis module classifies and labels the data with the SVM (support vector machine), exports the sample and label data, and forms a training set and a verification set; the GoogLeNet deep learning algorithm is trained and verified, the parameters of each layer are optimized and training continues, and the training finishes when a set threshold is reached.
Further, the abnormal data includes missing values, abnormal values, and duplicate values.
The data cleaning module uses a Python scientific computing library for preliminary exploration of the data, including analysis of data types, missing values, data set size and the data distribution under each feature; uses plotting methods for visual inspection to obtain the basic attributes and distribution of the data; preliminarily explores the relationships among the features in the data set through univariate and multivariate analysis; determines the importance of each data attribute from these relationships; calculates the missing rate according to attribute importance; and formulates cleaning rules. The importance of data attributes is divided into 'important' and 'general'. The missing values of important attributes are obtained with the methods provided by Pandas and the missing rate is calculated. When the missing rate is greater than or equal to 60%, the attribute is deleted; when the missing rate is less than 60% and greater than or equal to 30%, simple filling is performed according to the data distribution; when the missing rate is less than 30%, the values are supplemented by interpolation and modeling methods. Using the 3σ criterion for abnormal values, measured values that deviate from the mean by more than 3 standard deviations are searched for, and when the data follow a normal distribution, the data far from the mean are deleted, as follows:
$$|x_i - \mu| > 3\sigma$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the measured values.
Deduplication sorts the data set records according to a given rule, calculates their similarity, identifies duplicates with the Pandas duplicated method, and deletes the repeated data. After the data cleaning rules are formulated, they are digitized, stored in a cleaning library and applied by software; a data cleaning rule base is established in combination with the business requirements of GIS partial discharge detection data, the data cleaning module filters the data at the business level according to the rules, and the data cleaning rule base is continuously optimized according to the filtering results.
The mining analysis module classifies and labels the data with an SVM (support vector machine): the data are mapped from the input space to a feature space through a kernel mapping $\varphi: x \to H$ and classified with the optimal hyperplane $w^{T}\varphi(x) + b = 0$; the labeled sample data are learned with a deep learning artificial intelligence algorithm; big data mining is carried out using Orange, a component-based data mining and machine learning software, and the association between GIS partial discharge multi-source data and GIS faults is explored.
Beneficial effects: compared with the prior art, the invention has the following beneficial effects: 1. considering that GIS partial discharge data vary in quality and format, contain erroneous data and are difficult to mine and use further, a GIS partial discharge big data system is developed, and the partial discharge data are normalized by formulating a unified data format and developing a data format conversion module; 2. the data are cleaned by exploratory analysis, missing value/abnormal value/deduplication processing, a data cleaning rule base and similar methods, which improves data quality and gives the data a basis for big data analysis applications; 3. the data are classified and labeled with an SVM (support vector machine) and learned with a deep learning artificial intelligence algorithm, which raises the level of intelligent diagnosis of GIS partial discharge detection data; big data mining technology is used to analyze the association between partial discharge data and GIS equipment and to analyze the state of GIS equipment, enabling prediction and early warning and improving both the use of GIS partial discharge big data and the intelligent analysis of equipment state.
Drawings
FIG. 1 is a flow diagram of the partial discharge big data system;
FIG. 2 is a flow chart of cleaning data using a data cleaning rule base;
FIG. 3 is a fault classification diagram of a partial discharge spectrum by an SVM (support vector machine);
FIG. 4 is a diagram of partial discharge data diagnosis by the deep learning diagnostic algorithm.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
A data cleaning method for a GIS partial discharge big data system, whose flow is shown schematically in FIG. 1, can be divided into three parts: the GIS partial discharge big data system, data cleaning, and mining analysis.
The first step: a GIS partial discharge big data system is developed. Partial discharge data from GIS live-line detection, intensive care, on-line monitoring, off-line tests and the like are collected, including discrete data and structured data from the power production management system and other systems; the data are stored in a MongoDB database to form the database DataSource. A unified data format is formulated according to the partial discharge detection business and the characteristics of the various data, a data format conversion function module is developed, the heterogeneous data are normalized, and the parsed data are stored in the database to be analyzed.
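The format conversion module described in the first step can be sketched as one converter per source feeding a single unified record type. This is a minimal illustration only: the `PDRecord` fields, the two source layouts, and every field name below are hypothetical, since the patent does not disclose its unified data format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PDRecord:
    """Hypothetical unified record for one partial discharge measurement."""
    device_id: str
    measured_at: datetime
    method: str           # e.g. "live" or "online"
    amplitude_mv: float


def from_online_monitor(row: dict) -> PDRecord:
    # Assumed layout: epoch seconds in "ts", amplitude already in millivolts.
    return PDRecord(
        device_id=str(row["dev"]),
        measured_at=datetime.fromtimestamp(row["ts"], tz=timezone.utc),
        method="online",
        amplitude_mv=float(row["mv"]),
    )


def from_live_detector(row: dict) -> PDRecord:
    # Assumed layout: ISO timestamp string in "time", amplitude in volts.
    return PDRecord(
        device_id=row["equipment"].strip().upper(),
        measured_at=datetime.fromisoformat(row["time"]),
        method="live",
        amplitude_mv=float(row["amp_v"]) * 1000.0,
    )


# One converter per heterogeneous source; new instruments add an entry here.
CONVERTERS = {"online": from_online_monitor, "live": from_live_detector}


def normalize(source: str, row: dict) -> PDRecord:
    """Convert one source-specific row into the unified record."""
    return CONVERTERS[source](row)
```

Keeping the converters in a lookup table means a new instrument format only needs one new function, while everything downstream (cleaning, labeling, training) sees a single schema.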
The second step: data cleaning, specifically including:
1) Exploratory analysis. A Python scientific computing library is used for preliminary exploration of the data, including but not limited to data types, missing values, data set size and the data distribution under each feature. Plotting methods are used for visual inspection to obtain the basic attributes and distribution of the data, and univariate and multivariate analysis are used to preliminarily explore the relationships among the features in the data set and to verify the hypotheses proposed in the business analysis stage.
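A first pass of this exploratory analysis might look like the following Pandas sketch. The function name `explore` and the example column names are assumptions made for illustration; the plotting step is left out so the example stays self-contained.

```python
import pandas as pd


def explore(df: pd.DataFrame) -> dict:
    """Collect the basic facts named in the text: types, missing values,
    data set size, per-feature distribution, and feature relationships."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,                              # data set size
        "dtypes": df.dtypes.astype(str).to_dict(),      # data types
        "missing_rate": df.isna().mean().to_dict(),     # share of missing values per column
        "summary": numeric.describe(),                  # univariate distribution statistics
        "correlation": numeric.corr(),                  # pairwise (multivariate) relationships
    }


# Hypothetical sample of normalized partial discharge records.
sample = pd.DataFrame({
    "amplitude_mv": [1.0, 2.0, None, 4.0],
    "phase_deg": [10.0, 20.0, 30.0, 40.0],
    "method": ["live", "live", "online", None],
})
report = explore(sample)
```

The `missing_rate` entry is the quantity the later missing-value rules act on, so computing it once here avoids recomputing it per rule.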
2) Missing value/outlier/deduplication processing. Missing values are obtained with the methods provided by Pandas. When the missing rate is low and the attribute importance is low, simple filling is performed according to the data distribution; when the missing rate is greater than 95% and the attribute importance is low, the attribute is deleted; when the missing rate is high and the attribute importance is high, the values are supplemented by interpolation and modeling methods.
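As a sketch of a threshold-driven missing-value rule, the function below reads the thresholds given for important attributes in the disclosure as: delete at a missing rate of 60% or more, simple fill from 30% up to 60%, and interpolate below 30%. Several choices here are assumptions rather than the patent's method: the median stands in for "simple filling according to the data distribution", linear interpolation stands in for "interpolation and modeling", and all columns are assumed numeric.

```python
import pandas as pd


def clean_missing(df: pd.DataFrame, important_cols: list) -> pd.DataFrame:
    """Apply the missing-rate thresholds to each important attribute."""
    out = df.copy()
    for col in important_cols:
        rate = out[col].isna().mean()
        if rate >= 0.60:
            # Too sparse to recover: drop the attribute.
            out = out.drop(columns=col)
        elif rate >= 0.30:
            # Simple fill; the median is an assumed stand-in for the distribution.
            out[col] = out[col].fillna(out[col].median())
        elif rate > 0:
            # Few gaps: linear interpolation (model-based filling is omitted here).
            out[col] = out[col].interpolate(limit_direction="both")
    return out


# Hypothetical columns with 20%, 60% and 40% missing values respectively.
raw = pd.DataFrame({
    "a": [1.0, None, 3.0, 4.0, 5.0],
    "b": [None, None, None, 1.0, 2.0],
    "c": [1.0, None, None, 4.0, 10.0],
})
cleaned = clean_missing(raw, ["a", "b", "c"])
```

The thresholds are plain arguments of the comparison chain, so the different cut-offs mentioned in this embodiment (for example 95% for low-importance attributes) could be substituted without changing the structure.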
Using the 3σ criterion for abnormal values, measured values that deviate from the mean by more than 3 standard deviations are searched for, and when the data follow a normal distribution, the data far from the mean are deleted, as follows:
$$|x_i - \mu| > 3\sigma$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the measured values.
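The three-standard-deviation check can be sketched with the Python standard library alone. The function name and the sample values are illustrative; the population standard deviation is used, which is an assumption, as the text does not say whether the population or sample form is intended.

```python
from statistics import mean, pstdev


def three_sigma_filter(values):
    """Split values into those within 3 standard deviations of the mean
    and those outside it."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    kept = [v for v in values if abs(v - mu) <= 3 * sigma]
    removed = [v for v in values if abs(v - mu) > 3 * sigma]
    return kept, removed


# Twenty ordinary readings and one gross outlier (hypothetical data).
readings = [10.0] * 20 + [100.0]
kept, removed = three_sigma_filter(readings)
```

For normally distributed data only about 0.27% of values fall outside this band, which is why the rule is restricted to data that follow a normal distribution. On very small samples the check cannot fire at all, because a single point cannot lie more than roughly the square root of the sample size in standard deviations from the mean.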
Deduplication sorts the data set records according to a given rule, then calculates their similarity, identifies duplicates with the Pandas duplicated method, and deletes the repeated data.
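A minimal sketch of this sort-then-flag deduplication with Pandas follows. Here "similarity" is reduced to exact equality on a set of key columns, which is a simplifying assumption; the patent does not specify its similarity measure, and the column names are hypothetical.

```python
import pandas as pd


def deduplicate(df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Sort by the key columns, flag repeats with duplicated(), and drop them."""
    ordered = df.sort_values(keys).reset_index(drop=True)
    is_dup = ordered.duplicated(subset=keys, keep="first")
    return ordered[~is_dup].reset_index(drop=True)


# Hypothetical records: the same measurement uploaded twice per device.
records = pd.DataFrame({
    "device_id": ["B", "A", "A", "B"],
    "measured_at": ["t2", "t1", "t1", "t2"],
    "amplitude_mv": [5.0, 1.0, 1.0, 5.0],
})
unique_records = deduplicate(records, ["device_id", "measured_at"])
```

A fuzzy similarity score (for example a tolerance on the amplitude) would replace the exact match on `keys`, but the sort-first structure stays the same because it places candidate duplicates next to each other.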
3) Data cleaning rule base. The process of cleaning data with the data cleaning rule base is shown in FIG. 2. In combination with the business requirements of GIS partial discharge detection data, a corresponding cleaning rule is established in the rule base for each data type, and the data cleaning module matches each record to the rules for its data type and filters the data at the business level. Meanwhile, the data cleaning rule base is continuously optimized according to the filtering results.
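One way to picture such a rule base is a mapping from data type to a list of predicate functions, with the cleaning module looking up the rules for each record. The data type names ("uhf", "ultrasonic"), the field names and the individual rules below are all hypothetical; the patent does not list its actual rules.

```python
def has_amplitude(rec: dict) -> bool:
    return isinstance(rec.get("amplitude_mv"), (int, float))


def phase_in_range(rec: dict) -> bool:
    phase = rec.get("phase_deg")
    return isinstance(phase, (int, float)) and 0.0 <= phase < 360.0


def non_negative_amplitude(rec: dict) -> bool:
    return has_amplitude(rec) and rec["amplitude_mv"] >= 0.0


# Rule base: one list of business-level checks per (hypothetical) data type.
RULE_BASE = {
    "uhf": [has_amplitude, phase_in_range],
    "ultrasonic": [non_negative_amplitude],
}


def apply_rules(records, rule_base=RULE_BASE):
    """Match each record to the rules for its data type and filter it.
    Records of an unknown type are rejected so they can be reviewed."""
    passed, rejected = [], []
    for rec in records:
        rules = rule_base.get(rec.get("data_type"))
        if rules is not None and all(rule(rec) for rule in rules):
            passed.append(rec)
        else:
            rejected.append(rec)
    return passed, rejected


def add_rule(data_type: str, rule, rule_base=RULE_BASE) -> None:
    """Extend the rule base, e.g. after reviewing rejected records."""
    rule_base.setdefault(data_type, []).append(rule)
```

Reviewing the `rejected` list and calling `add_rule` is one simple reading of "continuously optimizing the rule base according to the filtering results".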
The third step: the mining analysis specifically comprises the following steps:
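1) SVM data classification and labeling. The data are classified and labeled by an SVM (support vector machine); the classification process is shown in FIG. 3. The data are mapped from the input space to a feature space through a kernel mapping $\varphi: x \to H$ and classified with the optimal hyperplane $w^{T}\varphi(x) + b = 0$, as follows:
$$f(x) = \operatorname{sgn}\left(\sum_{i} a_i y_i K(x_i, x) + b\right)$$
where w is the weight vector, a and b are the parameters to be solved, y is the true label, $x_i$ and x are input samples, and $K(x_i, x)$ is the kernel function that replaces the inner product of $x_i$ and x.
Eight basic types of partial discharge fault are defined to form a tag library, and the data are labeled one by one according to the classification results of the SVM (support vector machine), as shown in FIG. 4.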
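To illustrate how a kernel SVM assigns a label once its coefficients are known, the sketch below evaluates the standard dual-form decision function, the sign of the kernel-weighted sum over support vectors plus a bias. The RBF kernel, the `gamma` value, and the hand-picked support vectors and coefficients are assumptions for illustration; the patent does not name its kernel, and no training is performed here.

```python
import math


def rbf_kernel(xi, x, gamma=0.5):
    """Gaussian (RBF) kernel, standing in for the inner product in feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))


def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Return +1 or -1 from sign(sum_i a_i * y_i * K(x_i, x) + b)."""
    score = sum(
        a * y * kernel(xi, x)
        for a, y, xi in zip(alphas, labels, support_vectors)
    ) + b
    return 1 if score >= 0 else -1


# Hand-picked (not trained) model with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
bias = 0.0

near_positive = svm_decision((1.9, 2.1), support_vectors, labels, alphas, bias)
near_negative = svm_decision((0.1, 0.0), support_vectors, labels, alphas, bias)
```

This function separates only two classes. Labeling eight fault types would need a multi-class scheme on top of it, such as one binary classifier per type, which is a common approach but not one the text confirms.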
2) Deep learning algorithm. The data are learned with a deep learning artificial intelligence algorithm. The GIS partial discharge data are exported in the form of 'spectrum + label' into a training library (75%) and a verification library (25%); the GoogLeNet deep learning algorithm is used for training, and the 25% verification library is used to verify the accuracy of the diagnosis model. If the accuracy is lower than 95%, the parameters of each layer are adjusted and multiple rounds of optimization training are performed. The diagnosis model is integrated into the GIS partial discharge big data system and diagnoses partial discharge detection spectra; the diagnosis process is shown in FIG. 4.
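The 75%/25% split and the retrain-until-threshold loop can be sketched independently of the network itself. In the sketch below `train_round` and `evaluate` are hypothetical callbacks standing in for one round of GoogLeNet training and for accuracy measurement on the verification library; the round limit is also an assumption, added so the loop always terminates.

```python
import random


def split_samples(samples, train_fraction=0.75, seed=0):
    """Shuffle the labeled samples and split them into training and verification sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_until_threshold(train_round, evaluate, threshold=0.95, max_rounds=20):
    """Repeat training rounds until verification accuracy reaches the threshold.

    train_round(round_no) -> model     (hypothetical: adjusts layer parameters and trains)
    evaluate(model)       -> accuracy  (hypothetical: accuracy on the verification set)
    """
    model, accuracy, rounds_run = None, 0.0, 0
    for round_no in range(1, max_rounds + 1):
        model = train_round(round_no)
        accuracy = evaluate(model)
        rounds_run = round_no
        if accuracy >= threshold:
            break
    return model, accuracy, rounds_run


train_set, verify_set = split_samples(range(100))
```

Fixing the shuffle seed keeps the split reproducible between runs, which matters when accuracy figures from different training rounds are compared against the same 95% threshold.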
3) Big data mining. The value of the data is mined through big data mining technology. Using key techniques such as statistical methods and association rules, big data mining is carried out with Orange, a component-based data mining and machine learning software, to explore the association between GIS partial discharge multi-source data and GIS faults, and prediction and early warning of GIS equipment state are realized according to the characteristics of the partial discharge detection data.
By establishing a GIS partial discharge big data system, the method collects discrete data from GIS live-line detection and structured data from the power production management system to form a multi-source heterogeneous data source and generate the database DataSource. The data characteristics are analyzed to formulate a unified data format specification, and a data format conversion module is developed to convert the heterogeneous data into the unified format and store them in the database to be analyzed. The data are cleaned by exploratory analysis, missing value/abnormal value/deduplication processing, the data cleaning rule base and similar methods. The cleaned data are labeled and exported as samples; 75% of the sample data are used to train the deep learning algorithm and 25% to verify its accuracy. Format normalization and data cleaning of multi-source heterogeneous data are thereby achieved, the value of the data is mined with advanced algorithms, and diagnosis of the GIS partial discharge state is guided.
The invention also provides a data cleaning system for a GIS partial discharge big data system, comprising the GIS partial discharge big data system, a data cleaning module and a mining analysis module. The GIS partial discharge big data system acquires data based on a cosine-built system and performs normalization conversion on the acquired data. The data cleaning module processes abnormal data in the normalized and converted data through exploratory analysis, establishes corresponding cleaning rules for different data types to form a data cleaning rule base, organizes the data into a data set with a uniform format, and continuously optimizes the data cleaning rule base; after the data are cleaned, the data are classified and learned using an SVM (support vector machine) and the GoogLeNet deep learning algorithm. The mining analysis module classifies and labels the data with the SVM (support vector machine), exports the sample and label data, and forms a training set and a verification set; the GoogLeNet deep learning algorithm is trained and verified, the parameters of each layer are optimized and training continues, and the training finishes when a set threshold is reached.
In conclusion, a GIS partial discharge big data system is established, the various kinds of GIS partial discharge detection data are collected and organized, and the data are cleaned and labeled; the partial discharge big data are exported as samples and used to train a deep learning algorithm, and a big data mining algorithm is used to mine the association between partial discharge data and GIS equipment faults, so that GIS partial discharge spectra are diagnosed effectively and the GIS equipment state is predicted and given early warning according to the spectrum characteristics. The method thus effectively cleans GIS partial discharge big data and supports big data mining applications, which demonstrates the effectiveness of the data cleaning method for the GIS partial discharge big data system.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments of the invention without departing from its spirit and scope, and such modifications and equivalents are covered by the claims.

Claims (8)

1. A data cleaning method for a GIS partial discharge big data system, characterized by comprising the following steps:
(1) A system established based on cosine acquires data and performs normalization conversion on the acquired data;
(2) Processing abnormal data in the data normalized and converted in step (1) through exploratory analysis, establishing corresponding cleaning rules for different data types to form a data cleaning rule base, organizing the data into a data set with a uniform format, and continuously optimizing the data cleaning rule base; after the data are cleaned, classifying and learning the data using an SVM (support vector machine) and the GoogLeNet deep learning algorithm;
(3) Classifying and labeling the data with the SVM (support vector machine), and exporting the sample and label data to form a training set and a verification set; training and verifying the GoogLeNet deep learning algorithm, optimizing the parameters of each layer and continuing training, and finishing the training when a set threshold is reached.
2. The data cleaning method for the GIS partial discharge big data system according to claim 1, wherein the abnormal data in step (2) include missing values, abnormal values and repeated values.
3. The data cleaning method for the GIS partial discharge big data system according to claim 1, wherein the step (2) is implemented as follows:
The method comprises the following steps of carrying out preliminary exploration on data by utilizing a Python scientific calculation library, wherein the preliminary exploration on data comprises the analysis on data types, missing values, data set scales and data distribution conditions under various characteristics, carrying out visual observation by utilizing a drawing method to obtain basic attributes and distribution conditions of the data, preliminarily exploring the relationship among various characteristics in the data set by virtue of univariate analysis and multivariate analysis, determining the importance degree of various attributes of the data according to the characteristic relationship, calculating the missing rate according to the importance degree of the attributes, and formulating a cleaning rule;
the importance degree of the data attributes is divided into 'important' and 'general'; the missing values of the important attributes are obtained by using Pandas methods and the missing rate is calculated; when the missing rate is greater than or equal to 60%, the attribute is deleted; when the missing rate is less than 60% and greater than or equal to 30%, simple filling is performed according to the data distribution; when the missing rate is less than 30%, the missing values are supplemented by interpolation and modeling methods;
abnormal values are handled by using the 3σ principle: when the data obey a normal distribution, measured values whose deviation from the mean exceeds 3 times the standard deviation, that is, values x satisfying |x − μ| > 3σ with μ the mean and σ the standard deviation, are treated as abnormal and deleted;
for duplicate removal, the data set records are sorted according to a given rule, their similarity is calculated, duplicates are identified by using a duplicate method, and the repeated data are deleted;
after the data cleaning rules are formulated, they are digitized and stored in a cleaning library, and the data are cleaned by automated means; a data cleaning rule base is established in combination with the business requirements of GIS partial discharge detection data, the data cleaning module filters the data at the business level according to the rules, and the data cleaning rule base is continuously optimized according to the filtering results.
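The missing-rate thresholds and the three-standard-deviation rule of claim 3 can be sketched as follows. This is a minimal illustration using only the Python standard library rather than Pandas, which the claim names; the function names, the action labels and the sample readings are assumptions made for the example. The thresholds are the 60% and 30% limits stated in the claim, which apply to attributes rated 'important'.

```python
import math
import statistics


def missing_rate(values):
    """Fraction of entries that are missing (None or NaN)."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def cleaning_action(rate):
    """Map a missing rate to the action described in the claim."""
    if rate >= 0.60:
        return "delete_attribute"      # too sparse to keep
    if rate >= 0.30:
        return "simple_fill"           # fill according to the data distribution
    return "interpolate_or_model"      # interpolation or model-based filling


def drop_3sigma_outliers(values):
    """Keep only values within 3 standard deviations of the mean.

    Assumes the values are roughly normally distributed.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= 3 * std]


# Hypothetical readings: twenty values near 10 and one gross outlier.
readings = [10.0] * 10 + [10.2] * 5 + [9.8] * 5 + [100.0]
cleaned = drop_3sigma_outliers(readings)
```

Note that with very small samples no value can lie more than 3 standard deviations from the mean, so the rule is only meaningful on reasonably large data sets.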
4. The data cleaning method for the GIS partial discharge big data system according to claim 1, wherein the step (3) comprises the following steps:
(31) classifying and labeling the data through an SVM (support vector machine), converting the data from the input space to the feature space H through the mapping φ: x → H, and classifying the data with the optimal hyperplane w^T φ(x) + b = 0, wherein:
w is a weight vector, a and b are the quantities to be solved, y is the true label, xi and x are input samples, and K(xi, x) is a kernel function that replaces the inner-product operation on xi and x in the feature space; based on this classification method, the basic types of partial discharge faults are defined to form a tag library, and the data are labeled one by one according to the classification results of the SVM (support vector machine);
(32) Learning the labeled sample data of step (31) by using a deep learning artificial intelligence algorithm: the GIS partial discharge data are exported in the form of maps and labels to form a training library and a verification library; the GoogLeNet deep learning algorithm is trained on the training library, and the verification library is used to verify the accuracy of the diagnosis model; if the accuracy is lower than a set threshold, the parameters of each layer are adjusted and multiple rounds of optimization training are carried out; the diagnosis model is integrated into the GIS partial discharge big data system to diagnose partial discharge detection maps; the value of the data is mined by using big data mining technology, and the association between the detection data and GIS equipment defects is explored;
(33) Mining the value of the data through big data mining technology: big data mining is carried out by using Orange, a component-based data mining and machine learning software, and the association between GIS partial discharge multi-source data and GIS faults is explored.
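The classification formula referred to in step (31) is not reproduced in this text. A form consistent with the symbols listed there is the standard kernel SVM decision function f(x) = sign(Σ a_i y_i K(x_i, x) + b), and the sketch below evaluates it in plain Python. This is an assumption about the intended formula, not a statement of the patent's exact expression; the RBF kernel, the `gamma` value, the support vectors, the coefficients and the tag names are all hypothetical.

```python
import math


def rbf_kernel(xi, x, gamma=0.5):
    """Gaussian (RBF) kernel K(xi, x); one common kernel choice (assumed)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(xi, x))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Evaluate sign(sum_i a_i * y_i * K(x_i, x) + b) for one sample x."""
    score = sum(
        a * y * kernel(xi, x)
        for a, y, xi in zip(alphas, labels, support_vectors)
    ) + b
    return 1 if score >= 0 else -1


# Hypothetical, already-solved model with two support vectors.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]          # true labels y_i of the support vectors
alphas = [1.0, 1.0]       # coefficients a_i (assumed values)
bias = 0.0                # offset b (assumed value)

# Hypothetical tag library mapping class outputs to fault-type tags.
tag_library = {1: "fault_type_A", -1: "fault_type_B"}

sample = (1.9, 2.1)
tag = tag_library[svm_decision(sample, support_vectors, labels, alphas, bias)]
```

In practice the coefficients would come from solving the SVM training problem with an established library rather than being set by hand.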
5. A data cleaning system of a GIS partial discharge big data system, characterized by comprising the GIS partial discharge big data system, a data cleaning module and a mining analysis module; the GIS partial discharge big data system acquires data based on a system built on cosine and performs normalization conversion on the acquired data; the data cleaning module processes abnormal data in the normalized and converted data through exploratory analysis, establishes corresponding cleaning rules for different data types to form a data cleaning rule base, arranges the data into a data set with a uniform format, and continuously optimizes the data cleaning rule base; after the data are cleaned, the data are classified and learned by using an SVM (support vector machine) and the GoogLeNet deep learning algorithm; the mining analysis module classifies and labels the data through an SVM (support vector machine), and exports sample and label data to form a training set and a verification set; the GoogLeNet deep learning algorithm is trained and verified, the parameters of each layer are optimized and training continues, and the training is finished when a set threshold value is reached.
6. The data cleaning system of the GIS partial discharge big data system according to claim 5, wherein the abnormal data comprises missing values, abnormal values and repeated values.
7. The data cleaning system of the GIS partial discharge big data system according to claim 5, wherein the data cleaning module carries out preliminary data exploration by using a Python scientific computing library, comprising analysis of data types, missing values, data set scale and the data distribution under each feature; carries out visual observation by using plotting methods to obtain the basic attributes and distribution of the data; preliminarily explores the relationships among the features in the data set through univariate analysis and multivariate analysis; determines the importance degree of each data attribute according to the feature relationships; calculates the missing rate according to the importance degree of the attributes; and formulates cleaning rules; the importance degree of the data attributes is divided into 'important' and 'general'; the missing values of the important attributes are obtained by using Pandas methods and the missing rate is calculated; when the missing rate is greater than or equal to 60%, the attribute is deleted; when the missing rate is less than 60% and greater than or equal to 30%, simple filling is performed according to the data distribution; when the missing rate is less than 30%, the missing values are supplemented by interpolation and modeling methods; abnormal values are handled by using the 3σ principle: when the data obey a normal distribution, measured values whose deviation from the mean exceeds 3 times the standard deviation, that is, values x satisfying |x − μ| > 3σ with μ the mean and σ the standard deviation, are treated as abnormal and deleted;
For duplicate removal, the data set records are sorted according to a given rule, their similarity is calculated, duplicates are identified by using a duplicate method, and the repeated data are deleted; after the data cleaning rules are formulated, they are digitized and stored in a cleaning library, and the data are cleaned by automated means; a data cleaning rule base is established in combination with the business requirements of GIS partial discharge detection data, the data cleaning module filters the data at the business level according to the rules, and the data cleaning rule base is continuously optimized according to the filtering results.
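The sort, compare and delete sequence described for duplicate removal can be sketched as below. The claim does not specify the sorting rule or the similarity measure, so this example assumes lexicographic sorting on a joined string key and the similarity ratio from Python's standard `difflib.SequenceMatcher`; the 0.9 threshold, the function name and the sample records are likewise hypothetical.

```python
from difflib import SequenceMatcher


def record_key(record):
    """Join the fields of a record into one comparable string."""
    return "|".join(str(field) for field in record)


def deduplicate(records, threshold=0.9):
    """Sort records, then drop any record too similar to the last one kept."""
    kept = []
    for record in sorted(records, key=record_key):
        if kept:
            similarity = SequenceMatcher(
                None, record_key(kept[-1]), record_key(record)
            ).ratio()
            if similarity >= threshold:
                continue  # treated as a repeat of the previous record
        kept.append(record)
    return kept


# Hypothetical detection records: (device, timestamp, sensor type, amplitude).
records = [
    ("GIS-01", "2019-08-23 10:00", "UHF", -42.5),
    ("GIS-02", "2019-08-23 10:05", "AE", -30.1),
    ("GIS-01", "2019-08-23 10:00", "UHF", -42.5),
]
unique_records = deduplicate(records)
```

Sorting first means that only neighbouring records need to be compared, which keeps the pass linear after the sort; the trade-off is that near-duplicates that do not sort next to each other are missed.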
8. The data cleaning system of the GIS partial discharge big data system according to claim 5, characterized in that the mining analysis module classifies and labels the data through an SVM (support vector machine), converts the data from the input space to the feature space H through the mapping φ: x → H, and classifies the data with the optimal hyperplane w^T φ(x) + b = 0; the labeled sample data are learned by using a deep learning artificial intelligence algorithm; big data mining is carried out by using Orange, a component-based data mining and machine learning software, and the association between GIS partial discharge multi-source data and GIS faults is explored.
CN201910783712.8A 2019-08-23 2019-08-23 Data cleaning method and system for GIS partial discharge big data system Active CN110543903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910783712.8A CN110543903B (en) 2019-08-23 2019-08-23 Data cleaning method and system for GIS partial discharge big data system

Publications (2)

Publication Number Publication Date
CN110543903A true CN110543903A (en) 2019-12-06
CN110543903B CN110543903B (en) 2022-02-15

Family

ID=68711938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910783712.8A Active CN110543903B (en) 2019-08-23 2019-08-23 Data cleaning method and system for GIS partial discharge big data system

Country Status (1)

Country Link
CN (1) CN110543903B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100004898A1 (en) * 2008-07-03 2010-01-07 Caterpillar Inc. Method and system for pre-processing data using the Mahalanobis Distance (MD)
CN105044567A (en) * 2015-06-29 2015-11-11 许继集团有限公司 GIS partial discharge on-line monitoring mode identification method and GIS partial discharge on-line monitoring mode identification system
CN108564254A (en) * 2018-03-15 2018-09-21 国网四川省电力公司绵阳供电公司 Controller switching equipment status visualization platform based on big data
CN109359697A (en) * 2018-10-30 2019-02-19 国网四川省电力公司广元供电公司 Graph image recognition methods and inspection system used in a kind of power equipment inspection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAO LI等: "The Characteristics and Detection of the Partial Discharge Induced by the Micro Cracks in GIS Insulators", 《2018 CONDITION MONITORING AND DIAGNOSIS (CMD)》 *
魏丽峰等: "基于大数据分析挖掘技术的电力设备局部放电诊断方法", 《科学技术与工程》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444400A (en) * 2020-04-07 2020-07-24 中国汽车工程研究院股份有限公司 Force and flow field data management method
CN111581194A (en) * 2020-04-29 2020-08-25 上海市特种设备监督检验技术研究院 Pretreatment and cleaning method based on elevator big data
CN112926627A (en) * 2021-01-28 2021-06-08 电子科技大学 Equipment defect time prediction method based on capacitive equipment defect data
CN113033694B (en) * 2021-04-09 2023-04-07 深圳亿嘉和科技研发有限公司 Data cleaning method based on deep learning
CN113033694A (en) * 2021-04-09 2021-06-25 深圳亿嘉和科技研发有限公司 Data cleaning method based on deep learning
CN113177040A (en) * 2021-04-29 2021-07-27 东北大学 Full-process big data cleaning and analyzing method for aluminum/copper plate strip production
CN113377753A (en) * 2021-06-09 2021-09-10 国网吉林省电力有限公司 Heat accumulating type electric boiler load data cleaning system
CN114896228A (en) * 2022-04-27 2022-08-12 西北工业大学 Industrial data stream cleaning model and method based on multi-stage combination optimization of filtering rules
CN114896228B (en) * 2022-04-27 2024-04-05 西北工业大学 Industrial data stream cleaning model and method based on filtering rule multistage combination optimization
CN115794795A (en) * 2022-12-08 2023-03-14 湖北华中电力科技开发有限责任公司 Power distribution station power consumption data standardized cleaning method, device and system and storage medium
CN115794795B (en) * 2022-12-08 2023-09-22 湖北华中电力科技开发有限责任公司 Power distribution station electricity consumption data standardization cleaning method, device, system and storage medium
CN115809406A (en) * 2023-02-03 2023-03-17 佰聆数据股份有限公司 Power consumer fine-grained classification method, device, equipment and storage medium
CN115809406B (en) * 2023-02-03 2023-05-12 佰聆数据股份有限公司 Fine granularity classification method, device, equipment and storage medium for electric power users
CN116166655A (en) * 2023-04-25 2023-05-26 尚特杰电力科技有限公司 Big data cleaning system
CN117041168A (en) * 2023-10-09 2023-11-10 常州楠菲微电子有限公司 QoS queue scheduling realization method and device, storage medium and processor

Also Published As

Publication number Publication date
CN110543903B (en) 2022-02-15

Similar Documents

Publication Publication Date Title
CN110543903B (en) Data cleaning method and system for GIS partial discharge big data system
US9753066B2 (en) Methods and apparatus of analyzing electrical power grid data
CN110825644B (en) Cross-project software defect prediction method and system
Zhang et al. Time series anomaly detection for smart grids: A survey
CN111259947A (en) Power system fault early warning method and system based on multi-mode learning
CN105044499A (en) Method for detecting transformer state of electric power system equipment
CN112132210A (en) Electricity stealing probability early warning analysis method based on customer electricity consumption behavior
Ma et al. Adequate and precise evaluation of quality models in software engineering studies
CN103957116A (en) Decision-making method and system of cloud failure data
CN112101471A (en) Electricity stealing probability early warning analysis method
CN117273489A (en) Photovoltaic state evaluation method and device
CN116257663A (en) Abnormality detection and association analysis method and related equipment for unmanned ground vehicle
Bond et al. A hybrid learning approach to prognostics and health management applied to military ground vehicles using time-series and maintenance event data
Jia et al. Robust and Transferable Log-based Anomaly Detection
Liu et al. Towards accurate subgraph similarity computation via neural graph pruning
CN110956281A (en) Power equipment abnormity detection alarm system based on Log analysis
CN114312930B (en) Train operation abnormality diagnosis method and device based on log data
Ye et al. An open data cleaning framework based on semantic rules for Continuous Auditing
CN113887932A (en) Operation and maintenance management and control method and device based on artificial intelligence and computer equipment
CN112308338A (en) Power data processing method and device
Dagnely et al. Annotating the performance of industrial assets via relevancy estimation of event logs
CN117539920B (en) Data query method and system based on real estate transaction multidimensional data
Zuo et al. An Improved AdaBoost Tree-Based Method for Defective Products Identification in Wafer Test
Li et al. TADL: Fault Localization with Transformer-based Anomaly Detection for Dynamic Microservice Systems
Schwenke et al. Identifying Informative Nodes in Attributed Spatial Sensor Networks using Attention for Symbolic Abstraction in a GNN-based Modeling Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant