CN114648060A - Fault signal standardization processing and classification method based on machine learning

Publication number
CN114648060A
Authority
CN
China
Prior art keywords
data
machine learning
node
fault
power grid
Prior art date
Legal status
Pending
Application number
CN202210206997.0A
Other languages
Chinese (zh)
Inventor
肖辅盛
吴俊杰
罗宇
刘亮
戴雯菊
李一荻
张恂
黄宇
金宇
黄晓旭
夏盛海
沈云春
穆萍
杨攀
卢昊
蒋猛
Current Assignee
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date
2022-03-04
Filing date
2022-03-04
Publication date
2022-06-21
Application filed by Guizhou Power Grid Co Ltd
Priority to CN202210206997.0A
Publication of CN114648060A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention discloses a machine-learning-based method for the standardized processing and classification of fault signals. The method comprises: carrying out standardized processing on historical fault information to obtain power grid equipment parameters; identifying fault categories according to the power grid equipment parameters and constructing node branches; traversing the division modes of each node branch, dividing the historical fault information into different child nodes, calculating the purity of the child nodes, and selecting the division that yields the highest purity as the optimal division mode; continuing to divide new child nodes until the purity of every node is maximal, thereby constructing a decision tree; and pruning the decision tree with an evaluation function to obtain an optimal decision tree. Being based on an optimized decision tree, the method classifies fault categories accurately and takes little time.

Description

Fault signal standardization processing and classification method based on machine learning
Technical Field
The invention relates to the technical field of signal classification, in particular to a fault signal standardization processing and classifying method based on machine learning.
Background
The expert system was the earliest intelligent technology applied to power grid fault diagnosis. It is relatively mature overall and has relatively strong reasoning and fault-interpretation capability for deterministic information. However, as power grids grow in scale and topological complexity, the rule base becomes difficult to construct, and it must be updated whenever the structure of the power grid changes, so maintenance is very difficult. In addition, because element faults and the alarm information that characterizes them do not have a one-to-one mapping, fault-symptom aliasing is severe when the information is uncertain. Against this background, faults are difficult to identify accurately, and further improvement is needed.
A conventional technical scheme is the Bayesian network, which has outstanding advantages in power grid alarm diagnosis. It combines graph theory and probability theory, expresses knowledge as a network graph, describes the influence among different knowledge components using probability theory, and performs power grid fault diagnosis through conditional probability reasoning. In practical applications, however, the faults of a large-scale power system are complex, the data are not uniform, and prior probabilities are relatively difficult to obtain, all of which reduce diagnosis efficiency.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and title of the application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the invention provides a fault signal standardization processing and classification method based on machine learning, which avoids the inaccurate classification caused by an excessive volume and variety of data.
In order to solve the above technical problems, the invention provides the following technical scheme, comprising the steps of: carrying out standardized processing on historical fault information to obtain power grid equipment parameters; identifying fault categories according to the power grid equipment parameters, and constructing node branches; traversing the division modes of each node branch, dividing the historical fault information into different child nodes, calculating the purity of the child nodes, and selecting the division that yields the highest purity as the optimal division mode; continuing to divide new child nodes until the purity of every node is maximal, and constructing a decision tree; and pruning the decision tree with an evaluation function to obtain an optimal decision tree.
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention: the historical fault information is acquired by establishing a data interface with a supervisory control and data acquisition (SCADA) system.
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, the standardized processing comprises: filling missing values in the data and removing noise data through data cleaning; and merging data from different data sources into a unified data store through data integration, removing redundant attributes, and finding and deleting duplicate data to obtain the power grid equipment parameters.
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, constructing the node branches comprises: if a power grid equipment parameter is a discrete value and a binary decision tree is not required, constructing each fault category as a branch; if the power grid equipment parameter is a discrete value and a binary decision tree is required, testing against a subset divided by fault category, and constructing two branches according to whether the power grid equipment parameter belongs to the subset or not; if the power grid equipment parameter is a continuous value, determining a value as the split point, and constructing two branches according to whether the power grid equipment parameter is greater than or less than the split point.
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, the purity H(X) is given by:
$$H(X) = -\sum_{i=1}^{N} P_i \log P_i$$
where N is the number of different discrete values of the variable X, and P_i is the probability of occurrence of event i.
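As an illustration of the purity measure H(X), the following minimal Python sketch computes it from the fault labels that fall into one node. The function name `entropy`, the example labels, and the base-2 logarithm are assumptions made for this sketch; the source does not specify them. Under the usual convention for entropy, a lower value corresponds to a purer node.

```python
import math
from collections import Counter


def entropy(labels):
    """Return H(X) = -sum(P_i * log2(P_i)) for a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # P_i is estimated as the relative frequency of label i in the node.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical fault labels of the samples in one node.
node_labels = ["line_fault", "line_fault", "transformer_fault", "breaker_fault"]
node_entropy = entropy(node_labels)
print(node_entropy)
```

A node whose samples all carry the same label gives an entropy of 0, which is the stopping condition a tree builder would typically look for.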
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, continuing to divide the new child nodes comprises: selecting the new child nodes of the data set by information gain, the division with the largest information gain being selected, where:
Gain(D,A)=H(D)-H(D|A)
where Gain(D,A) is the information gain, H(D) is the information entropy of the data set D, and H(D|A) is the conditional entropy of D given the feature attribute A.
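The information gain formula can be sketched in Python as follows. This is a minimal sketch for discrete attributes only; the record layout (a list of dictionaries), the attribute names `breaker_state` and `region`, the label key `fault`, and the base-2 logarithm are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, attribute, label_key="fault"):
    """Gain(D, A) = H(D) - H(D|A) for a discrete attribute A."""
    labels = [s[label_key] for s in samples]
    groups = defaultdict(list)
    for s in samples:
        groups[s[attribute]].append(s[label_key])
    # H(D|A): entropy of each subset, weighted by the subset's share of D.
    conditional = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical data set D with two candidate attributes.
dataset = [
    {"breaker_state": "open", "region": "north", "fault": "line"},
    {"breaker_state": "open", "region": "south", "fault": "line"},
    {"breaker_state": "closed", "region": "north", "fault": "transformer"},
    {"breaker_state": "closed", "region": "south", "fault": "transformer"},
]
gains = {a: information_gain(dataset, a) for a in ("breaker_state", "region")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy data set `breaker_state` separates the two fault categories completely, so it has the larger gain and would be chosen for the division.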
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, the evaluation function C(X) is given by:
Figure BDA0003531583950000022
wherein n is the node sample number, and t is a constant.
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, the method further comprises: evaluating the tree according to the weighted sum of the entropy values of the leaf nodes, and pruning to eliminate overfitting when the evaluation value is greater than m.
As a preferred scheme of the machine-learning-based fault signal standardization processing and classification method of the present invention, the method further comprises: defining a pruning coefficient R(X) that expresses the degree to which the overall loss function is reduced after pruning:
Figure BDA0003531583950000031
wherein c (a) is a set evaluation value.
The invention has the beneficial effects that: the invention unifies data by utilizing standardized processing, reduces the processing time of the system, and improves the classification accuracy by utilizing the optimized decision tree model.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort. Wherein:
Fig. 1 is a flowchart of a fault signal standardization processing and classification method based on machine learning according to a first embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and it will be appreciated by those skilled in the art that the present invention may be practiced without departing from the spirit and scope of the present invention and that the present invention is not limited by the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to Fig. 1, a first embodiment of the present invention provides a fault signal standardization processing and classification method based on machine learning, which includes:
S1: Carry out standardized processing on the historical fault information to obtain the power grid equipment parameters.
(1) Historical fault information is acquired by establishing a data interface with a supervisory control and data acquisition (SCADA) system.
(2) Missing values in the data are filled, noise data are removed, and inconsistent data are corrected through data cleaning; data from different data sources are merged into a unified data store through data integration; redundant attributes are removed and duplicate data are found and deleted; and the power grid equipment parameters used for fault diagnosis are obtained through screening.
Preferably, the invention uses standardized processing to unify the data and reduce noise, so that the data are more accurate and more convenient to use.
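Step S1 can be sketched in plain Python as below. The source does not say how missing values are filled or how noise is recognized, so the median fill, the range check used for noise removal, and every name in the sketch (`standardize`, `voltage_kv`, `current_a`, `operator`, the two example sources) are assumptions for illustration only.

```python
from statistics import median


def standardize(sources, redundant=(), valid_range=None):
    """Merge record lists, drop redundant attributes, remove out-of-range
    records, fill missing numeric values, and delete duplicate records."""
    valid_range = valid_range or {}
    # Data integration: merge all sources, dropping redundant attributes.
    merged = [{k: v for k, v in rec.items() if k not in redundant}
              for src in sources for rec in src]

    # Noise removal (assumed rule): discard records with out-of-range values.
    def in_range(rec):
        for key, (low, high) in valid_range.items():
            value = rec.get(key)
            if value is not None and not low <= value <= high:
                return False
        return True

    merged = [rec for rec in merged if in_range(rec)]
    # Data cleaning (assumed rule): fill missing numeric values with the median.
    for key in sorted({k for rec in merged for k in rec}):
        present = [rec[key] for rec in merged
                   if isinstance(rec.get(key), (int, float))]
        if present:
            fill = median(present)
            for rec in merged:
                if rec.get(key) is None:
                    rec[key] = fill
    # Duplicate removal: keep the first occurrence of each identical record.
    seen, unique = set(), []
    for rec in merged:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


# Hypothetical records from two data sources.
scada = [
    {"device": "T1", "voltage_kv": 110.0, "current_a": 300.0, "operator": "a"},
    {"device": "T2", "voltage_kv": None, "current_a": 310.0, "operator": "b"},
    {"device": "T3", "voltage_kv": 9999.0, "current_a": 305.0, "operator": "c"},
]
fault_log = [
    {"device": "T1", "voltage_kv": 110.0, "current_a": 300.0, "operator": "z"},
    {"device": "T4", "voltage_kv": 112.0, "current_a": None, "operator": "d"},
]
cleaned = standardize([scada, fault_log], redundant=("operator",),
                      valid_range={"voltage_kv": (0.0, 1000.0)})
print(cleaned)
```

The order of the steps matters in this sketch: noisy records are removed before the median is computed so that an outlier does not distort the fill value.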
S2: Identify the fault category according to the power grid equipment parameters, and construct the node branches.
(1) If a power grid equipment parameter is a discrete value and a binary decision tree is not required, each fault category is constructed as a branch;
(2) if the power grid equipment parameter is a discrete value and a binary decision tree is required, a test is made against a subset divided by fault category, and two branches are constructed according to whether the parameter belongs to the subset or not;
(3) if the power grid equipment parameter is a continuous value, a value is determined as the split point, and two branches are constructed according to whether the parameter is greater than or less than the split point.
Preferably, the invention handles each type of data separately, so that the model is generally applicable.
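The three branching cases of step S2 can be sketched as a single Python helper. The sketch groups sample indices by branch; it builds one branch per distinct discrete value in the first case, and it sends values equal to the split point to the lower branch, which the source leaves unspecified. The function name, its parameters, and the example values are hypothetical.

```python
def build_branches(values, *, continuous=False, binary=False,
                   subset=None, split_point=None):
    """Group sample indices into branches for one parameter."""
    branches = {}
    for index, value in enumerate(values):
        if continuous:
            # Case (3): two branches around the split point (ties go low, assumed).
            key = f"> {split_point}" if value > split_point else f"<= {split_point}"
        elif binary:
            # Case (2): two branches, inside or outside the tested subset.
            key = "in subset" if value in subset else "not in subset"
        else:
            # Case (1): one branch per distinct discrete value.
            key = value
        branches.setdefault(key, []).append(index)
    return branches


states = ["open", "closed", "open", "tripped"]
multiway = build_branches(states)
two_way = build_branches(states, binary=True, subset={"open", "tripped"})
by_voltage = build_branches([108.5, 110.2, 95.0], continuous=True, split_point=100.0)
print(multiway, two_way, by_voltage)
```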
S3: Traverse the division modes of each node branch, divide the data into different child nodes, calculate the purity of the child nodes, select the division that yields the highest purity as the optimal division mode, and continue to divide new child nodes until the purity of every node is maximal, thereby constructing a decision tree.
(1) The purity H(X) is given by:
$$H(X) = -\sum_{i=1}^{N} P_i \log P_i$$
where N is the number of different discrete values of X, and P_i is the probability of occurrence of event i.
(2) max(H(X)): the division with the highest purity value is taken as the optimal division mode.
(3) The new child nodes continue to be divided using the information gain:
Gain(D,A)=H(D)-H(D|A)
where Gain(D,A) is the information gain, H(D) is the information entropy of the data set D, and H(D|A) is the conditional entropy of D given the feature attribute A.
Preferably, the present invention uses the purity and the information gain to keep dividing the child nodes of the decision tree, which makes the classification more accurate.
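Step S3 can be sketched as a small recursive tree builder in Python. This is an ID3-style sketch under stated assumptions, not the patented implementation: it handles discrete attributes only, picks the attribute with the largest information gain at each node, and stops when a node is pure or no attribute improves the division. All names and the example records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def split(rows, attribute):
    """Divide rows into child nodes, one per value of the attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def gain(rows, attribute, label):
    groups = split(rows, attribute)
    conditional = sum(len(g) / len(rows) * entropy([r[label] for r in g])
                      for g in groups.values())
    return entropy([r[label] for r in rows]) - conditional


def build_tree(rows, attributes, label="fault"):
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return majority
    # Traverse the candidate divisions and keep the one with the largest gain.
    best = max(attributes, key=lambda a: gain(rows, a, label))
    if gain(rows, best, label) <= 0:
        return majority
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(subset, remaining, label)
                   for value, subset in split(rows, best).items()}}


def classify(tree, row, default=None):
    """Walk the tree until a leaf (a fault label) is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        if row.get(attribute) not in branches:
            return default
        tree = branches[row[attribute]]
    return tree


rows = [
    {"breaker": "tripped", "protection": "distance", "fault": "line"},
    {"breaker": "tripped", "protection": "differential", "fault": "transformer"},
    {"breaker": "closed", "protection": "none", "fault": "normal"},
    {"breaker": "tripped", "protection": "distance", "fault": "line"},
]
tree = build_tree(rows, ["breaker", "protection"])
predicted = classify(tree, {"breaker": "tripped", "protection": "differential"})
print(tree, predicted)
```

In this toy data set the `protection` attribute alone separates the three labels, so the tree has a single level.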
S4: Prune the decision tree with the evaluation function to obtain the optimal decision tree.
(1) The evaluation function C(X) is given by:
Figure BDA0003531583950000052
wherein n is the node sample number, and t is a constant.
(2) The tree is evaluated according to the weighted sum of the entropy values of the leaf nodes, and is pruned to eliminate overfitting when the evaluation value is greater than 5.
(3) A pruning coefficient R(X) is defined, which expresses the degree to which the overall loss function is reduced after pruning:
Figure BDA0003531583950000061
wherein c (a) is a set evaluation value.
Preferably, the method prunes the classification decision tree with the evaluation function to prevent overfitting, so that the decision tree is optimal.
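The threshold test in item (2) can be sketched in Python as follows. The sketch only covers the weighted sum of leaf entropies and the comparison with the threshold of 5 stated in the source; it does not reproduce the evaluation function C(X) or the pruning coefficient R(X), whose formulas are not available in the text. Weighting each leaf by its sample count, the base-2 logarithm, and all names are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def leaf_evaluation(leaves):
    """Weighted sum of leaf entropies; each leaf is weighted by its
    sample count (an assumed weighting)."""
    return sum(len(labels) * entropy(labels) for labels in leaves)


def should_prune(leaves, threshold=5.0):
    """Return True when the evaluation value exceeds the threshold."""
    return leaf_evaluation(leaves) > threshold


# Hypothetical leaves, each given as the fault labels of its samples.
mixed_leaves = [
    ["line", "line", "line", "line"],
    ["line", "transformer"],
    ["transformer", "breaker", "line", "line"],
]
pure_leaves = [["line"] * 4, ["transformer"] * 3]
print(leaf_evaluation(mixed_leaves), should_prune(mixed_leaves))
print(leaf_evaluation(pure_leaves), should_prune(pure_leaves))
```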
Example 2
In order to verify the technical effect of the method, this embodiment carries out a comparative test between a conventional BP neural network and the present method, and compares the test results to verify the actual effect of the method.
A data interface is established with a SCADA (Supervisory Control and Data Acquisition) system, and 50 groups of historical fault information are obtained as sample data. The sample data are randomly divided into 30 groups of training samples and 20 groups of test samples. Simulation tests of the two methods are implemented in Python: each method is trained on the training samples, the trained models then classify the test samples, and the experimental results are shown in the following table.
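The random division of the 50 sample groups into 30 training and 20 test groups could be done as in the following sketch. The fixed seed, the function name, and the placeholder sample identifiers are assumptions; the source does not describe how the random division was performed.

```python
import random


def split_samples(samples, n_train=30, seed=0):
    """Shuffle a copy of the samples and cut it into training and test sets."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 50 groups of fault information.
samples = [f"fault_record_{i}" for i in range(50)]
train_set, test_set = split_samples(samples)
print(len(train_set), len(test_set))
```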
Table 1: Accuracy comparison.
Classification method          Proposed method    BP neural network
Classification accuracy (%)    93.14              91.68
Time (s)                       1.497              3.726
As can be seen from the above table, the classification accuracy of the present method (93.14%) is higher than that of the conventional BP neural network (91.68%), and the time used is shorter (1.497 s versus 3.726 s).
It should be recognized that embodiments of the present invention can be realized and implemented in computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (9)

1. A fault signal standardization processing and classification method based on machine learning, characterized by comprising the following steps:
carrying out standardized processing on historical fault information to obtain power grid equipment parameters;
identifying fault categories according to the power grid equipment parameters, and constructing node branches;
traversing the division modes of each node branch, dividing the historical fault information into different child nodes, calculating the purity of the child nodes, and selecting the division that yields the highest purity as the optimal division mode;
continuing to divide new child nodes until the purity of every node is maximal, and constructing a decision tree;
and pruning the decision tree with an evaluation function to obtain an optimal decision tree.
2. The machine-learning-based fault signal standardization processing and classification method of claim 1, wherein the historical fault information is acquired by establishing a data interface with a supervisory control and data acquisition (SCADA) system.
3. The machine-learning-based fault signal standardization processing and classification method of claim 1 or 2, wherein the standardized processing comprises:
filling missing values in the data and removing noise data through data cleaning;
and merging data from different data sources into a unified data store through data integration, removing redundant attributes, and finding and deleting duplicate data to obtain the power grid equipment parameters.
4. The machine-learning-based fault signal standardization processing and classification method of claim 3, wherein constructing the node branches comprises:
if a power grid equipment parameter is a discrete value and a binary decision tree is not required, constructing each fault category as a branch;
if the power grid equipment parameter is a discrete value and a binary decision tree is required, testing against a subset divided by fault category, and constructing two branches according to whether the power grid equipment parameter belongs to the subset or not;
if the power grid equipment parameter is a continuous value, determining a value as the split point, and constructing two branches according to whether the power grid equipment parameter is greater than or less than the split point.
5. The machine-learning-based fault signal standardization processing and classification method of claim 4, wherein the purity H(X) is given by:
$$H(X) = -\sum_{i=1}^{N} P_i \log P_i$$
where N is the number of different discrete values of the variable X, and P_i is the probability of occurrence of event i.
6. The machine-learning-based fault signal standardization processing and classification method of claim 5, wherein continuing to divide the new child nodes comprises:
selecting the new child nodes of the data set by information gain, the division with the largest information gain being selected, where:
Gain(D,A)=H(D)-H(D|A)
where Gain(D,A) is the information gain, H(D) is the information entropy of the data set D, and H(D|A) is the conditional entropy of D given the feature attribute A.
7. The machine-learning-based fault signal standardization processing and classification method of claim 6, wherein the evaluation function C(X) is given by:
Figure FDA0003531583940000021
wherein n is the node sample number, and t is a constant.
8. The machine-learning-based fault signal standardization processing and classification method of claim 7, further comprising: evaluating the tree according to the weighted sum of the entropy values of the leaf nodes, and pruning to eliminate overfitting when the evaluation value is greater than m.
9. The machine-learning-based fault signal standardization processing and classification method of claim 8, further comprising:
defining a pruning coefficient R(X) that expresses the degree to which the overall loss function is reduced after pruning:
Figure FDA0003531583940000022
where c (a) is a set evaluation value.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210206997.0A CN114648060A (en) 2022-03-04 2022-03-04 Fault signal standardization processing and classification method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210206997.0A CN114648060A (en) 2022-03-04 2022-03-04 Fault signal standardization processing and classification method based on machine learning

Publications (1)

Publication Number Publication Date
CN114648060A true CN114648060A (en) 2022-06-21

Family

ID=81993612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210206997.0A Pending CN114648060A (en) 2022-03-04 2022-03-04 Fault signal standardization processing and classification method based on machine learning

Country Status (1)

Country Link
CN (1) CN114648060A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113234A (en) * 2023-10-12 2023-11-24 济南泉晓电气设备有限公司 Power transmission line fault detection method and system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination