CN114648060A

CN114648060A - Fault signal standardization processing and classification method based on machine learning

Info

Publication number: CN114648060A
Application number: CN202210206997.0A
Authority: CN
Inventors: 肖辅盛; 吴俊杰; 罗宇; 刘亮; 戴雯菊; 李一荻; 张恂; 黄宇; 金宇; 黄晓旭; 夏盛海; 沈云春; 穆萍; 杨攀; 卢昊; 蒋猛
Original assignee: Guizhou Power Grid Co Ltd
Current assignee: Guizhou Power Grid Co Ltd
Priority date: 2022-03-04
Filing date: 2022-03-04
Publication date: 2022-06-21

Abstract

The invention discloses a fault signal standardization processing and classification method based on machine learning. The method comprises: carrying out standardized processing on historical fault information to obtain power grid equipment parameters; identifying fault categories according to the power grid equipment parameters and constructing node branches; traversing the division modes of each node branch, dividing the historical fault information into different child nodes, calculating the purity of the child nodes, and selecting the division that yields the child nodes with the highest purity as the optimal division mode; continuing to divide new child nodes until the purity of each node is highest, thereby constructing a decision tree; and pruning the decision tree by using an evaluation function to obtain an optimal decision tree. Based on the optimized decision tree, the method classifies fault categories accurately and with little time consumption.

Description

Fault signal standardization processing and classification method based on machine learning

Technical Field

The invention relates to the technical field of signal classification, in particular to a fault signal standardization processing and classifying method based on machine learning.

Background

The expert system was the earliest intelligent technology applied to the field of power grid fault diagnosis. It is relatively mature on the whole and has relatively strong reasoning capability and fault interpretation capability for deterministic information. However, as power grids grow in scale and topological complexity, the construction of a rule base is difficult to complete, and whenever the structure of the power grid changes the rule base must be updated correspondingly, so maintenance is very difficult. Moreover, because element faults and the alarm information representing them do not have a one-to-one mapping relation, fault symptom aliasing under uncertain information is very serious. Against this background, faults are difficult to identify accurately, and further improvement is needed.

The traditional technical scheme is as follows: the Bayesian network has outstanding advantages in power grid alarm diagnosis. It combines graph theory and probability theory, expresses knowledge by using a network flow graph, describes the influence among different knowledge components by using probability theory, and realizes power grid fault diagnosis through conditional probability reasoning. However, in practical application, a large-scale power system has complex faults and non-uniform data, and prior probabilities are relatively difficult to acquire, which affects diagnosis efficiency.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and title of the application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The present invention has been made in view of the above-mentioned conventional problems.

Therefore, the invention provides a fault signal standardization processing and classification method based on machine learning, which can avoid the problem of inaccurate classification caused by excessive data volume and data types.

In order to solve the technical problems, the invention provides the following technical scheme, comprising the following steps: carrying out standardized processing on historical fault information to obtain power grid equipment parameters; identifying fault categories according to the power grid equipment parameters, and constructing node branches; traversing the division modes of each node branch, dividing the historical fault information into different child nodes, calculating the purity of the child nodes, and selecting the division that yields the child nodes with the highest purity as the optimal division mode; continuing to divide new child nodes until the purity of each node is highest, and constructing a decision tree; and pruning the decision tree by using an evaluation function to obtain an optimal decision tree.

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: the historical fault information is acquired by establishing a data interface with a data acquisition and monitoring control system.

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: the standardized processing comprises: filling vacancy values in the data and removing noise data through data cleaning; and merging data from different data sources through data integration to form a unified data store, removing redundant attributes, and searching for and deleting repeated data to obtain the power grid equipment parameters.

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: constructing the node branches includes: if the power grid equipment parameters are discrete values and a decision tree is not generated, constructing each fault category as a branch; if the power grid equipment parameters are discrete values and a decision tree is required to be generated, dividing a subset by fault categories, testing it, and constructing two branches according to whether the power grid equipment parameters belong to the subset or not; if the power grid equipment parameters are continuous values, determining a value as a split point, and constructing two branches according to whether the power grid equipment parameters are greater than or less than the split point.

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: the purity H(X) includes:

wherein N represents the N different discrete values of the variable X, and P_i represents the probability of occurrence of event i.
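The purity measure can be illustrated with a minimal Python sketch. It assumes that H(X) is the Shannon entropy, H(X) = sum over i of P_i * log2(1 / P_i); the formula itself is not reproduced in this text, so the entropy reading, the base-2 logarithm, the function name `entropy` and the sample fault categories are all assumptions. Under that reading, a value of 0 corresponds to a node that contains a single fault category.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels.

    Assumption: this stands in for the purity measure H(X). P_i is
    estimated as the relative frequency of label i within the node.
    """
    total = len(labels)
    counts = Counter(labels)
    # p * log2(1 / p) is written as (c / total) * log2(total / c)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical fault categories observed at two nodes
mixed_node = ["line", "line", "transformer", "transformer"]
single_category_node = ["line", "line", "line"]

print(entropy(mixed_node))            # 1.0
print(entropy(single_category_node))  # 0.0
```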

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: continuing to divide the new child nodes includes: selecting new child nodes of the data set by the information gain, the division with the largest information gain being selected, where the information gain is:

Gain(D, A) = H(D) - H(D|A)

wherein Gain(D, A) is the information gain, H(D) is the information entropy of the data set D, and H(D|A) is the conditional entropy of D given the characteristic attribute A.
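As a worked illustration of Gain(D, A) = H(D) - H(D|A), the following Python sketch computes the gain of a discrete attribute over a small data set. The attribute names `breaker` and `alarm`, the label key `fault` and the four records are hypothetical, and the entropy is assumed to be the base-2 Shannon entropy.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Base-2 Shannon entropy of a list of labels (assumed form of H)."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def information_gain(rows, attribute, label_key="fault"):
    """Gain(D, A) = H(D) - H(D|A) for a discrete attribute A."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[label_key])
    # H(D|A): entropy of each group weighted by its share of the data set
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label_key] for row in rows]) - conditional


# Hypothetical records: "breaker" separates the categories, "alarm" does not
rows = [
    {"breaker": "open", "alarm": "high", "fault": "line"},
    {"breaker": "open", "alarm": "low", "fault": "line"},
    {"breaker": "closed", "alarm": "high", "fault": "transformer"},
    {"breaker": "closed", "alarm": "low", "fault": "transformer"},
]

print(information_gain(rows, "breaker"))  # 1.0
print(information_gain(rows, "alarm"))    # 0.0
```

In this example the division on `breaker` would be selected, since it has the larger information gain.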

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: the evaluation function C(X) includes:

wherein n is the number of samples at the node, and t is a constant.

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: the method further comprises: evaluating according to the weighted sum of the entropy values of the leaf nodes, and pruning to eliminate overfitting when the evaluation value is larger than m.

As a preferred solution of the fault signal standardization processing and classification method based on machine learning according to the present invention, wherein: the method further comprises: defining a pruning coefficient R(X), which expresses the degree of reduction of the overall loss function after pruning:

wherein C(a) is a set evaluation value.

The invention has the beneficial effects that: the invention unifies data by utilizing standardized processing, reduces the processing time of the system, and improves the classification accuracy by utilizing the optimized decision tree model.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort. Wherein:

Fig. 1 is a flowchart illustrating a fault signal standardization processing and classification method based on machine learning according to a first embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and it will be appreciated by those skilled in the art that the present invention may be practiced without departing from the spirit and scope of the present invention and that the present invention is not limited by the specific embodiments disclosed below.

Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The present invention will be described in detail with reference to the drawings. For convenience of illustration, the cross-sectional views illustrating the structure of the device are partially enlarged and are not drawn to general scale; the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.

Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted", "coupled" and "connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited. For example, a connection can be fixed, detachable or integral; it may be mechanical, electrical or direct, or indirect through intervening media, or may be an internal connection between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.

Example 1

Referring to fig. 1, a first embodiment of the present invention provides a method for processing and classifying fault signals based on machine learning, which includes:

S1: Carry out standardized processing on the historical fault information to obtain the power grid equipment parameters.

(1) Historical fault information is acquired by establishing a data interface with a data acquisition and monitoring control system.

(2) Vacancy values in the data are filled, noise data are removed and inconsistent data are corrected through data cleaning; data from different data sources are combined through data integration to form a unified data store, redundant attributes are removed, and repeated data are searched for and deleted; the power grid equipment parameters used for fault diagnosis are then obtained through screening.

Preferably, the invention uses standardized processing to unify the data and reduce noise, so that the data is more accurate and convenient to use.
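The cleaning and integration of step S1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout, the attribute names (`device`, `voltage_kv`, `current_a`, `operator`), the median fill rule and the range check used as noise removal are all hypothetical choices, since the description does not specify how vacancy values are filled or how noise is detected.

```python
from statistics import median


def clean_and_integrate(sources, redundant=(), limits=None):
    """Merge records from several sources and clean them.

    sources   -- iterable of lists of dict records (hypothetical layout)
    redundant -- attribute names to drop
    limits    -- {attribute: (low, high)}; values outside are treated as noise
    """
    limits = limits or {}
    # Data integration: merge all sources into one list of records.
    merged = [dict(rec) for src in sources for rec in src]
    # Remove redundant attributes.
    for rec in merged:
        for key in redundant:
            rec.pop(key, None)

    # Remove noise: drop records with a value outside the allowed range.
    def in_range(rec):
        for key, (low, high) in limits.items():
            value = rec.get(key)
            if value is not None and not (low <= value <= high):
                return False
        return True

    merged = [rec for rec in merged if in_range(rec)]
    # Fill vacancy values (None or absent) in numeric attributes with the median.
    for key in sorted({k for rec in merged for k in rec}):
        observed = [rec[key] for rec in merged
                    if isinstance(rec.get(key), (int, float))]
        if not observed:
            continue
        fill = median(observed)
        for rec in merged:
            if rec.get(key) is None:
                rec[key] = fill
    # Delete repeated records, keeping the first occurrence.
    seen, unique = set(), []
    for rec in merged:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


# Hypothetical records from two data sources
scada = [
    {"device": "T1", "voltage_kv": 110.0, "current_a": 420.0, "operator": "a"},
    {"device": "L7", "voltage_kv": None, "current_a": 390.0, "operator": "b"},
]
archive = [
    {"device": "T1", "voltage_kv": 110.0, "current_a": 420.0, "operator": "c"},
    {"device": "B2", "voltage_kv": 9999.0, "current_a": 400.0, "operator": "a"},
]
result = clean_and_integrate([scada, archive], redundant=("operator",),
                             limits={"voltage_kv": (0.0, 600.0)})
print(len(result))  # 2
```

Noise is removed before the vacancy values are filled so that out-of-range readings do not distort the fill value.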

S2: Identify the fault category according to the power grid equipment parameters, and construct node branches.

(1) If the power grid equipment parameters are discrete values and a decision tree is not generated, each fault category is constructed as a branch;

(2) if the power grid equipment parameters are discrete values and a decision tree is required to be generated, a subset divided by fault category is tested, and two branches are formed according to whether a parameter belongs to the subset or does not belong to it;

(3) if the power grid equipment parameters are continuous values, a value is determined as the split point, and two branches are generated according to whether a parameter is greater than or less than the split point.

Preferably, the invention constructs branches differently for different data types, so that the model has universality.
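The three branching cases of step S2 can be sketched as a single Python helper. The function name `build_branches`, the mode names and the sample values are hypothetical. Two assumptions are made: case (1) is read as one branch per distinct discrete value, and in case (3) a value equal to the split point is placed in the lower branch, because the description does not say where it belongs.

```python
from collections import defaultdict


def build_branches(values, kind, subset=None, split_point=None):
    """Group sample indices into branches for one parameter.

    kind == "multiway":  one branch per distinct discrete value
    kind == "subset":    two branches, value in subset / not in subset
    kind == "threshold": two branches, value greater than the split point / not
    """
    branches = defaultdict(list)
    for index, value in enumerate(values):
        if kind == "multiway":
            key = value
        elif kind == "subset":
            key = "in_subset" if value in subset else "not_in_subset"
        elif kind == "threshold":
            key = "greater" if value > split_point else "not_greater"
        else:
            raise ValueError(f"unknown branch kind: {kind}")
        branches[key].append(index)
    return dict(branches)


# Hypothetical discrete and continuous parameters
states = ["open", "closed", "tripped", "open"]
voltages = [108.5, 112.0, 95.2]

print(build_branches(states, "multiway"))
# {'open': [0, 3], 'closed': [1], 'tripped': [2]}
print(build_branches(states, "subset", subset={"open", "tripped"}))
# {'in_subset': [0, 2, 3], 'not_in_subset': [1]}
print(build_branches(voltages, "threshold", split_point=100.0))
# {'greater': [0, 1], 'not_greater': [2]}
```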

S3: Traverse the division modes of each node branch, divide the data into different child nodes, calculate the purity of the child nodes, and select the division that yields the child nodes with the highest purity as the optimal division mode; continue dividing new child nodes until the purity of each node is highest, and construct a decision tree.

(1) The purity H(X) includes:

wherein N represents the N different discrete values of X, and P_i represents the probability of occurrence of event i.

(2) max(H(X)): the division with the highest purity value is taken as the optimal division mode.

(3) Continuing to divide the new child nodes by using the information gain comprises:

Gain(D, A) = H(D) - H(D|A)

where Gain(D, A) is the information gain, H(D) represents the information entropy of the data set D, and H(D|A) represents the conditional entropy of D given the characteristic attribute A.

Preferably, the present invention uses the purity and information gain values to continuously divide the sub-nodes of the decision tree to make the classification more accurate.
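The recursive division of step S3 can be sketched as a small ID3-style tree builder in Python. This is an illustration under assumptions, not the implementation of the invention: it handles discrete attributes only, uses base-2 Shannon entropy, stops when a node holds a single fault category or no attribute yields a positive information gain, and the attribute names and records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def split_by(rows, attribute):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def gain(rows, attribute, target):
    """Information gain of dividing rows on a discrete attribute."""
    rest = sum(len(g) / len(rows) * entropy([r[target] for r in g])
               for g in split_by(rows, attribute).values())
    return entropy([r[target] for r in rows]) - rest


def build_tree(rows, attributes, target="fault"):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority                      # leaf: single category or no attribute left
    best = max(attributes, key=lambda a: gain(rows, a, target))
    if gain(rows, best, target) <= 0:
        return majority                      # no division improves the node
    remaining = [a for a in attributes if a != best]
    children = {value: build_tree(group, remaining, target)
                for value, group in split_by(rows, best).items()}
    return {"attribute": best, "children": children, "default": majority}


def classify(tree, record):
    # Descend until a leaf label is reached; unseen values fall back to the majority.
    while isinstance(tree, dict):
        tree = tree["children"].get(record[tree["attribute"]], tree["default"])
    return tree


# Hypothetical standardized fault records
rows = [
    {"breaker": "tripped", "protection": "distance", "fault": "line"},
    {"breaker": "tripped", "protection": "differential", "fault": "transformer"},
    {"breaker": "closed", "protection": "distance", "fault": "none"},
    {"breaker": "closed", "protection": "differential", "fault": "none"},
    {"breaker": "tripped", "protection": "distance", "fault": "line"},
]
tree = build_tree(rows, ["breaker", "protection"])
print(tree["attribute"])  # breaker
print(classify(tree, {"breaker": "tripped", "protection": "differential"}))  # transformer
```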

S4: Prune the decision tree by using the evaluation function to obtain the optimal decision tree.

(1) The evaluation function C(X) includes:

wherein n is the number of samples at the node, and t is a constant.

(2) Evaluation is carried out according to the weighted sum of the entropy values of the leaf nodes, and pruning is performed to eliminate overfitting when the evaluation value is greater than 5.

(3) A pruning coefficient R(X) is defined, which expresses the degree of reduction of the overall loss function after pruning:

wherein C(a) is a set evaluation value.

Preferably, the method prunes the constructed decision tree by using the evaluation function to prevent overfitting, so that the decision tree is optimal.
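The pruning decision of step S4 can be illustrated with the following Python sketch. The formulas for C(X) and R(X) are not reproduced in this text, so the form used here is an assumption: the evaluation value is taken as the sum over leaf nodes of n * H(leaf), where n is the number of samples at the leaf, plus a constant t per leaf. The function names, the rule of collapsing sibling leaves into their parent, and the sample leaves are likewise hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def evaluation(leaves, t=1.0):
    """Assumed evaluation value: sample-weighted leaf entropies plus t per leaf."""
    return sum(len(leaf) * entropy(leaf) for leaf in leaves) + t * len(leaves)


def loss_reduction(child_leaves, t=1.0):
    """Reduction of the evaluation value when sibling leaves are merged."""
    merged = [label for leaf in child_leaves for label in leaf]
    return evaluation(child_leaves, t) - evaluation([merged], t)


def prune(child_leaves, m=5.0, t=1.0):
    """Merge sibling leaves when the evaluation value exceeds m and
    merging does not increase it; otherwise keep them unchanged."""
    if evaluation(child_leaves, t) > m and loss_reduction(child_leaves, t) >= 0:
        return [[label for leaf in child_leaves for label in leaf]]
    return child_leaves


# Hypothetical sibling leaves whose division separated nothing
children = [["line", "transformer"], ["line", "transformer"]]

print(evaluation(children))      # 6.0
print(loss_reduction(children))  # 1.0
print(len(prune(children)))      # 1
```

With the threshold of 5 used in this embodiment, the two leaves above are merged: the evaluation value drops from 6.0 to 5.0 because the division added a leaf without lowering the entropy.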

Example 2

In order to verify the technical effect of the method, this embodiment carries out a comparison test between a conventional BP neural network and the present method, and compares the test results to verify the real effect of the method.

A data interface is established with a SCADA (Supervisory Control and Data Acquisition) system, and 50 groups of historical fault information are obtained as sample data. The sample data are randomly divided into 30 groups of training samples and 20 groups of test samples. Simulation tests of the two methods are implemented in Python: each method is trained with the training samples, the test samples are then classified with the trained models, and the experimental results are shown in the following table.

Table 1: Accuracy comparison table.

Classification method	This method	BP neural network
Classification accuracy (%)	93.14	91.68
Time (s)	1.497	3.726

As can be seen from the above table, the classification accuracy of the present method is higher than that of the BP neural network (93.14% versus 91.68%), and the time used is less (1.497 s versus 3.726 s).
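The test procedure of this embodiment (a random 30/20 division, training, classification, and measurement of accuracy and time) can be sketched in Python as follows. The helper names, the fixed random seed, the synthetic samples and the majority-class placeholder model are all hypothetical; the sketch shows only the shape of the procedure and is not intended to reproduce the figures in Table 1.

```python
import random
import time
from collections import Counter


def split_samples(samples, n_train=30, seed=0):
    """Randomly divide the samples into training and test groups."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Classification accuracy in percent."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def timed_evaluation(train_fn, predict_fn, train, test, label_key="fault"):
    """Train, classify the test samples, and return (accuracy %, seconds)."""
    start = time.perf_counter()
    model = train_fn(train)
    predictions = [predict_fn(model, sample) for sample in test]
    elapsed = time.perf_counter() - start
    return accuracy(predictions, [s[label_key] for s in test]), elapsed


# 50 synthetic samples standing in for the historical fault information
samples = [{"id": i, "fault": "line" if i % 2 else "transformer"}
           for i in range(50)]
train, test = split_samples(samples)

# Placeholder model: always predicts the most frequent training category
score, seconds = timed_evaluation(
    lambda rows: Counter(r["fault"] for r in rows).most_common(1)[0][0],
    lambda model, sample: model,
    train, test)
print(len(train), len(test))  # 30 20
```

Either classifier under comparison can be passed in as the `train_fn` and `predict_fn` pair, so that both are measured on the same division of the samples.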

It should be recognized that embodiments of the present invention can be realized and implemented in computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.

Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.

Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.

As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).

It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. A fault signal standardization processing and classification method based on machine learning, characterized by comprising the following steps:

carrying out standardized processing on historical fault information to obtain power grid equipment parameters;

identifying fault types according to the power grid equipment parameters, and constructing node branches;

traversing the division modes of each node branch, dividing the historical fault information into different child nodes, calculating the purity of the child nodes, and selecting the division that yields the child nodes with the highest purity as the optimal division mode;

continuously dividing new child nodes until the purity of each node is highest, and constructing a decision tree;

and pruning the decision tree by utilizing an evaluation function to obtain an optimal decision tree.

2. The machine learning-based fault signal standardization processing and classification method of claim 1, wherein: the historical fault information is acquired by establishing a data interface with a data acquisition and monitoring control system.

3. The machine learning-based fault signal standardization processing and classification method of claim 1 or 2, wherein: the standardized processing comprises:

filling vacancy values in the data through data cleaning, and removing noise data;

and merging data from different data sources through data integration to form unified data storage, removing redundant attributes, searching and deleting repeated data to obtain the power grid equipment parameters.

4. The machine learning-based fault signal standardization processing and classification method of claim 3, wherein: constructing the node branches includes:

if the power grid equipment parameters are discrete values and a decision tree is not generated, constructing each fault category as a branch;

if the power grid equipment parameters are discrete values and a decision tree is required to be generated, dividing a subset by fault categories, testing, and respectively constructing two branches according to the power grid equipment parameters which belong to the subset and do not belong to the subset;

if the power grid equipment parameters are continuous values, determining one value as a split point, and constructing two different branches according to the power grid equipment parameters which are larger than and smaller than the split point.

5. The machine learning-based fault signal standardization processing and classification method of claim 4, wherein: the purity H(X) includes:

wherein N represents the N different discrete values of the variable X, and P_i represents the probability of occurrence of event i.

6. The machine learning-based fault signal standardization processing and classification method of claim 5, wherein: continuing to divide the new child nodes includes:

selecting new child nodes of the data set by the information gain, the division with the largest information gain being selected, where the information gain is:

Gain(D, A) = H(D) - H(D|A)

7. The machine learning-based fault signal standardization processing and classification method of claim 6, wherein: the evaluation function C(X) includes:

wherein n is the number of samples at the node, and t is a constant.

8. The machine learning-based fault signal standardization processing and classification method of claim 7, further comprising: evaluating according to the weighted sum of the entropy values of the leaf nodes, and pruning to eliminate overfitting when the evaluation value is larger than m.

9. The machine learning-based fault signal standardization processing and classification method of claim 8, further comprising:

defining a pruning coefficient R(X), which expresses the degree of reduction of the overall loss function after pruning:

wherein C(a) is a set evaluation value.