CN112364928A - Random forest classification method in transformer substation fault data diagnosis - Google Patents

Random forest classification method in transformer substation fault data diagnosis

Info

Publication number
CN112364928A
Authority
CN
China
Prior art keywords
random forest
data
sample set
original
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011292591.6A
Other languages
Chinese (zh)
Inventor
蒋一波
冯缘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202011292591.6A
Publication of CN112364928A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A random forest classification method in transformer substation fault data diagnosis extracts data from a transformer substation fault diagnosis system and preprocesses it to obtain an original sample set. The method comprises the following steps: (1) establishing a random forest model; (2) analyzing feature importance in the original random forest model; (3) processing the original sample set, retaining the classification result (label) and the selected features to generate a new sample set, and applying the same processing to the test set; (4) repeating step (1) with the new sample set to obtain the final random forest model; (5) testing the random forest model on the test set and evaluating its performance; (6) classifying new data with the random forest classifier, determining the classification result from the votes of the tree classifiers, and storing the classification result in a database. The invention reduces the amount of real-time data to be processed, speeds up classification, and preserves the real-time performance of the decision-making system; classification performance is good; over-fitting is avoided.

Description

Random forest classification method in transformer substation fault data diagnosis
Technical Field
The invention relates to a random forest classification method in transformer substation fault data diagnosis.
Background
In the prior art, when a power grid fails, monitoring devices promptly generate and upload alarm information, such as switch tripping, automatic protection device action, undervoltage, overcurrent, and device overload. In particular, when a large and structurally complex power system fails, the system generates a large amount of alarm information within a short time, and this information contains a large amount of uncertain knowledge and data caused by factors such as maloperation or failure to operate of protection devices or circuit breakers, channel transmission interference errors, and deviations in protection action time. The transformer substation fault data diagnosis techniques proposed so far, in China and abroad, mainly include expert systems, artificial neural networks, optimization algorithms, Petri nets, fuzzy set theory, and rough set theory. Each of these intelligent techniques has its own advantages in fault diagnosis, but each also exposes problems. For example, expert systems are difficult to maintain and have poor fault tolerance; artificial neural networks cannot explain their own behavior and need a large number of training samples. Existing transformer substation fault data diagnosis and classification methods cannot guarantee accuracy and efficiency at the same time, whereas a practical transformer substation fault diagnosis system places high demands on both diagnosis speed and accuracy.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a random forest classification method for a substation fault data diagnosis project. Building on decision trees, the method adopts the idea of ensemble learning: it trains on randomly selected samples and randomly selected features to generate a random forest, and classifies data with that forest.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a random forest classification method in a transformer substation fault diagnosis project extracts data from a transformer substation fault diagnosis system, and preprocesses the data to obtain an original sample set, wherein the method comprises the following steps:
(1) establishing a random forest model, wherein the process is as follows:
(1.1) let T be the original sample set, containing n samples in total; in each round, draw n samples from T by bootstrap sampling (sampling with replacement) to obtain a training set T_i of size n; during this extraction some samples may be drawn repeatedly and some may never be drawn; perform k rounds of extraction to obtain the training sets T_1, T_2, …, T_k; the data not contained in a training set is called out-of-bag data;
(1.2) establishing a decision tree;
(1.3) repeating the steps (1.1) and (1.2) until all CART trees are trained and all decision trees are combined to construct an original random forest model;
(2) performing feature importance analysis on the original random forest model, specifying L = |sqrt(M)|, where M is the number of features per sample, and selecting the top L features of the importance ranking;
(3) processing the original sample set T, reserving the result and the selected characteristics, generating a new sample set Y, and simultaneously carrying out the same processing on the test set;
(4) repeating the step (1) by using the new sample set Y to obtain a final random forest model H;
(5) testing the random forest model H by using a test set, and evaluating the performance of the model;
(6) classifying new data with the random forest classifier, determining the classification result from the votes of the tree classifiers, and storing the classification result in a database.
Further, the process of (1.2) is:
(1.2.1) let each sample have M features, and specify a number m = |log2 M| satisfying m << M; at each internal node, randomly select m of the M features to form a new feature set D_i, and select the optimal attribute from D_i to split the node;
(1.2.2) split each node according to (1.2.1) until no further split is possible; each tree is grown to its maximum extent with the CART method, without pruning.
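The sampling in (1.1) and the per-node feature selection in (1.2.1) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, and the rounding of log2 M down to an integer (with a floor of 1) is an assumption, since the text only writes the value between bars.

```python
import math
import random


def bootstrap_round(n, rng):
    """One round of (1.1): draw n indices with replacement.

    Returns the in-bag indices (the training set T_i) and the
    out-of-bag indices (samples never drawn in this round).
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def node_feature_subset(M, rng):
    """Feature set D_i for one internal node, as in (1.2.1).

    Assumption: m = floor(log2 M), clamped to at least 1.
    """
    m = max(1, int(math.log2(M)))
    return rng.sample(range(M), m)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
n, M, k = 10, 16, 3             # illustrative sizes, not from the patent
rounds = [bootstrap_round(n, rng) for _ in range(k)]
subset = node_feature_subset(M, rng)
```

Each entry of `rounds` pairs one training set T_i with its out-of-bag indices, and `subset` holds the m candidate feature indices among which a node would look for its best split.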
Still further, the transformer substation fault diagnosis system is an SCADA or EMS system.
The working principle of the invention is as follows. The invention provides a random forest classification method for substation fault diagnosis. Data is acquired from a power grid company. While each decision tree is built, features are selected by the Gini index minimization criterion, producing a binary tree. An original random forest model is built from the original sample set; its feature importance is analyzed, key features are screened out, and the original sample set is processed accordingly. The final random forest model is then built from the new sample set, which greatly reduces the amount of data to be processed. Finally, the random forest classification model obtains the classification result by a voting rule.
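As an illustration of the Gini index minimization criterion mentioned above, the following Python sketch scores binary splits of a single numeric feature and keeps the one with the lowest weighted Gini index. The function names and the toy data are hypothetical; the patent does not specify how candidate thresholds are enumerated, so trying every distinct value is an assumption.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Best split of the form `value <= threshold` for one feature.

    Returns (threshold, weighted_gini). The threshold is None when no
    candidate split lowers the Gini index of the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped: splitting there leaves the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Toy feature column and labels, invented for illustration only.
values = [1, 2, 3, 4]
labels = ["normal", "normal", "fault", "fault"]
threshold, score = best_binary_split(values, labels)
```

On this toy column the unsplit node has a Gini index of 0.5, and splitting at `value <= 2` separates the two classes completely, so the weighted Gini index drops to 0.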
The invention has the following beneficial effects: 1. the method reduces a large amount of real-time data processing amount, accelerates the system classification speed and ensures the real-time performance of the decision system. 2. The classification performance is good. 3. Over-fitting is avoided.
Drawings
Fig. 1 is a flowchart of a random forest classification method in a substation fault diagnosis project.
Fig. 2 is a diagram of the two-layer random forest classification system for substation fault data.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a random forest classification method in a substation fault diagnosis project includes the following steps:
The first step: extract raw data from systems such as SCADA and EMS.
The second step: preprocess the raw data to obtain the original sample set T. The preprocessing comprises:
2.1) converting non-numeric data into numeric data;
2.2) deleting any sample that contains missing values;
2.3) if two or more samples have identical attribute values and identical categories, keeping only one and deleting the remaining duplicates;
2.4) if two or more samples have identical attribute values but different categories, deleting these invalid samples.
The third step: T is the original sample set, containing n samples in total. In each round, n samples are drawn from T by sampling with replacement to obtain a training set T_i of size n. During this extraction some samples may be drawn repeatedly and some may never be drawn. After k rounds of extraction, the training sets are T_1, T_2, …, T_k; the data not contained in a training set is called out-of-bag data and serves as the test set for this random forest model.
The fourth step: build k decision trees from the training sets T_1, T_2, …, T_k.
Each sample has M features. A number m = |log2 M| is specified, satisfying m << M. At each internal node, m of the M features are selected at random to form a new feature set D_i, and the optimal attribute in D_i is selected to split the node.
Each node is split in this way until no further split is possible. Each tree is grown to its maximum extent with the CART algorithm, without pruning.
The fifth step: combine the k decision trees, each with the same weight, to construct the original random forest model.
The sixth step: perform feature importance analysis on the original random forest model, specify L = |sqrt(M)|, and select the top L features of the importance ranking.
The seventh step: process the original sample set T, retaining the classification result (label) and the selected features, to generate a new sample set Y; the data not contained in the training sets (the out-of-bag data) is used as the test data.
The eighth step: repeat the steps for establishing a random forest model (namely the third to fifth steps) with the new sample set Y to obtain the final random forest model H.
The ninth step: test the random forest model H on the test set. The classification result is determined by the votes of the tree classifiers and compared with the labels of the test set to verify the reliability of the model.
The tenth step: classify new data with the random forest classifier and store the classification results in a database.
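Two of the data-handling steps above, the preprocessing rules 2.2) to 2.4) and the reduction of T to the new sample set Y in the sixth and seventh steps, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: all names are hypothetical, a missing value is represented by `None`, rule 2.1) is assumed to have been applied already, the importance scores are invented, and L is taken as the integer part of sqrt(M).

```python
import math
from collections import defaultdict


def preprocess(samples):
    """Apply rules 2.2-2.4 to a list of (feature_tuple, label) pairs."""
    # 2.2: drop samples with missing values (None marks "missing" here).
    complete = [(tuple(x), y) for x, y in samples if None not in x]
    # Collect the set of labels seen for each distinct attribute vector.
    labels_by_x = defaultdict(set)
    for x, y in complete:
        labels_by_x[x].add(y)
    # 2.3: one copy per exact duplicate; 2.4: drop vectors whose labels conflict.
    return [(x, next(iter(ys))) for x, ys in labels_by_x.items() if len(ys) == 1]


def select_top_features(importances):
    """Indices of the top L features, with L = floor(sqrt(M)) (assumed rounding)."""
    M = len(importances)
    L = max(1, math.isqrt(M))
    ranked = sorted(range(M), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:L])


def project(samples, keep):
    """Keep only the selected feature columns; labels are retained unchanged."""
    return [(tuple(x[j] for j in keep), y) for x, y in samples]


raw = [
    ((1, 0, 5, 2), "fault"),
    ((1, 0, 5, 2), "fault"),      # exact duplicate, removed by 2.3
    ((0, 1, 3, None), "normal"),  # missing value, removed by 2.2
    ((2, 2, 2, 2), "fault"),
    ((2, 2, 2, 2), "normal"),     # conflicting labels, both removed by 2.4
    ((0, 1, 4, 1), "normal"),
]
T = preprocess(raw)
keep = select_top_features([0.1, 0.4, 0.3, 0.2])  # invented importance scores
Y = project(T, keep)
```

With four features, L is 2, so only the two highest-scoring columns survive in `Y`; the same `project` call would be applied to the test data so that both sets share one feature layout.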
Referring to fig. 2, the two-layer random forest classification system in the substation fault data identification project implemented by the method mainly includes: a classification module and a user interaction module. The classification module classifies according to the model and calculates the classification accuracy; the user interaction module realizes data visualization display, Web interface configuration and application program configuration.
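The work of the classification module, voting over the tree classifiers and computing classification accuracy, can be sketched in Python as follows. The trees here are hypothetical stand-in functions rather than trained CART trees, and the tie-breaking rule (the label seen first wins) is an assumption that the patent does not address.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Label with the most votes; on a tie, the first label encountered wins."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Classify one sample by letting every tree vote with equal weight."""
    return majority_vote([tree(x) for tree in trees])


def accuracy(trees, test_set):
    """Fraction of (features, label) test samples the forest classifies correctly."""
    correct = sum(forest_predict(trees, x) == y for x, y in test_set)
    return correct / len(test_set)


# Stand-in "trees": simple hand-written rules, invented for illustration.
trees = [
    lambda x: "fault" if x[0] > 0 else "normal",
    lambda x: "fault" if x[1] > 3 else "normal",
    lambda x: "normal",
]
test_set = [((1, 5), "fault"), ((0, 1), "normal"), ((1, 1), "fault")]
acc = accuracy(trees, test_set)
```

In this toy run the forest classifies the first two samples correctly and is outvoted on the third, so `acc` is 2/3.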
The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.

Claims (3)

1. A random forest classification method in transformer substation fault data diagnosis is characterized in that data are extracted from a transformer substation fault diagnosis system and preprocessed to obtain an original sample set, and the method comprises the following steps:
(1) establishing a random forest model, wherein the process is as follows:
(1.1) let T be the original sample set, containing n samples in total; in each round, draw n samples from T by bootstrap sampling to obtain a training set T_i of size n; during this extraction some samples may be drawn repeatedly and some may never be drawn; perform k rounds of extraction to obtain the training sets T_1, T_2, …, T_k; the data not contained in a training set is called out-of-bag data;
(1.2) establishing a decision tree;
(1.3) repeating the steps (1.1) and (1.2) until all CART trees are trained and all decision trees are combined to construct an original random forest model;
(2) performing feature importance analysis on the original random forest model, specifying L = |sqrt(M)|, where M is the number of features per sample, and selecting the top L features of the importance ranking;
(3) processing the original sample set T, reserving the result and the selected characteristics, generating a new sample set Y, and simultaneously carrying out the same processing on the test set;
(4) repeating the step (1) by using the new sample set Y to obtain a final random forest model H;
(5) testing the random forest model H by using a test set, and evaluating the performance of the model;
(6) classifying new data with the random forest classifier, determining the classification result from the votes of the tree classifiers, and storing the classification result in a database.
2. A random forest classification method in substation data fault diagnosis according to claim 1, characterized in that the process of (1.2) is:
(1.2.1) let each sample have M features, and specify a number m = |log2 M| satisfying m << M; at each internal node, randomly select m of the M features to form a new feature set D_i, and select the optimal attribute from D_i to split the node;
(1.2.2) split each node according to (1.2.1) until no further split is possible; each tree is grown to its maximum extent with the CART method, without pruning.
3. A random forest classification method in substation data fault diagnosis according to claim 1 or 2, characterized in that the substation fault diagnosis system is a SCADA or EMS system.
CN202011292591.6A 2020-11-18 2020-11-18 Random forest classification method in transformer substation fault data diagnosis Pending CN112364928A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011292591.6A CN112364928A (en) 2020-11-18 2020-11-18 Random forest classification method in transformer substation fault data diagnosis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011292591.6A CN112364928A (en) 2020-11-18 2020-11-18 Random forest classification method in transformer substation fault data diagnosis

Publications (1)

Publication Number Publication Date
CN112364928A true CN112364928A (en) 2021-02-12

Family

ID=74532720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011292591.6A Pending CN112364928A (en) 2020-11-18 2020-11-18 Random forest classification method in transformer substation fault data diagnosis

Country Status (1)

Country Link
CN (1) CN112364928A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108680348A (en) * 2018-05-14 2018-10-19 国网山东省电力公司莱芜供电公司 A kind of breaker mechanical fault diagnosis method and system based on random forest
CN109685051A (en) * 2018-11-14 2019-04-26 国网上海市电力公司 A kind of infrared image fault diagnosis system based on network system
CN111461214A (en) * 2020-03-31 2020-07-28 国网上海市电力公司 Automatic fault diagnosis method for insulated pipe bus based on random forest algorithm

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949714A (en) * 2021-03-02 2021-06-11 北京城建设计发展集团股份有限公司 Fault possibility estimation method based on random forest
CN113095511A (en) * 2021-04-16 2021-07-09 广东电网有限责任公司 Method and device for judging in-place operation of automatic master station
CN113408068A (en) * 2021-06-18 2021-09-17 浙江大学 Random forest classification machine pump fault diagnosis method and device
CN113362118A (en) * 2021-07-08 2021-09-07 广东电网有限责任公司 User electricity consumption behavior analysis method and system based on random forest
CN114154561A (en) * 2021-11-15 2022-03-08 国家电网有限公司 Electric power data management method based on natural language processing and random forest
CN114154561B (en) * 2021-11-15 2024-02-27 国家电网有限公司 Electric power data management method based on natural language processing and random forest

Similar Documents

Publication Publication Date Title
CN112364928A (en) Random forest classification method in transformer substation fault data diagnosis
CN110943857B (en) Power communication network fault analysis and positioning method based on convolutional neural network
CN110705873B (en) Power distribution network running state portrait analysis method
CN107274105B (en) Linear discriminant analysis-based multi-attribute decision tree power grid stability margin evaluation method
CN109507535B (en) Method and device for predicting operation stage and operation life of transformer substation grounding grid
CN110837866A (en) XGboost-based electric power secondary equipment defect degree evaluation method
CN103902816A (en) Electrification detection data processing method based on data mining technology
CN109753499A (en) A kind of O&M monitoring data administering method
CN112217674B (en) Alarm root cause identification method based on causal network mining and graph attention network
CN112464995A (en) Power grid distribution transformer fault diagnosis method and system based on decision tree algorithm
CN108880706A (en) A kind of fast diagnosis method of satellite channel link failure
CN112966385A (en) Method and system for identifying topology weak points of power distribution network frame
CN113191585A (en) Typhoon disaster risk assessment method for power transmission line
CN115293383A (en) Game theory fused transformer risk cause analysis method
CN117614141B (en) Multi-voltage-level coordination management method for power distribution network
CN104978837B (en) A kind of warning system and its implementation of user oriented end electric substation
CN113570345A (en) Power failure range automatic identification system based on construction project circuit diagram
CN112364929A (en) Random forest classification method in power plant fault data diagnosis project
CN112000708A (en) Abnormal data processing method and system based on regulation and control adapted data fusion
CN112149731A (en) Power system fault classification method and system based on ID3 algorithm
CN115542070A (en) Distribution network line fault positioning method and storage medium
CN114839858A (en) Security control communication fault monitoring method, system, equipment and storage medium
Lin et al. A method of satellite network fault synthetic diagnosis based on C4. 5 algorithm and expert knowledge database
CN114140662A (en) Insulator lightning stroke image sample amplification method based on cyclic generation countermeasure network
CN115809761B (en) Voltage quality analysis method and system based on low-voltage transformer area

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination