CN112364928A - Random forest classification method in transformer substation fault data diagnosis - Google Patents
- Publication number
- CN112364928A (application CN202011292591.6A)
- Authority
- CN
- China
- Prior art keywords
- random forest
- data
- sample set
- original
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Economics (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
A random forest classification method for transformer substation fault data diagnosis extracts data from a transformer substation fault diagnosis system and preprocesses the data to obtain an original sample set. The method comprises the following steps: (1) establishing an original random forest model; (2) analyzing the feature importance of the original random forest model; (3) processing the original sample set, retaining the result (class label) and the selected features, to generate a new sample set, and applying the same processing to the test set; (4) repeating step (1) with the new sample set to obtain a final random forest model; (5) testing the final random forest model on the test set and evaluating its performance; (6) classifying new data with the random forest classifier, determining the classification result by the votes of the tree classifiers, and storing the classification result in a database. The invention reduces the volume of real-time data to be processed, speeds up classification, and preserves the real-time performance of the decision-making system; the classification performance is good; over-fitting is avoided.
Description
Technical Field
The invention relates to a random forest classification method in transformer substation fault data diagnosis.
Background
In the prior art, when a power grid fails, monitoring devices promptly generate and upload alarm information such as switch tripping, automatic protection device action, undervoltage, overcurrent, and device overload. In particular, when a fault occurs in a power system of large structure and scale, the system can generate a large amount of alarm information within a short time, and this information contains considerable uncertain knowledge and data caused by factors such as maloperation or failure to operate of protection devices or circuit breakers, channel transmission interference errors, and deviations in protection action time. The substation fault data diagnosis technologies and methods proposed so far, in China and abroad, mainly include expert systems, artificial neural networks, optimization algorithms, Petri nets, fuzzy set theory, and rough set theory. These intelligent technologies each have advantages when applied to fault diagnosis, but they also expose many problems. For example, expert systems are difficult to maintain and have poor fault tolerance; artificial neural networks cannot explain their own behavior and need a large number of training samples. Existing substation fault data diagnosis and classification methods cannot guarantee accuracy and efficiency at the same time, whereas practical substation fault diagnosis systems place high demands on both diagnosis speed and accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a random forest classification method for a substation fault data diagnosis project. Building on decision trees, the method adopts the idea of ensemble learning: it trains on randomly selected samples and randomly selected features to generate a random forest, and classifies data with that random forest.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A random forest classification method in a transformer substation fault diagnosis project extracts data from a transformer substation fault diagnosis system and preprocesses the data to obtain an original sample set, wherein the method comprises the following steps:
(1) establishing a random forest model, wherein the process is as follows:
(1.1) let T be the original sample set containing n samples; in each round, draw n samples from T by bootstrap sampling (sampling with replacement) to obtain a training set T_i of size n; during this extraction some samples may be drawn repeatedly and others may never be drawn; perform k rounds of extraction, so that the training sets of the rounds are T_1, T_2, …, T_k; the data not contained in a training set is called out-of-bag data;
(1.2) establishing a decision tree;
(1.3) repeating the steps (1.1) and (1.2) until all CART trees are trained and all decision trees are combined to construct an original random forest model;
(2) performing importance analysis on the original random forest model, and specifying L = |sqrt(M)| so as to select the top L features in the importance ranking;
(3) processing the original sample set T, reserving the result and the selected characteristics, generating a new sample set Y, and simultaneously carrying out the same processing on the test set;
(4) repeating the step (1) by using the new sample set Y to obtain a final random forest model H;
(5) testing the random forest model H by using a test set, and evaluating the performance of the model;
(6) classifying new data with the random forest classifier, determining the classification result by the votes of the tree classifiers, and storing the classification result in a database.
Further, the process of (1.2) is:
(1.2.1) let each sample have M features, and specify a number m = |log2 M| satisfying the condition m << M; at each internal node, randomly select m of the M features to form a new feature set D_i, and select the optimal attribute from D_i to split the node;
(1.2.2) split each node according to (1.2.1) until no further split is possible; each tree is grown to its maximum extent by the CART method, without pruning.
Still further, the transformer substation fault diagnosis system is an SCADA or EMS system.
The working principle of the invention is as follows. The invention provides a random forest classification method for substation fault diagnosis. Data are acquired from a power grid company; while each decision tree is built, features are selected by the Gini index minimization criterion, producing a binary tree. An original random forest model is built from the original sample set, the feature importance of this model is analyzed, and the key features are screened out and used to process the original sample set. The final random forest model is then built from the new sample set, which greatly reduces the amount of data to be processed. Finally, the random forest classification model obtains the classification result by a voting rule.
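The Gini index minimization criterion mentioned above can be illustrated with a short sketch. It is not taken from the patent: the function names `gini` and `best_binary_split` are hypothetical, a single numeric feature is assumed, and the candidate thresholds are simply the observed values of that feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`.

    The threshold is None when no split improves on the unsplit node.
    """
    n = len(labels)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

For example, with feature values `[1, 2, 3, 4]` and labels `["a", "a", "b", "b"]`, the unsplit impurity is 0.5 and the split at threshold 2 separates the two classes completely, giving a weighted Gini of 0.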
The invention has the following beneficial effects: 1. It reduces the volume of real-time data to be processed, speeds up classification, and preserves the real-time performance of the decision system. 2. The classification performance is good. 3. Over-fitting is avoided.
Drawings
Fig. 1 is a flowchart of a random forest classification method in a substation fault diagnosis project.
Fig. 2 shows a two-layer random forest classification system for substation fault data.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a random forest classification method in a substation fault diagnosis project includes the following steps:
The first step: extract original data from systems such as SCADA and EMS.
The second step: preprocess the original data to obtain an original sample set T, wherein the preprocessing comprises the following steps:
2.1) convert non-numeric data into numeric data;
2.2) if a sample contains missing values, delete the sample;
2.3) if two or more samples have identical attribute values and identical categories, keep only one and delete the remaining duplicates;
2.4) if two or more samples have identical attribute values but different categories, delete these invalid samples.
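The four preprocessing rules can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: samples are assumed to be `(attributes, category)` pairs, a missing value is assumed to be represented by `None`, non-numeric values are assumed to be mapped to integer codes per column, and the function name `preprocess` is hypothetical.

```python
def preprocess(rows):
    """Apply rules 2.1 to 2.4 to a list of (attribute_tuple, category) samples."""
    codes = {}  # per-column mapping from a non-numeric value to an integer code

    def encode(column, value):
        # Numbers and missing values pass through unchanged.
        if value is None or isinstance(value, (int, float)):
            return value
        mapping = codes.setdefault(column, {})
        return mapping.setdefault(value, len(mapping))

    # 2.1) convert non-numeric data into numeric data
    encoded = [(tuple(encode(j, v) for j, v in enumerate(x)), y) for x, y in rows]
    # 2.2) delete samples that contain missing values
    complete = [(x, y) for x, y in encoded if None not in x]
    # Group the categories seen for each distinct attribute vector.
    categories_by_x = {}
    for x, y in complete:
        categories_by_x.setdefault(x, set()).add(y)
    # 2.3) one category seen: keep a single copy of the sample
    # 2.4) several categories seen: the samples conflict and are all dropped
    return [(x, next(iter(ys))) for x, ys in categories_by_x.items() if len(ys) == 1]
```

With this grouping, duplicates collapse into one sample and conflicting samples disappear in the same pass, so rules 2.3 and 2.4 do not need separate scans of the data.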
The third step: let T be the original sample set containing n samples. In each round, n samples are drawn from T by sampling with replacement to obtain a training set T_i of size n. During this extraction some samples may be drawn repeatedly and others may never be drawn. Perform k rounds of extraction, giving the training sets T_1, T_2, …, T_k. The data not contained in a training set is called out-of-bag data and serves as the test set for this random forest model.
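The sampling in the third step can be sketched with sample indices. The function name `bootstrap_rounds` and the `seed` parameter are assumptions made for this illustration. For large n, the expected share of samples left out of a given round is (1 - 1/n)^n, which approaches 1/e, or about 36.8%.

```python
import random


def bootstrap_rounds(n, k, seed=0):
    """Return k pairs (in_bag, out_of_bag) of sample indices for n samples."""
    rng = random.Random(seed)
    rounds = []
    for _ in range(k):
        # Draw n indices with replacement: repeats are allowed.
        in_bag = [rng.randrange(n) for _ in range(n)]
        # Indices that were never drawn form the out-of-bag data of this round.
        out_of_bag = sorted(set(range(n)) - set(in_bag))
        rounds.append((in_bag, out_of_bag))
    return rounds
```

Each round keeps the size of the training set at n, so a tree trained on round i can be tested on the out-of-bag indices of the same round without touching data it has seen.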
The fourth step: build k decision trees from the training sets T_1, T_2, …, T_k.
Each sample has M features. A number m = |log2 M| is specified, satisfying the condition m << M. At each internal node, m of the M features are selected at random to form a new feature set D_i, and the optimal attribute is selected from D_i to split the node.
Each node is split according to the above steps until no further split is possible. Each tree is grown to its maximum extent by the CART algorithm, without pruning.
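The fourth step can be sketched as a small unpruned binary tree that examines a random subset of features at every node. Several points are assumptions of this sketch and not statements of the patent: features are numeric, m is rounded down with a minimum of one, splits are scored by weighted Gini impurity, class labels are not tuples, and the names `grow_tree` and `predict` are hypothetical. As a simplification, a node becomes a leaf when none of the sampled features can split it.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def grow_tree(X, y, rng):
    """Grow an unpruned tree; an internal node is (feature, threshold, left, right)."""
    if len(set(y)) == 1:
        return y[0]  # pure node: leaf holding the class label
    M = len(X[0])
    m = max(1, int(math.log2(M)))  # assumption: round down, at least one feature
    best = None  # (weighted Gini, feature index, threshold)
    for f in rng.sample(range(M), m):
        # The largest value is skipped so that both sides stay non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        # Simplification: the sampled features are constant here, so stop.
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [lab for _, lab in left], rng),
            grow_tree([r for r, _ in right], [lab for _, lab in right], rng))


def predict(tree, row):
    """Walk from the root to a leaf and return the class label stored there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

A call such as `grow_tree(X, y, random.Random(0))` on one bootstrap training set yields one tree; repeating it for each training set yields the k trees of the forest.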
The fifth step: combine the k decision trees, each with the same weight, to construct the original random forest model.
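Combining equally weighted trees amounts to a majority vote over their individual predictions. A minimal sketch, assuming each tree has already produced one class label for the sample; the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the most trees, each tree counting once."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    # On a tie, Counter.most_common keeps the label that was encountered first.
    return Counter(tree_predictions).most_common(1)[0][0]
```

With an even number of trees a tie is possible, so choosing an odd k for a two-class problem avoids depending on the tie-breaking order.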
The sixth step: perform importance analysis on the original random forest model, and specify L = |sqrt(M)| so as to select the top L features in the importance ranking.
The seventh step: process the original sample set T, retaining the result (class label) and the selected features, to generate a new sample set Y; the data not contained in the training sets (the out-of-bag data) is used as the test data.
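The sixth and seventh steps can be sketched together, given one importance score per feature. The patent does not state how the scores are computed, so they are treated here as an input; rounding sqrt(M) down and the function name `select_top_features` are likewise assumptions of this sketch.

```python
import math


def select_top_features(importances, X):
    """Keep the L highest-ranked feature columns of X, with L = floor(sqrt(M))."""
    M = len(importances)
    L = max(1, math.isqrt(M))  # assumption: round down, keep at least one feature
    ranked = sorted(range(M), key=lambda j: importances[j], reverse=True)
    keep = sorted(ranked[:L])  # restore the original column order
    reduced = [[row[j] for j in keep] for row in X]
    return keep, reduced
```

The returned index list `keep` has to be applied to the test data and to any new data as well, so that the final model always sees the same columns it was trained on.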
The eighth step: repeat the steps for establishing the random forest model (namely the third to fifth steps) with the new sample set Y to obtain the final random forest model H.
The ninth step: test the random forest model H on the test set. The classification result is determined by the votes of the tree classifiers and is compared with the labels of the test set to verify the reliability of the model.
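The comparison in the ninth step can be summarized as an accuracy value plus a count of each (actual, predicted) pair. This is a sketch only: the patent does not name a specific metric, and the function name `evaluate` is hypothetical.

```python
from collections import Counter


def evaluate(predicted, actual):
    """Return (accuracy, confusion), where confusion counts (actual, predicted) pairs."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in confusion.items() if a == p)
    return correct / len(actual), confusion
```

The pair counts show which fault categories are confused with each other, which a single accuracy figure hides.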
The tenth step: classify new data with the random forest classifier and store the classification results in a database.
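Storing the classification results could look like the following sketch. The patent does not specify a database or schema, so SQLite, the table name `classification_result`, its columns, and the function name `store_results` are all assumptions made for illustration.

```python
import sqlite3


def store_results(conn, results):
    """Persist (sample_id, label, votes) rows; an existing sample_id is overwritten."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS classification_result ("
        "sample_id TEXT PRIMARY KEY, label TEXT NOT NULL, votes INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO classification_result VALUES (?, ?, ?)", results
    )
    conn.commit()
```

Keeping the vote count next to the label lets a later query separate confident classifications from narrow ones.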
Referring to fig. 2, the two-layer random forest classification system in the substation fault data identification project implemented by the method mainly includes: a classification module and a user interaction module. The classification module classifies according to the model and calculates the classification accuracy; the user interaction module realizes data visualization display, Web interface configuration and application program configuration.
The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.
Claims (3)
1. A random forest classification method in transformer substation fault data diagnosis is characterized in that data are extracted from a transformer substation fault diagnosis system and preprocessed to obtain an original sample set, and the method comprises the following steps:
(1) establishing a random forest model, wherein the process is as follows:
(1.1) setting T as an original sample set containing n samples, and drawing n samples from the original sample set T in each round by bootstrap sampling to obtain a training set T_i of size n, wherein during the extraction some samples may be drawn repeatedly and others may never be drawn; performing k rounds of extraction, the training sets of the rounds being T_1, T_2, …, T_k, and the data not contained in a training set being called out-of-bag data;
(1.2) establishing a decision tree;
(1.3) repeating the steps (1.1) and (1.2) until all CART trees are trained and all decision trees are combined to construct an original random forest model;
(2) performing importance analysis on the original random forest model, and specifying L = |sqrt(M)| so as to select the top L features in the importance ranking;
(3) processing the original sample set T, reserving the result and the selected characteristics, generating a new sample set Y, and simultaneously carrying out the same processing on the test set;
(4) repeating the step (1) by using the new sample set Y to obtain a final random forest model H;
(5) testing the random forest model H by using a test set, and evaluating the performance of the model;
(6) classifying new data with the random forest classifier, determining the classification result by the votes of the tree classifiers, and storing the classification result in a database.
2. A random forest classification method in substation data fault diagnosis according to claim 1, characterized in that the process of (1.2) is:
(1.2.1) let each sample have M features, and specify a number m = |log2 M| satisfying the condition m << M; at each internal node, randomly select m of the M features to form a new feature set D_i, and select the optimal attribute from D_i to split the node;
(1.2.2) split each node according to (1.2.1) until no further split is possible; each tree is grown to its maximum extent by the CART method, without pruning.
3. A random forest classification method in substation data fault diagnosis according to claim 1 or 2, characterized in that the substation fault diagnosis system is a SCADA or EMS system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011292591.6A CN112364928A (en) | 2020-11-18 | 2020-11-18 | Random forest classification method in transformer substation fault data diagnosis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011292591.6A CN112364928A (en) | 2020-11-18 | 2020-11-18 | Random forest classification method in transformer substation fault data diagnosis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112364928A (en) | 2021-02-12 |
Family
ID=74532720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011292591.6A Pending CN112364928A (en) | 2020-11-18 | 2020-11-18 | Random forest classification method in transformer substation fault data diagnosis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364928A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949714A (en) * | 2021-03-02 | 2021-06-11 | 北京城建设计发展集团股份有限公司 | Fault possibility estimation method based on random forest |
CN113095511A (en) * | 2021-04-16 | 2021-07-09 | 广东电网有限责任公司 | Method and device for judging in-place operation of automatic master station |
CN113362118A (en) * | 2021-07-08 | 2021-09-07 | 广东电网有限责任公司 | User electricity consumption behavior analysis method and system based on random forest |
CN113408068A (en) * | 2021-06-18 | 2021-09-17 | 浙江大学 | Random forest classification machine pump fault diagnosis method and device |
CN114154561A (en) * | 2021-11-15 | 2022-03-08 | 国家电网有限公司 | Electric power data management method based on natural language processing and random forest |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108680348A (en) * | 2018-05-14 | 2018-10-19 | 国网山东省电力公司莱芜供电公司 | A kind of breaker mechanical fault diagnosis method and system based on random forest |
CN109685051A (en) * | 2018-11-14 | 2019-04-26 | 国网上海市电力公司 | A kind of infrared image fault diagnosis system based on network system |
CN111461214A (en) * | 2020-03-31 | 2020-07-28 | 国网上海市电力公司 | Automatic fault diagnosis method for insulated pipe bus based on random forest algorithm |
-
2020
- 2020-11-18 CN CN202011292591.6A patent/CN112364928A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108680348A (en) * | 2018-05-14 | 2018-10-19 | 国网山东省电力公司莱芜供电公司 | A kind of breaker mechanical fault diagnosis method and system based on random forest |
CN109685051A (en) * | 2018-11-14 | 2019-04-26 | 国网上海市电力公司 | A kind of infrared image fault diagnosis system based on network system |
CN111461214A (en) * | 2020-03-31 | 2020-07-28 | 国网上海市电力公司 | Automatic fault diagnosis method for insulated pipe bus based on random forest algorithm |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949714A (en) * | 2021-03-02 | 2021-06-11 | 北京城建设计发展集团股份有限公司 | Fault possibility estimation method based on random forest |
CN113095511A (en) * | 2021-04-16 | 2021-07-09 | 广东电网有限责任公司 | Method and device for judging in-place operation of automatic master station |
CN113408068A (en) * | 2021-06-18 | 2021-09-17 | 浙江大学 | Random forest classification machine pump fault diagnosis method and device |
CN113362118A (en) * | 2021-07-08 | 2021-09-07 | 广东电网有限责任公司 | User electricity consumption behavior analysis method and system based on random forest |
CN114154561A (en) * | 2021-11-15 | 2022-03-08 | 国家电网有限公司 | Electric power data management method based on natural language processing and random forest |
CN114154561B (en) * | 2021-11-15 | 2024-02-27 | 国家电网有限公司 | Electric power data management method based on natural language processing and random forest |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112364928A (en) | Random forest classification method in transformer substation fault data diagnosis | |
CN110943857B (en) | Power communication network fault analysis and positioning method based on convolutional neural network | |
CN110705873B (en) | Power distribution network running state portrait analysis method | |
CN107274105B (en) | Linear discriminant analysis-based multi-attribute decision tree power grid stability margin evaluation method | |
CN109507535B (en) | Method and device for predicting operation stage and operation life of transformer substation grounding grid | |
CN110837866A (en) | XGboost-based electric power secondary equipment defect degree evaluation method | |
CN103902816A (en) | Electrification detection data processing method based on data mining technology | |
CN109753499A (en) | A kind of O&M monitoring data administering method | |
CN112217674B (en) | Alarm root cause identification method based on causal network mining and graph attention network | |
CN112464995A (en) | Power grid distribution transformer fault diagnosis method and system based on decision tree algorithm | |
CN108880706A (en) | A kind of fast diagnosis method of satellite channel link failure | |
CN112966385A (en) | Method and system for identifying topology weak points of power distribution network frame | |
CN113191585A (en) | Typhoon disaster risk assessment method for power transmission line | |
CN115293383A (en) | Game theory fused transformer risk cause analysis method | |
CN117614141B (en) | Multi-voltage-level coordination management method for power distribution network | |
CN104978837B (en) | A kind of warning system and its implementation of user oriented end electric substation | |
CN113570345A (en) | Power failure range automatic identification system based on construction project circuit diagram | |
CN112364929A (en) | Random forest classification method in power plant fault data diagnosis project | |
CN112000708A (en) | Abnormal data processing method and system based on regulation and control adapted data fusion | |
CN112149731A (en) | Power system fault classification method and system based on ID3 algorithm | |
CN115542070A (en) | Distribution network line fault positioning method and storage medium | |
CN114839858A (en) | Security control communication fault monitoring method, system, equipment and storage medium | |
Lin et al. | A method of satellite network fault synthetic diagnosis based on C4. 5 algorithm and expert knowledge database | |
CN114140662A (en) | Insulator lightning stroke image sample amplification method based on cyclic generation countermeasure network | |
CN115809761B (en) | Voltage quality analysis method and system based on low-voltage transformer area |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||