CN113284004A - Power data diagnosis treatment method based on isolated forest algorithm - Google Patents

Info

Publication number
CN113284004A
Authority
CN
China
Prior art keywords
data
abnormal
isolated
forest algorithm
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110506063.4A
Other languages
Chinese (zh)
Inventor
李保平
谢超
王辉
尉建兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huitong Guoxin Technology Co Ltd
Original Assignee
Guangzhou Huitong Guoxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-05-10
Filing date
2021-05-10
Publication date
2021-08-20
Application filed by Guangzhou Huitong Guoxin Technology Co Ltd
Priority to CN202110506063.4A
Publication of CN113284004A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention relates to a power data diagnosis treatment method based on an isolated forest algorithm, comprising the following steps. Data collection: collecting the operation data of a wind generating set by using a SCADA (supervisory control and data acquisition) system. Data preprocessing: preprocessing the collected data to form a standard data set. Isolated forest algorithm analysis: analyzing the data set by using the isolated forest algorithm, isolating abnormal data, and reporting the abnormal data to a control system. Expert knowledge discrimination: analyzing the reported abnormal data by using a discrimination system that incorporates expert experience, so as to judge whether the operating state is abnormal, and sending the judgment result to the operator on duty. The beneficial effects are that treatment is automatic and timely, and that analysis with the isolated forest algorithm offers high processing speed and high accuracy.

Description

Power data diagnosis treatment method based on isolated forest algorithm
Technical Field
The invention relates to the technical field of power data processing, in particular to a power data diagnosis treatment method based on an isolated forest algorithm.
Background
A SCADA (Supervisory Control And Data Acquisition) system is a data acquisition and monitoring control system. It is a computer-based DCS and electric power automation monitoring system with a wide range of applications: it can be used for data acquisition, monitoring control, and process control in fields such as electric power, metallurgy, petroleum, the chemical industry, gas, and railways.
The SCADA measurement information of a wind generating set does not always match the normal operating condition of the set. For example, the set may be in an abnormal operating state such as starting or stopping, but the state lasts only a very short time and is not correctly identified by the controller; or a sensor may suffer a range offset or an incorrect scaling factor during measurement, so that the measured data are abnormal. Because the data volume is large, manual handling cannot meet the requirement for automatic and timely treatment, so an automatic and timely treatment method needs to be developed.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provides a power data diagnosis treatment method based on an isolated forest algorithm.
To achieve this technical purpose and effect, the invention is realized by the following technical scheme:
A power data diagnosis treatment method based on an isolated forest algorithm comprises the following steps:
Data collection: collecting the operation data of the wind generating set by using a SCADA (supervisory control and data acquisition) system;
Data preprocessing: preprocessing the collected data to form a standard data set;
Isolated forest algorithm analysis: analyzing the data set by using the isolated forest algorithm, isolating abnormal data, and reporting the abnormal data to a control system;
Expert knowledge discrimination: analyzing the reported abnormal data by using a discrimination system that incorporates expert experience, so as to judge whether the operating state is abnormal, and sending the judgment result to the operator on duty.
The data preprocessing comprises: cleaning the key feature fields; manually filling in null values and removing null values that cannot be filled in; manually correcting abnormal values and removing abnormal values that cannot be corrected; and computing the features to be extracted according to a feature calculation formula and standardizing these features to form a standard data set.
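The cleaning and standardization part of this step can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the feature calculation formula or the standardization method, so the sketch uses z-score standardization, skips the manual fill-in and correction steps (it only removes records that cannot be used), and the field names `wind_speed` and `power` in the usage note are hypothetical.

```python
import math
from statistics import mean, pstdev


def preprocess(rows, fields):
    """Drop records with unusable key fields, then z-score standardize each field.

    rows   -- list of dicts, one per SCADA record (hypothetical layout)
    fields -- names of the key feature fields to keep
    Returns a list of standardized feature vectors (lists of floats).
    """
    # Keep only records in which every key field is a finite number.
    # In the described method a person would first try to fill or fix these.
    clean = [
        r for r in rows
        if all(isinstance(r.get(f), (int, float)) and math.isfinite(r[f])
               for f in fields)
    ]
    if not clean:
        return []

    # Per-field mean and population standard deviation (assumed z-score scaling).
    stats = {}
    for f in fields:
        values = [r[f] for r in clean]
        sigma = pstdev(values)
        stats[f] = (mean(values), sigma if sigma > 0 else 1.0)

    return [[(r[f] - stats[f][0]) / stats[f][1] for f in fields] for r in clean]
```

For example, `preprocess(records, ["wind_speed", "power"])` would return one standardized two-element vector per usable record.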
The isolated forest algorithm analysis comprises training the isolated trees, integrating the results of all isolated trees, and isolating the abnormal data and reporting it to the control system.
The training of an isolated tree specifically comprises:
S11, randomly selecting Ψ data points from the data set as a subsample and placing them in the root node of an isolated tree;
S12, randomly selecting a dimension and randomly generating a cutting point p within the data range of the current node, that is, between the minimum and maximum values of the selected dimension among the data in the current node;
S13, using the cutting point p to generate a hyperplane that divides the data space of the current node into two subspaces: points whose value in the selected dimension is smaller than p are placed in the left branch of the current node, and points whose value is greater than or equal to p are placed in the right branch;
S14, recursively applying steps S12 and S13 to the left and right branch nodes, continually constructing new leaf nodes, until a leaf node holds only one data point and cannot be cut further, or the isolated tree has reached the set height.
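Steps S12 to S14 can be sketched as a short recursive function. This is a minimal sketch, not the patented implementation: the dictionary-based node layout, the function names, and the default height limit of 8 are assumptions made for illustration. Drawing the Ψ-point subsample (S11) is left to the caller.

```python
import random


def build_tree(points, height=0, max_height=8, rng=random):
    """Recursively build one isolated tree from a list of equal-length numeric tuples."""
    # S14: stop when the node holds at most one point or the set height is reached.
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}

    # S12: choose a random dimension and a cut point between its min and max here.
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # All values are equal in this dimension; this sketch simply stops.
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)

    # S13: values below the cut go left, values at or above it go right.
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, height + 1, max_height, rng),
        "right": build_tree(right, height + 1, max_height, rng),
    }


def path_length(tree, x):
    """Number of cuts needed to route x from the root to a leaf."""
    depth = 0
    node = tree
    while "cut" in node:
        node = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
        depth += 1
    return depth
```

A point that is isolated after only a few cuts has a short path, which is what the later scoring step relies on. This sketch counts cuts only and does not add any correction for leaves that still hold several points.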
The integrating of all isolated tree results specifically comprises:
S21, because the cutting process is completely random, an ensemble method is needed for the result to converge: the cutting is repeated from the beginning many times, and the results of the individual runs are averaged;
S22, once t isolated trees have been obtained, training is finished, and the generated isolated trees can be used to evaluate test data, that is, to calculate the anomaly score s. For each sample x, the results of all trees are combined, and the anomaly score is calculated by the following formula:
$$s(x, \Psi) = 2^{-\frac{E(h(x))}{c(\Psi)}}$$
where h(x) is the height (path length) of x in each tree, E(h(x)) is the average of h(x) over the t isolated trees, and c(Ψ) is the average path length for a given number of samples Ψ, which is used to normalize the path length h(x) of sample x;
S23, judging abnormal data:
if the anomaly score is close to 1, the sample is judged to be abnormal data;
if the anomaly score is much less than 0.5, the sample is judged not to be abnormal data;
if the anomaly scores of all samples are around 0.5, the data set most likely contains no abnormal data.
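The scoring in S22 and S23 can be illustrated with a small calculation. This sketch assumes the standard isolation forest score s = 2^(−E(h(x)) / c(Ψ)). The closed form used for c(Ψ) is taken from the general isolation forest literature rather than from this text, so it should be read as an assumption, as should the function names.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c(psi):
    """Average path length for psi samples (assumed closed form).

    c(psi) = 2 * H(psi - 1) - 2 * (psi - 1) / psi, with H(i) ~ ln(i) + gamma.
    """
    if psi > 2:
        return 2.0 * (math.log(psi - 1) + EULER_GAMMA) - 2.0 * (psi - 1) / psi
    return 1.0 if psi == 2 else 0.0


def anomaly_score(path_lengths, psi):
    """Combine the path lengths of one sample over all trees into a score in (0, 1].

    path_lengths -- h(x) measured in each of the t trees
    psi          -- subsample size used to train each tree (must be >= 2)
    """
    mean_h = sum(path_lengths) / len(path_lengths)  # E(h(x))
    return 2.0 ** (-mean_h / c(psi))
```

With Ψ = 256, a sample whose average path length equals c(Ψ) scores 0.5, a sample isolated at the root scores 1, and longer average paths push the score toward 0, which matches the three cases in S23.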
The expert knowledge discrimination specifically comprises:
S31, collecting historical abnormal operation data, computing the features to be extracted according to the feature calculation formula, standardizing these features, and then combining the historical abnormal operation data, the standardized features, and the corresponding abnormal-data judgment results to construct an expert knowledge base;
S32, searching the expert knowledge base for the standardized features of the abnormal data identified by the isolated forest algorithm, and collecting all search results;
S33, comparing the abnormal data identified by the isolated forest algorithm with the search results of step S32 one by one, and finding the identical or similar search results;
S34, sending the search results found as the judgment result to the operator on duty.
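The lookup in S32 and S33 can be sketched as a nearest-match search over the knowledge base. The text does not say how "identical or similar" is measured, so this sketch assumes Euclidean distance between standardized feature vectors with a hypothetical threshold `max_distance`. The knowledge base layout (a list of feature-vector and judgment pairs) is also an assumption.

```python
import math


def lookup(knowledge_base, features, max_distance=1.0):
    """Return the judgments of knowledge-base entries close to `features`.

    knowledge_base -- list of (feature_vector, judgment) pairs built in S31
    features       -- standardized features of one reported abnormal record
    max_distance   -- assumed similarity threshold (Euclidean distance)
    Results are ordered from the closest match to the farthest.
    """
    matches = []
    for entry_features, judgment in knowledge_base:
        distance = math.dist(entry_features, features)
        if distance <= max_distance:
            matches.append((distance, judgment))
    matches.sort(key=lambda m: m[0])
    return [judgment for _, judgment in matches]
```

An empty result means that no historical case is similar enough. How such a record should be handled is not covered by the text.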
The beneficial effects of the invention are as follows: the operation data of the wind generating set are collected automatically through SCADA and preprocessed into a standard data set; the data set is then analyzed with the isolated forest algorithm, and the abnormal data are isolated and reported to the control system; finally, a discrimination system that incorporates expert experience analyzes the reported abnormal data to judge whether the operating state is abnormal, and the judgment result is sent to the operator on duty. Treatment is therefore automatic and timely, and analysis with the isolated forest algorithm offers high processing speed and high accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
Fig. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in Fig. 1, a power data diagnosis treatment method based on an isolated forest algorithm includes the following steps:
Data collection: collecting the operation data of the wind generating set by using SCADA.
Data preprocessing: preprocessing the collected data to form a standard data set, which specifically comprises: cleaning the key feature fields; manually filling in null values and removing null values that cannot be filled in; manually correcting abnormal values and removing abnormal values that cannot be corrected; and computing the features to be extracted according to a feature calculation formula and standardizing these features to form a standard data set. The data sets are divided by time period. For example, the operation data of the wind generating set collected between 0:00 and 1:00 on a given day are preprocessed to form one standard data set; the length and end points of the time period can be set manually.
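The division into time periods can be sketched as bucketing timestamped records into fixed-length windows. The record layout (timestamp and value pairs), the function name, and the one-hour default are assumptions for illustration. The text only states that the length and end points of the period are configurable.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_window(records, start, window=timedelta(hours=1)):
    """Group (timestamp, values) records into consecutive windows beginning at `start`.

    Returns a dict that maps each window's start time to the list of values
    recorded in [window start, window start + window).
    """
    buckets = defaultdict(list)
    for timestamp, values in records:
        if timestamp < start:
            continue  # records before the first window are ignored in this sketch
        index = (timestamp - start) // window  # whole windows elapsed since start
        buckets[start + index * window].append(values)
    return dict(buckets)
```

Each bucket would then be preprocessed and analyzed as its own standard data set.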
Isolated forest algorithm analysis: analyzing the data set by using the isolated forest algorithm, isolating abnormal data, and reporting the abnormal data to the control system. The isolated forest algorithm offers high processing speed and high accuracy. The analysis comprises training the isolated trees, integrating the results of all isolated trees, and isolating the abnormal data and reporting it to the control system.
The training of an isolated tree specifically comprises the following steps:
S11, randomly selecting Ψ data points from the data set as a subsample and placing them in the root node of an isolated tree;
S12, randomly selecting a dimension and randomly generating a cutting point p within the data range of the current node, that is, between the minimum and maximum values of the selected dimension among the data in the current node;
S13, using the cutting point p to generate a hyperplane that divides the data space of the current node into two subspaces: points whose value in the selected dimension is smaller than p are placed in the left branch of the current node, and points whose value is greater than or equal to p are placed in the right branch;
S14, recursively applying steps S12 and S13 to the left and right branch nodes, continually constructing new leaf nodes, until a leaf node holds only one data point and cannot be cut further, or the isolated tree has reached the set height.
The integrating of all isolated tree results specifically comprises the following steps:
S21, because the cutting process is completely random, an ensemble method is needed for the result to converge: the cutting is repeated from the beginning many times, and the results of the individual runs are averaged;
S22, once t isolated trees have been obtained, training is finished, and the generated isolated trees can be used to evaluate test data, that is, to calculate the anomaly score s. For each sample x, the results of all trees are combined, and the anomaly score is calculated by the following formula:
$$s(x, \Psi) = 2^{-\frac{E(h(x))}{c(\Psi)}}$$
where h(x) is the height (path length) of x in each tree, E(h(x)) is the average of h(x) over the t isolated trees, and c(Ψ) is the average path length for a given number of samples Ψ, which is used to normalize the path length h(x) of sample x;
S23, judging abnormal data: if the anomaly score is close to 1, the sample is judged to be abnormal data; if the anomaly score is much less than 0.5, the sample is judged not to be abnormal data; if the anomaly scores of all samples are around 0.5, the data set most likely contains no abnormal data.
Expert knowledge discrimination: analyzing the reported abnormal data by using a discrimination system that incorporates expert experience, so as to judge whether the operating state is abnormal, and sending the judgment result to the operator on duty, thereby achieving automatic and timely treatment. The expert knowledge discrimination specifically comprises the following steps:
S31, collecting historical abnormal operation data, computing the features to be extracted according to the feature calculation formula, standardizing these features, and then combining the historical abnormal operation data, the standardized features, and the corresponding abnormal-data judgment results to construct an expert knowledge base;
S32, searching the expert knowledge base for the standardized features of the abnormal data identified by the isolated forest algorithm, and collecting all search results;
S33, comparing the abnormal data identified by the isolated forest algorithm with the search results of step S32 one by one, and finding the identical or similar search results;
S34, sending the search results found as the judgment result to the operator on duty.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed.

Claims (6)

1. A power data diagnosis treatment method based on an isolated forest algorithm, characterized by comprising the following steps:
data collection: collecting the operation data of the wind generating set by using a SCADA (supervisory control and data acquisition) system;
data preprocessing: preprocessing the collected data to form a standard data set;
isolated forest algorithm analysis: analyzing the data set by using the isolated forest algorithm, isolating abnormal data, and reporting the abnormal data to a control system;
expert knowledge discrimination: analyzing the reported abnormal data by using a discrimination system that incorporates expert experience, so as to judge whether the operating state is abnormal, and sending the judgment result to the operator on duty.
2. The treatment method according to claim 1, wherein the data preprocessing comprises: cleaning the key feature fields; manually filling in null values and removing null values that cannot be filled in; manually correcting abnormal values and removing abnormal values that cannot be corrected; and computing the features to be extracted according to a feature calculation formula and standardizing these features to form a standard data set.
3. The treatment method according to claim 1, wherein the isolated forest algorithm analysis comprises training the isolated trees, integrating the results of all isolated trees, and isolating the abnormal data and reporting it to the control system.
4. The treatment method according to claim 3, wherein the training of an isolated tree specifically comprises:
S11, randomly selecting Ψ data points from the data set as a subsample and placing them in the root node of an isolated tree;
S12, randomly selecting a dimension and randomly generating a cutting point p within the data range of the current node, that is, between the minimum and maximum values of the selected dimension among the data in the current node;
S13, using the cutting point p to generate a hyperplane that divides the data space of the current node into two subspaces: points whose value in the selected dimension is smaller than p are placed in the left branch of the current node, and points whose value is greater than or equal to p are placed in the right branch;
S14, recursively applying steps S12 and S13 to the left and right branch nodes, continually constructing new leaf nodes, until a leaf node holds only one data point and cannot be cut further, or the isolated tree has reached the set height.
5. The treatment method according to claim 4, wherein the integrating of all isolated tree results specifically comprises:
S21, because the cutting process is completely random, an ensemble method is needed for the result to converge: the cutting is repeated from the beginning many times, and the results of the individual runs are averaged;
S22, once t isolated trees have been obtained, training is finished, and the generated isolated trees can be used to evaluate test data, that is, to calculate the anomaly score s. For each sample x, the results of all trees are combined, and the anomaly score is calculated by the following formula:
$$s(x, \Psi) = 2^{-\frac{E(h(x))}{c(\Psi)}}$$
where h(x) is the height (path length) of x in each tree, E(h(x)) is the average of h(x) over the t isolated trees, and c(Ψ) is the average path length for a given number of samples Ψ, which is used to normalize the path length h(x) of sample x;
S23, judging abnormal data:
if the anomaly score is close to 1, the sample is judged to be abnormal data;
if the anomaly score is much less than 0.5, the sample is judged not to be abnormal data;
if the anomaly scores of all samples are around 0.5, the data set most likely contains no abnormal data.
6. The treatment method according to claim 1, wherein the expert knowledge discrimination specifically comprises:
S31, collecting historical abnormal operation data, computing the features to be extracted according to the feature calculation formula, standardizing these features, and then combining the historical abnormal operation data, the standardized features, and the corresponding abnormal-data judgment results to construct an expert knowledge base;
S32, searching the expert knowledge base for the standardized features of the abnormal data identified by the isolated forest algorithm, and collecting all search results;
S33, comparing the abnormal data identified by the isolated forest algorithm with the search results of step S32 one by one, and finding the identical or similar search results;
S34, sending the search results found as the judgment result to the operator on duty.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110506063.4A CN113284004A (en) 2021-05-10 2021-05-10 Power data diagnosis treatment method based on isolated forest algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110506063.4A CN113284004A (en) 2021-05-10 2021-05-10 Power data diagnosis treatment method based on isolated forest algorithm

Publications (1)

Publication Number Publication Date
CN113284004A true CN113284004A (en) 2021-08-20

Family

ID=77278402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110506063.4A Pending CN113284004A (en) 2021-05-10 2021-05-10 Power data diagnosis treatment method based on isolated forest algorithm

Country Status (1)

Country Link
CN (1) CN113284004A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964216A (en) * 2023-01-30 2023-04-14 北京慧图科技(集团)股份有限公司 Internet of things equipment data anomaly detection method based on isolated forest
CN117294017A (en) * 2023-10-07 2023-12-26 南方电网调峰调频(广东)储能科技有限公司 Multi-parameter comprehensive analysis energy storage power station state monitoring method and system
CN117407822A (en) * 2023-12-12 2024-01-16 江苏新希望生态科技有限公司 Full-automatic bud seedling machine and control method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108732494A (en) * 2017-04-21 2018-11-02 上海电气集团股份有限公司 A kind of wind-driven generator abnormity diagnosis processing system
CN110430260A (en) * 2019-08-02 2019-11-08 哈工大机器人(合肥)国际创新研究院 Robot cloud platform based on big data cloud computing support and working method
CN111798312A (en) * 2019-08-02 2020-10-20 深圳索信达数据技术有限公司 Financial transaction system abnormity identification method based on isolated forest algorithm
CN112362292A (en) * 2020-10-30 2021-02-12 北京交通大学 Method for anomaly detection of wind tunnel test data

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964216A (en) * 2023-01-30 2023-04-14 北京慧图科技(集团)股份有限公司 Internet of things equipment data anomaly detection method based on isolated forest
CN117294017A (en) * 2023-10-07 2023-12-26 南方电网调峰调频(广东)储能科技有限公司 Multi-parameter comprehensive analysis energy storage power station state monitoring method and system
CN117407822A (en) * 2023-12-12 2024-01-16 江苏新希望生态科技有限公司 Full-automatic bud seedling machine and control method thereof
CN117407822B (en) * 2023-12-12 2024-02-20 江苏新希望生态科技有限公司 Full-automatic bud seedling machine and control method thereof

Similar Documents

Publication Publication Date Title
CN113284004A (en) Power data diagnosis treatment method based on isolated forest algorithm
CN107179503B (en) Wind turbine generator fault intelligent diagnosis and early warning method based on random forest
CN106888205B (en) Non-invasive PLC anomaly detection method based on power consumption analysis
CN108011367B (en) Power load characteristic mining method based on depth decision tree algorithm
CN109753499A (en) A kind of O&M monitoring data administering method
CN109740648A (en) Electric load disorder data recognition method, apparatus and computer equipment
CN112363890A (en) Data center operation and maintenance system threshold value self-adaptive alarm monitoring method based on Prophet model
CN111080982A (en) Dam safety intelligent monitoring and early warning system and method based on multiple sensors
CN111767951A (en) Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis
CN110580492A (en) Track circuit fault precursor discovery method based on small fluctuation detection
CN113554055A (en) Processing condition identification method based on clustering algorithm
CN113570200A (en) Power grid operation state monitoring method and system based on multidimensional information
CN111666978B (en) Intelligent fault early warning system for IT system operation and maintenance big data
CN114371353A (en) Power equipment abnormity monitoring method and system based on voiceprint recognition
CN111157850A (en) Mean value clustering-based power grid line fault identification method
CN116359285A (en) Oil gas concentration intelligent detection system and method based on big data
CN112200000A (en) Welding stability recognition model training method and welding stability recognition method
CN113721000B (en) Method and system for detecting abnormity of dissolved gas in transformer oil
CN116502043A (en) Finish rolling motor state analysis method based on isolated forest algorithm
CN110826735B (en) Electric SCADA intelligent multidimensional query and overhaul method
CN115186935B (en) Electromechanical device nonlinear fault prediction method and system
CN117411703A (en) Modbus protocol-oriented industrial control network abnormal flow detection method
CN110543675A (en) Power transmission line fault identification method
CN116030955A (en) Medical equipment state monitoring method and related device based on Internet of things
CN115766227A (en) Flow abnormity detection method based on single support vector machine OCSVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820