CN108776683B - Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network - Google Patents
Info
- Publication number
- CN108776683B CN201810559071.3A
- Authority
- CN
- China
- Prior art keywords
- data
- neural network
- isolated forest
- value
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention provides a method for cleaning electric power communication operation and maintenance data based on an isolated forest algorithm and a neural network, comprising the following steps: first, an isolated forest model iForest for the target problem is constructed using an improved isolated forest algorithm; next, an evaluation system for scoring abnormal data with the isolated forest algorithm is defined; finally, a BP neural network is trained to predict and correct the abnormal data attributes detected by the isolated forest. The method improves anomaly-detection accuracy and reduces data-correction error, and effectively optimizes the electric power operation and maintenance data cleaning procedure in terms of abnormal-data localization accuracy, data-correction accuracy, training time, and resource occupation.
Description
Technical Field
The invention relates to methods for cleaning electric power communication operation and maintenance data, and in particular to a cleaning method based on an isolated forest algorithm and a neural network.
Background
With the rapid development of power communication networks, the volume of power operation and maintenance data keeps growing, and power departments place ever higher demands on data reliability. During transmission and storage, external interference and transmission errors inevitably introduce bad data such as noise, missing values, and erroneous values. Moreover, power data contain multidimensional attributes collected by different devices, which complicates anomaly detection. Traditional correction methods such as mean substitution and regression analysis cannot accurately learn the characteristics and regularities of the whole data set, and their correction error grows large when the data dimensionality is high. Current data cleaning mainly relies on mechanisms such as consistency checking and the handling of erroneous, missing, and invalid values, and artificial neural network algorithms can be adopted to improve data quality. Patent 201610370415.7 discloses a cleaning method for RFID data that filters mis-coded data with a hardware EPC (Electronic Product Code) filter, thereby removing duplicate data. However, that method does not correct missing or invalid values, and its limited hardware processing capacity makes it unsuitable for large-scale power operation and maintenance data with complex attributes. Patent 201510129479.3 performs data cleaning based on an ETL mechanism in a data warehouse, offering a large cleaning scope and high execution efficiency.
However, power operation and maintenance data contain multidimensional attributes, with large volume and complex structure, so that scheme still falls short in cleaning precision and data quality. Selecting an efficient cleaning method provides important support for the analysis and mining of power operation and maintenance data, and is of great significance for improving the overall benefits of power operation and maintenance.
Disclosure of Invention
To overcome at least one defect of the prior art, the invention provides an electric power operation and maintenance data cleaning method based on an isolated forest algorithm and a neural network. It improves the branching step of the isolated forest algorithm, raising the efficiency and accuracy of the isolated forest model, and lets the learning rate adapt to the network's trend of change, improving the performance of the BP neural network. The method is effectively optimized in terms of abnormal-data localization accuracy, data-correction accuracy, training time, and resource occupation.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a power operation and maintenance data cleaning method based on an isolated forest algorithm and a neural network is characterized by comprising the following steps:
s1, constructing an isolated forest model iForest for solving a target problem by using an improved isolated forest algorithm;
s2, defining an evaluation system of the isolated forest algorithm to the abnormal data;
and S3, training a learning rate self-adaptive BP neural network to predict and correct the abnormal data attribute detected by the isolated forest.
Preferably, the step S1 specifically includes the following steps:
s11, at the beginning of the method, firstly, grouping the attributes;
s12, randomly selecting psi sample data points from the training data set as a sub-sampling set, constructing an initial iTree, and putting the sub-sampling set into a root node of the tree; psi is the number of randomly selected sample data points;
s13, randomly appointing an attribute group of the data item, and selecting a division cutting point in the current node data;
s14, generating a hyperplane by the cutting point, dividing the data space of the current node into two subspaces, and dividing the data items;
s15, recursively constructing new child nodes until a child node contains only one data item (so it cannot be cut further) or the iTree reaches the initially defined height limit.
Preferably, the step S2 specifically includes:
s21, selecting test data x, and substituting the test data x into each iTree in the forest; x represents test data;
s22, calculating the depth h (x) of each tree and calculating the average value E (h (x)) of all h (x); where h (x) represents the depth at which the test data point falls on each tree; e (h (x)) represents the average of all h (x);
s23, setting the standard average search length c (ψ) according to equation (1):
c(ψ) = 2H(ψ-1) - 2(ψ-1)/ψ   formula (1)
Wherein H (i) is calculated according to equation (2):
H(i) = ln(i) + Ec   formula (2)
where Ec is the Euler constant, approximately 0.5772, and c(ψ) denotes the standard average search length of an iTree;
s24, defining the anomaly score S(x, ψ) of the data under test according to formula (3):
S(x, ψ) = 2^(-E(h(x))/c(ψ))   formula (3)
S(x, ψ) denotes the anomaly score of the data under test; the closer the score is to 1, the more likely the data point is anomalous.
Preferably, the step S3 specifically includes:
s31, randomly selecting a small batch of data samples in the data set, namely a combination of the input vector and the output expected value, and substituting the combination into the neural network;
s32, carrying out forward propagation process layer by layer, and calculating the activation value of each layer of the neural network according to the formula (4) and the formula (5):
where W denotes the weight parameters of the BP neural network, and W^(l)_{ji} the weight between the j-th unit of layer l and the i-th unit of layer l+1; b denotes the threshold parameters, and b^(l+1)_i the bias of the i-th unit of layer l+1; f denotes the activation function, here an ELU (Exponential Linear Unit), which is simple to compute and prevents the vanishing-gradient problem in the subsequent error-gradient calculation; μ is the amplitude parameter of the ELU, adjustable in practice and generally taken in (0,1); a^(l)_i denotes the activation value of the i-th unit of layer l. The activations are computed layer by layer in this way until the network output h_{W,b}(x) is obtained;
S33, calculating the error between the expected value and the actual output according to the formula (6):
where h_{W,b}(x) denotes the output value obtained by the neural network through forward propagation, y the expected value, W and b the weight matrix and threshold matrix respectively, and J the error;
s34, calculating the whole cost function according to the formula (7), ending if the function converges to the global minimum value, otherwise, turning to S35;
wherein, L represents the overall cost function of the neural network, and m represents the number of samples;
s35, performing a back propagation process, wherein the back propagation process is to adjust parameters of each layer of the neural network through a gradient descent algorithm, continuously reduce the cost function, firstly calculate the error of each neuron, and calculate the error gradient according to the formula (8):
where ∂L/∂W^(l)_{ji} denotes the error gradient of the cost function with respect to the weight parameter W^(l)_{ji}; it is derived backward from the output layer, layer by layer, via the chain rule, with the derivation relation given by formula (4) and not repeated here; the error gradients of the threshold parameters are computed in the same way;
s36, judging the trend of the gradient change and adaptively adjusting the learning rate of the neural network: if two consecutive gradient adjustments point in the same direction, the learning rate is increased according to formula (9); if they point in opposite directions, the gradient is fluctuating strongly, and the learning rate is decreased according to formula (10):
where α_{k+1} denotes the learning rate of the neural network at time k+1, used to control the rate of gradient change during back-propagation, α_k the learning rate at time k, and ∇_k and ∇_{k-1} the gradient values calculated at times k and k-1 respectively; in addition, a momentum factor η with value in (0,1) is introduced as a damping term on the gradient change, reducing the oscillation caused by a large gradient difference between two adjacent moments and making the adaptive change of the learning rate safer and more stable;
s37, updating the weight parameters and threshold parameters according to the gradient-descent rules of formulas (11) and (12), where α denotes the current learning rate; then returning to S31.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the anomaly-detection stage, the algorithm takes the correlation among the attributes of the power metadata into account: it first improves the way each isolation tree (iTree) is constructed in the isolated forest model, making the trees more sensitive to attribute correlation, and improves the branching step of the algorithm, raising the efficiency and accuracy of the isolated forest model.
(2) In the prediction-and-correction stage, the algorithm automatically adjusts the learning rate according to the trend of the gradient change, continuously steering it toward the most suitable value. This keeps the gradient change stable, greatly accelerates convergence, reduces network overhead, resolves the slow late-stage convergence of traditional BP training, and yields a smoother convergence curve.
The method constructs an isolated forest to extract the characteristics of a training data set, detects abnormal data in the data set, and then uses an improved BP neural network model to predict and modify the abnormal data. The electric power operation and maintenance data cleaning program based on the improved scheme is effectively optimized in the aspects of abnormal data positioning accuracy, data correction accuracy, training time and the like.
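The two-stage pipeline described above can be sketched in Python as follows, assuming a scoring function from the isolated forest stage and a prediction function from the trained network are available; both callables and the 0.6 threshold are illustrative placeholders, not values fixed by the patent:

```python
def clean_dataset(rows, score_fn, predict_fn, threshold=0.6):
    # End-to-end cleaning sketch: score each row with the isolation-forest
    # stage (score_fn) and replace rows flagged as anomalous with the trained
    # network's prediction (predict_fn). Both callables and the threshold are
    # hypothetical stand-ins for the earlier stages.
    cleaned = []
    for row in rows:
        if score_fn(row) >= threshold:
            cleaned.append(predict_fn(row))  # correct the anomalous attribute(s)
        else:
            cleaned.append(row)              # keep normal data unchanged
    return cleaned
```

The detection and correction stages stay decoupled: any scorer whose output approaches 1 for anomalies, and any regressor over the remaining attributes, can be plugged in.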
Drawings
FIG. 1 is a flow chart of a method for cleaning power operation and maintenance data based on an isolated forest algorithm and a neural network.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
As shown in fig. 1, a method for cleaning power operation and maintenance data based on an isolated forest algorithm and a neural network is characterized by comprising the following steps:
s1, constructing an isolated forest model iForest for solving a target problem by using an improved isolated forest algorithm;
s2, defining an evaluation system of the isolated forest algorithm to the abnormal data;
and S3, training a learning rate self-adaptive BP neural network to predict and correct the abnormal data attribute detected by the isolated forest.
The step S1 specifically includes the following steps:
s11, at the beginning of the method, firstly, grouping the attributes;
s12, randomly selecting psi sample data points from the training data set as a sub-sampling set, constructing an initial iTree, and putting the sub-sampling set into a root node of the tree; psi is the number of randomly selected sample data points;
s13, randomly appointing an attribute group of the data item, and selecting a division cutting point in the current node data;
s14, generating a hyperplane by the cutting point, dividing the data space of the current node into two subspaces, and dividing the data items;
s15, recursively constructing new child nodes until a child node contains only one data item (so it cannot be cut further) or the iTree reaches the initially defined height limit.
The data are then substituted into the iTree branching method designed above and training proceeds until all iTrees in the iForest are constructed.
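The construction steps S11 through S15 can be sketched in Python as follows; the attribute-grouping strategy of S11 and the height cap of ⌈log2(ψ)⌉ are assumptions (the patent does not spell them out), so treat this as an illustrative reading rather than the claimed method:

```python
import math
import random

def build_itree(data, attr_groups, height, height_limit):
    # Leaf: only one data item left (cannot cut further) or height cap reached (S15).
    if len(data) <= 1 or height >= height_limit:
        return {"size": len(data)}
    # S13: randomly appoint an attribute group, then a split attribute and a
    # cut point inside the current node's value range.
    attr = random.choice(random.choice(attr_groups))
    values = [row[attr] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"size": len(data)}
    cut = random.uniform(lo, hi)
    # S14: the cut point defines a hyperplane splitting the node's data space
    # into two subspaces.
    left = [row for row in data if row[attr] < cut]
    right = [row for row in data if row[attr] >= cut]
    return {"attr": attr, "cut": cut,
            "left": build_itree(left, attr_groups, height + 1, height_limit),
            "right": build_itree(right, attr_groups, height + 1, height_limit)}

def build_iforest(dataset, attr_groups, n_trees=100, psi=256):
    # S12: each iTree is grown from a random sub-sample of psi points.
    height_limit = math.ceil(math.log2(max(psi, 2)))  # assumed cap, as in classic iForest
    return [build_itree(random.sample(dataset, min(psi, len(dataset))),
                        attr_groups, 0, height_limit)
            for _ in range(n_trees)]
```

Passing each attribute in its own group reduces this to the classic isolation forest; the improvement in S11 amounts to choosing groups of correlated attributes instead.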
The step S2 specifically includes:
s21, selecting test data x, and substituting the test data x into each iTree in the forest; x represents test data;
s22, calculating the depth h (x) of each tree and calculating the average value E (h (x)) of all h (x); where h (x) represents the depth at which the test data point falls on each tree; e (h (x)) represents the average of all h (x);
s23, setting the standard average search length c (ψ) according to equation (1):
c(ψ) = 2H(ψ-1) - 2(ψ-1)/ψ   formula (1)
Wherein H (i) is calculated according to equation (2):
H(i) = ln(i) + Ec   formula (2)
where Ec is the Euler constant, approximately 0.5772, and c(ψ) denotes the standard average search length of an iTree;
s24, defining the anomaly score S(x, ψ) of the data under test according to formula (3):
S(x, ψ) = 2^(-E(h(x))/c(ψ))   formula (3)
S(x, ψ) denotes the anomaly score of the data under test; the closer the score is to 1, the more likely the data point is anomalous.
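Formulas (1) through (3) can be checked numerically with a short Python sketch; the closed form of the score, s(x, ψ) = 2^(-E(h(x))/c(ψ)), follows the standard isolation-forest definition, which is consistent with the symbols defined above but was not reproduced verbatim in the published text:

```python
import math

EULER_CONSTANT = 0.5772  # Ec in formula (2)

def harmonic(i):
    # Formula (2): H(i) = ln(i) + Ec
    return math.log(i) + EULER_CONSTANT

def c(psi):
    # Formula (1): standard average search length of an iTree grown from psi points.
    return 2.0 * harmonic(psi - 1) - 2.0 * (psi - 1) / psi

def anomaly_score(depths, psi):
    # depths: h(x) of the test point in each iTree; E(h(x)) is their mean.
    # Standard isolation-forest score: a shallow average depth gives a score near 1.
    e_h = sum(depths) / len(depths)
    return 2.0 ** (-e_h / c(psi))
```

For ψ = 256, a point isolated after an average of 1 split scores above 0.9, while an average depth of 12 scores below 0.5, matching the rule that scores close to 1 flag likely anomalies.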
In step S3, the network is trained on the training set until its overall cost function converges, taking the attributes (T, AP, RH, V) as the input vector and the electric energy output EP as the output value; the step specifically comprises:
s31, randomly selecting a small batch of data samples in the data set, namely a combination of the input vector and the output expected value, and substituting the combination into the neural network;
s32, carrying out forward propagation process layer by layer, and calculating the activation value of each layer of the neural network according to the formula (4) and the formula (5):
where W denotes the weight parameters of the BP neural network, and W^(l)_{ji} the weight between the j-th unit of layer l and the i-th unit of layer l+1; b denotes the threshold parameters, and b^(l+1)_i the bias of the i-th unit of layer l+1; f denotes the activation function, here an ELU (Exponential Linear Unit), which is simple to compute and prevents the vanishing-gradient problem in the subsequent error-gradient calculation; μ is the amplitude parameter of the ELU, adjustable in practice and generally taken in (0,1); a^(l)_i denotes the activation value of the i-th unit of layer l. The activations are computed layer by layer in this way until the network output h_{W,b}(x) is obtained;
S33, calculating the error between the expected value and the actual output according to the formula (6):
where h_{W,b}(x) denotes the output value obtained by the neural network through forward propagation, y the expected value, W and b the weight matrix and threshold matrix respectively, and J the error;
s34, calculating the whole cost function according to the formula (7), ending if the function converges to the global minimum value, otherwise, turning to S35;
wherein, L represents the overall cost function of the neural network, and m represents the number of samples;
s35, performing a back propagation process, wherein the back propagation process is to adjust parameters of each layer of the neural network through a gradient descent algorithm, continuously reduce the cost function, firstly calculate the error of each neuron, and calculate the error gradient according to the formula (8):
where ∂L/∂W^(l)_{ji} denotes the error gradient of the cost function with respect to the weight parameter W^(l)_{ji}; it is derived backward from the output layer, layer by layer, via the chain rule, with the derivation relation given by formula (4) and not repeated here; the error gradients of the threshold parameters are computed in the same way;
s36, judging the trend of the gradient change and adaptively adjusting the learning rate of the neural network: if two consecutive gradient adjustments point in the same direction, the learning rate is increased according to formula (9); if they point in opposite directions, the gradient is fluctuating strongly, and the learning rate is decreased according to formula (10):
where α_{k+1} denotes the learning rate of the neural network at time k+1, used to control the rate of gradient change during back-propagation, α_k the learning rate at time k, and ∇_k and ∇_{k-1} the gradient values calculated at times k and k-1 respectively; in addition, a momentum factor η with value in (0,1) is introduced as a damping term on the gradient change, reducing the oscillation caused by a large gradient difference between two adjacent moments and making the adaptive change of the learning rate safer and more stable;
s37, updating the weight parameters and threshold parameters according to the gradient-descent rules of formulas (11) and (12), where α denotes the current learning rate; then returning to S31.
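Formulas (4) through (12) appear only as images in the published text, so the following Python sketch reconstructs the training step under standard BP assumptions: an ELU forward pass, a squared-error cost, a sign-based adaptive learning rate, and a momentum-damped gradient-descent update. The growth/decay factors (1.05, 0.7) and the default values of μ and η are illustrative choices, not values from the patent:

```python
import numpy as np

def elu(z, mu=0.5):
    # ELU activation in the spirit of formula (5): identity for z > 0,
    # mu * (exp(z) - 1) otherwise; mu is the amplitude parameter in (0, 1).
    return np.where(z > 0, z, mu * (np.exp(z) - 1.0))

def forward(x, weights, biases, mu=0.5):
    # S32: layer-by-layer forward pass, a(l+1) = f(W(l) a(l) + b(l)) (formula (4)).
    a = x
    for W, b in zip(weights, biases):
        a = elu(W @ a + b, mu)
    return a  # network output h_{W,b}(x)

def cost(outputs, targets):
    # S33/S34: squared error per sample (formula (6)) averaged over the m
    # samples of the batch (formula (7)) -- the standard form, assumed here.
    m = len(outputs)
    return sum(0.5 * float(np.sum((h - y) ** 2))
               for h, y in zip(outputs, targets)) / m

def adapt_learning_rate(alpha_k, g_k, g_km1, up=1.05, down=0.7):
    # S36: one possible realization of formulas (9)/(10): grow the rate when
    # two consecutive gradients agree in direction, shrink it when they oppose.
    return alpha_k * (up if float(np.sum(g_k * g_km1)) > 0 else down)

def descent_step(w, grad, velocity, alpha, eta=0.7):
    # S37: update in the spirit of formulas (11)/(12); the momentum factor eta
    # in (0, 1) damps the gradient change between adjacent steps.
    velocity = eta * velocity + (1.0 - eta) * grad
    return w - alpha * velocity, velocity
```

A training loop would draw a mini-batch (S31), run `forward`, obtain gradients by back-propagating the chain rule (S35), then call `adapt_learning_rate` and `descent_step` for each weight and threshold parameter.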
It should be understood that the above-described embodiments are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the protection scope of its claims.
Claims (2)
1. A power communication operation and maintenance data cleaning method based on an isolated forest algorithm and a neural network is characterized by comprising the following steps:
s1, constructing an isolated forest model iForest for solving a target problem by using an improved isolated forest algorithm;
the specific steps of S1 include the following:
s11, at the beginning of the method, firstly, grouping the attributes;
s12, randomly selecting psi sample data points from the training data set as a sub-sampling set, constructing an initial iTree, and putting the sub-sampling set into a root node of the tree; psi is the number of randomly selected sample data points;
s13, randomly appointing an attribute group of the data item, and selecting a division cutting point in the current node data;
s14, generating a hyperplane by the cutting point, dividing the data space of the current node into two subspaces, and dividing the data items;
s15, recursively constructing a new child node until the child node only has one data item, namely the cutting cannot be continued; or the iTree has reached an initially defined height;
s2, defining an evaluation system of the isolated forest algorithm to the abnormal data;
s3, training a learning rate self-adaptive BP neural network to predict and correct the abnormal data attribute detected by the isolated forest;
the S3 specifically includes:
s31, randomly selecting a small batch of data samples in the data set, namely a combination of the input vector and the output expected value, and substituting the combination into the neural network;
s32, carrying out forward propagation process layer by layer, and calculating the activation value of each layer of the neural network according to the formula (4) and the formula (5):
where W denotes the weight parameters of the BP neural network, and W^(l)_{ji} the weight between the j-th unit of layer l and the i-th unit of layer l+1; b denotes the threshold parameters, and b^(l+1)_i the bias of the i-th unit of layer l+1; f denotes the activation function, whose amplitude parameter μ takes values in (0,1); a^(l)_i denotes the activation value of the i-th unit of layer l, computed layer by layer until the network output h_{W,b}(x) is obtained;
S33, calculating the error between the expected value and the actual output according to the formula (6):
where h_{W,b}(x) denotes the output value obtained by the neural network through forward propagation, y the expected value, W and b the weight matrix and threshold matrix respectively, and J the error;
s34, calculating the whole cost function according to the formula (7), ending if the function converges to the global minimum value, otherwise, turning to S35;
wherein, L represents the overall cost function of the neural network, and m represents the number of samples;
s35, performing a back propagation process, wherein the back propagation process is to adjust parameters of each layer of the neural network through a gradient descent algorithm, continuously reduce the cost function, firstly calculate the error of each neuron, and calculate the error gradient according to the formula (8):
s36, judging the trend of the gradient change and adaptively adjusting the learning rate of the neural network: if two consecutive gradient adjustments point in the same direction, the learning rate is increased according to formula (9); if they point in opposite directions, the gradient is fluctuating strongly, and the learning rate is decreased according to formula (10):
where α_{k+1} denotes the learning rate of the neural network at time k+1, used to control the rate of gradient change during back-propagation, α_k the learning rate at time k, and ∇_k and ∇_{k-1} the gradient values calculated at times k and k-1 respectively; in addition, a momentum factor η with value in (0,1) is introduced;
s37, updating the weight parameters and threshold parameters according to the gradient-descent rules of formula (11) and formula (12), where α denotes the current learning rate, and then returning to S31.
2. the method for cleaning operation and maintenance data of power communication based on an isolated forest algorithm and a neural network as claimed in claim 1, wherein the S2 specifically comprises:
s21, selecting test data x, and substituting the test data x into each iTree in the forest; x represents test data;
s22, calculating the depth h (x) of each tree and calculating the average value E (h (x)) of all h (x); where h (x) represents the depth at which the test data point falls on each tree; e (h (x)) represents the average of all h (x);
s23, setting the standard average search length c (ψ) according to equation (1):
c(ψ) = 2H(ψ-1) - 2(ψ-1)/ψ   formula (1)
Wherein H (i) is calculated according to equation (2):
H(i) = ln(i) + Ec   formula (2)
where Ec is the Euler constant, approximately 0.5772, and c(ψ) denotes the standard average search length of an iTree;
s24, defining the anomaly score S(x, ψ) of the data under test according to formula (3):
S(x, ψ) = 2^(-E(h(x))/c(ψ))   formula (3)
S(x, ψ) denotes the anomaly score of the data under test; the closer the score is to 1, the more likely the data point is anomalous.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810559071.3A CN108776683B (en) | 2018-06-01 | 2018-06-01 | Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108776683A CN108776683A (en) | 2018-11-09 |
CN108776683B true CN108776683B (en) | 2022-01-21 |
Family
ID=64026612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810559071.3A Active CN108776683B (en) | 2018-06-01 | 2018-06-01 | Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108776683B (en) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109506963B (en) * | 2018-11-29 | 2019-09-03 | 中南大学 | A kind of intelligence train traction failure big data abnormality detection discrimination method |
CN109684311A (en) * | 2018-12-06 | 2019-04-26 | 中科恒运股份有限公司 | Abnormal deviation data examination method and device |
CN109714343B (en) * | 2018-12-28 | 2022-02-22 | 北京天融信网络安全技术有限公司 | Method and device for judging network traffic abnormity |
CN109902721B (en) * | 2019-01-28 | 2024-07-02 | 平安科技(深圳)有限公司 | Abnormal point detection model verification method, device, computer equipment and storage medium |
CN109934489B (en) * | 2019-03-12 | 2021-03-02 | 广东电网有限责任公司 | Power equipment state evaluation method |
CN110135614A (en) * | 2019-03-26 | 2019-08-16 | 广东工业大学 | It is a kind of to be tripped prediction technique based on rejecting outliers and the 10kV distribution low-voltage of sampling techniques |
CN110334085A (en) * | 2019-05-30 | 2019-10-15 | 广州供电局有限公司 | Power distribution network data monitoring and modification method, device, computer and storage medium |
CN110209658B (en) * | 2019-06-04 | 2021-09-14 | 北京字节跳动网络技术有限公司 | Data cleaning method and device |
CN110288362A (en) * | 2019-07-03 | 2019-09-27 | 北京工业大学 | Brush single prediction technique, device and electronic equipment |
CN110399935A (en) * | 2019-08-02 | 2019-11-01 | 哈工大机器人(合肥)国际创新研究院 | The real-time method for monitoring abnormality of robot and system based on isolated forest machine learning |
CN110619182A (en) * | 2019-09-24 | 2019-12-27 | 长沙理工大学 | Power transmission line parameter identification and power transmission network modeling method based on WAMS big data |
CN110705873B (en) * | 2019-09-30 | 2022-06-03 | 国网福建省电力有限公司 | Power distribution network running state portrait analysis method |
CN110929751B (en) * | 2019-10-16 | 2022-11-22 | 福建和盛高科技产业有限公司 | Current transformer unbalance warning method based on multi-source data fusion |
CN110750527A (en) * | 2019-10-24 | 2020-02-04 | 南方电网科学研究院有限责任公司 | Data cleaning method for electric power big data |
CN111008662B (en) * | 2019-12-04 | 2023-01-10 | 贵州电网有限责任公司 | Online monitoring data anomaly analysis method for power transmission line |
CN111030855B (en) * | 2019-12-05 | 2022-05-17 | 国网山西省电力公司信息通信分公司 | Intelligent baseline determination and alarm method for ubiquitous power Internet of things system data |
CN111081016B (en) * | 2019-12-18 | 2021-07-06 | 北京航空航天大学 | Urban traffic abnormity identification method based on complex network theory |
CN113011552B (en) * | 2019-12-20 | 2023-07-18 | 中移(成都)信息通信科技有限公司 | Neural network training method, device, equipment and medium |
CN111160647B (en) * | 2019-12-30 | 2023-08-22 | 第四范式(北京)技术有限公司 | Money laundering behavior prediction method and device |
CN111145175B (en) * | 2020-01-10 | 2020-10-16 | 惠州光弘科技股份有限公司 | SMT welding spot defect detection method based on iForest model verification |
CN111340063B (en) * | 2020-02-10 | 2023-08-29 | 国能信控互联技术有限公司 | Data anomaly detection method for coal mill |
CN111505433B (en) * | 2020-04-10 | 2022-06-28 | 国网浙江余姚市供电有限公司 | Low-voltage transformer area indoor variable relation error correction and phase identification method |
CN111666276A (en) * | 2020-06-11 | 2020-09-15 | 上海积成能源科技有限公司 | Method for eliminating abnormal data by applying isolated forest algorithm in power load prediction |
CN111950853B (en) * | 2020-07-14 | 2024-05-31 | 东南大学 | Electric power running state white list generation method based on information physical bilateral data |
CN112016249A (en) * | 2020-08-31 | 2020-12-01 | 华北电力大学 | SCR denitration system bad data identification method based on optimized BP neural network |
CN112686838B (en) * | 2020-11-30 | 2024-03-29 | 江苏科技大学 | Rapid detection device and detection method for ship anchor chain flash welding system |
CN112884237A (en) * | 2021-03-11 | 2021-06-01 | 山东科技大学 | Power distribution network prediction auxiliary state estimation method and system |
CN113239999A (en) * | 2021-05-07 | 2021-08-10 | 北京沃东天骏信息技术有限公司 | Data anomaly detection method and device and electronic equipment |
CN113627541B (en) * | 2021-08-13 | 2023-07-21 | 北京邮电大学 | Optical path transmission quality prediction method based on sample migration screening |
CN114459574B (en) * | 2022-02-10 | 2023-09-26 | 电子科技大学 | Automatic evaluation method and device for high-speed fluid flow measurement accuracy and storage medium |
CN115021679B (en) * | 2022-08-09 | 2022-11-04 | 国网山西省电力公司大同供电公司 | Photovoltaic equipment fault detection method based on multi-dimensional outlier detection |
CN115760484B (en) * | 2022-12-07 | 2024-09-06 | 湖北华中电力科技开发有限责任公司 | Method, device and system for improving hidden danger identification capability of power distribution area and storage medium |
CN116501444B (en) * | 2023-04-28 | 2024-02-27 | 重庆大学 | Abnormal cloud edge collaborative monitoring and recovering system and method for virtual machine of intelligent network-connected automobile domain controller |
CN117994632A (en) * | 2024-02-21 | 2024-05-07 | 中国地质科学院矿产资源研究所 | Mineral resource remote area delineating method, equipment, medium and product |
CN117786587B (en) * | 2024-02-28 | 2024-06-04 | 国网河南省电力公司经济技术研究院 | Power grid data quality abnormality diagnosis method based on data analysis |
CN117874653B (en) * | 2024-03-11 | 2024-05-31 | 武汉佳华创新电气有限公司 | Power system safety monitoring method and system based on multi-source data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050160340A1 (en) * | 2004-01-02 | 2005-07-21 | Naoki Abe | Resource-light method and apparatus for outlier detection |
CN107196953A (en) * | 2017-06-14 | 2017-09-22 | 上海丁牛信息科技有限公司 | A kind of anomaly detection method based on user behavior analysis |
CN107945046A (en) * | 2016-10-12 | 2018-04-20 | 中国电力科学研究院 | A kind of new energy power station output data recovery method and device |
Non-Patent Citations (2)
Title |
---|
Analysis and Research on Abnormal Electricity Consumption Data Based on Data Mining; Zhang Rongchang; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-15; pp. 14-62 *
Cleaning of Bad Power Load Data Based on Neural Networks; Gu Min; Microcomputer Information; 2007-12-31; Vol. 23, No. 7-3; full text *
Also Published As
Publication number | Publication date |
---|---|
CN108776683A (en) | 2018-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108776683B (en) | Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network | |
US11409347B2 (en) | Method, system and storage medium for predicting power load probability density based on deep learning | |
Kim et al. | Length-adaptive transformer: Train once with length drop, use anytime with search | |
CN113052334A (en) | Method and system for realizing federated learning, terminal equipment and readable storage medium | |
CN103105246A (en) | Greenhouse environment forecasting feedback method of back propagation (BP) neural network based on improvement of genetic algorithm | |
CN107977710A (en) | Electricity consumption abnormal data detection method and device | |
CN111553469A (en) | Wireless sensor network data fusion method, device and storage medium | |
CN111461463A (en) | Short-term load prediction method, system and equipment based on TCN-BP | |
US20240095535A1 (en) | Executing a genetic algorithm on a low-power controller | |
CN116681104B (en) | Model building and realizing method of distributed space diagram neural network | |
CN112149883A (en) | Photovoltaic power prediction method based on FWA-BP neural network | |
CN109858798B (en) | Power grid investment decision modeling method and device for correlating transformation measures with voltage indexes | |
CN111625399A (en) | Method and system for recovering metering data | |
CN111832825A (en) | Wind power prediction method and system integrating long-term and short-term memory network and extreme learning machine | |
CN103957582A (en) | Wireless sensor network self-adaptation compression method | |
CN112651519A (en) | Secondary equipment fault positioning method and system based on deep learning theory | |
CN114781875B (en) | Micro-grid economic operation state evaluation method based on deep convolution network | |
CN113139570A (en) | Dam safety monitoring data completion method based on optimal hybrid valuation | |
CN108921287A (en) | A kind of optimization method and system of neural network model | |
CN116992779A (en) | Simulation method and system of photovoltaic energy storage system based on digital twin model | |
CN111932091A (en) | Survival analysis risk function prediction method based on gradient survival lifting tree | |
CN111400964B (en) | Fault occurrence time prediction method and device | |
CN116680969A (en) | Filler evaluation parameter prediction method and device for PSO-BP algorithm | |
CN107273971A (en) | Architecture of Feed-forward Neural Network self-organizing method based on neuron conspicuousness | |
CN116896093A (en) | Online analysis and optimization method for grid-connected oscillation stability of wind farm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||