CN108776683B - Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network - Google Patents

Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network

Info

Publication number
CN108776683B
CN108776683B (application CN201810559071.3A)
Authority
CN
China
Prior art keywords
data
neural network
isolated forest
value
formula
Prior art date
Legal status
Active
Application number
CN201810559071.3A
Other languages
Chinese (zh)
Other versions
CN108776683A (en)
Inventor
李星南
曾瑛
蔡毅
李伟坚
施展
亢中苗
Current Assignee
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd, Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd filed Critical Guangdong Power Grid Co Ltd
Priority to CN201810559071.3A priority Critical patent/CN108776683B/en
Publication of CN108776683A publication Critical patent/CN108776683A/en
Application granted granted Critical
Publication of CN108776683B publication Critical patent/CN108776683B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply


Abstract

The invention provides a method for cleaning power communication operation and maintenance data based on an isolated forest algorithm and a neural network, which comprises the following steps: first, an isolated forest model iForest for the target problem is constructed using an improved isolated forest algorithm; then an evaluation system by which the isolated forest algorithm scores abnormal data is defined; finally, a BP neural network is trained to predict and correct the abnormal data attributes detected by the isolated forest. The method improves anomaly detection accuracy and reduces data correction error, and effectively optimizes the power operation and maintenance data cleaning procedure in terms of abnormal data localization accuracy, data correction accuracy, training time and resource occupation.

Description

Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network
Technical Field
The invention provides a method for cleaning power communication operation and maintenance data, and particularly relates to a method for cleaning power communication operation and maintenance data based on an isolated forest algorithm and a neural network.
Background
With the rapid development of power communication networks, the volume of power operation and maintenance data keeps growing, and power departments place ever higher demands on data reliability. During transmission and storage, external interference and transmission errors inevitably introduce bad data such as noise, missing values and erroneous values; moreover, power data contain multidimensional attributes collected by different devices, which challenges anomaly detection. Traditional correction methods such as mean substitution and regression analysis cannot accurately learn the characteristics and rules of the whole data set, and their correction error is large, especially for high-dimensional data. Current data cleaning mainly relies on consistency checks and on mechanisms for handling erroneous, missing and invalid values, and artificial neural network algorithms can be adopted to improve data quality. Patent 201610370415.7 discloses a data cleaning method for RFID data that filters mis-coded data with a hardware EPC (Electronic Product Code) filter, thereby removing duplicate data; however, that method does not correct missing or invalid values, and its limited hardware processing capacity makes it unsuitable for large-scale power operation and maintenance data with complex attributes. Patent 201510129479.3 performs data cleaning based on the ETL mechanism of a data warehouse, offering a large cleaning range and high execution efficiency; but power operation and maintenance data are large in volume and complex in attributes, so that scheme still falls short in cleaning precision and data quality. Choosing an efficient data cleaning method provides important support for the analysis and mining of power operation and maintenance data, and is of great significance for improving the comprehensive benefits of power operation and maintenance.
Disclosure of Invention
To overcome at least one defect in the prior art, the invention provides a power operation and maintenance data cleaning method based on an isolated forest algorithm and a neural network. It improves the branching step of the isolated forest algorithm, raising the efficiency and accuracy of the isolated forest model, and lets the learning rate adapt to the trend of the network's gradient, improving the performance of the BP neural network. The method is effectively optimized in terms of abnormal data localization accuracy, data correction accuracy, training time and resource occupation.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a power operation and maintenance data cleaning method based on an isolated forest algorithm and a neural network is characterized by comprising the following steps:
s1, constructing an isolated forest model iForest for solving a target problem by using an improved isolated forest algorithm;
s2, defining an evaluation system of the isolated forest algorithm to the abnormal data;
and S3, training a learning rate self-adaptive BP neural network to predict and correct the abnormal data attribute detected by the isolated forest.
Preferably, the step S1 specifically includes the following steps (a code sketch follows the list):
s11, at the beginning of the method, the attributes are first grouped;
s12, randomly selecting ψ sample data points from the training data set as a sub-sampling set, constructing an initial iTree, and putting the sub-sampling set into the root node of the tree; ψ is the number of randomly selected sample data points;
s13, randomly selecting an attribute group of the data items, and choosing a division cut point within the current node data;
s14, generating a hyperplane from the cut point, dividing the data space of the current node into two subspaces, and partitioning the data items accordingly;
s15, recursively constructing new child nodes until a child node contains only one data item (and cannot be cut further) or the iTree reaches the initially defined height limit.
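For illustration, a minimal Python sketch of the improved iTree construction of steps S11 to S15 is given below. The patent describes the attribute grouping and the cut-point choice only at a high level, so the `attribute_groups` argument and the uniform random cut point are assumptions rather than the patented design:

```python
import random

class ITreeNode:
    def __init__(self, data, depth):
        self.data = data              # data items that reached this node
        self.depth = depth
        self.left = self.right = None
        self.split_attr = None        # attribute index used for the cut
        self.split_value = None       # cut point

def build_itree(data, depth, height_limit, attribute_groups):
    """Recursively build one iTree (steps S12 to S15).

    data: list of tuples, one tuple per sample, one entry per attribute.
    attribute_groups: list of lists of attribute indices, the step S11
    grouping, assumed to be supplied by the caller."""
    node = ITreeNode(data, depth)
    # S15: stop when one item remains or the height limit is reached
    if len(data) <= 1 or depth >= height_limit:
        return node
    group = random.choice(attribute_groups)   # S13: random attribute group
    attr = random.choice(group)               # one attribute in the group
    values = [row[attr] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:                              # no cut possible on this attribute
        return node
    cut = random.uniform(lo, hi)              # S13: cut point in node data
    node.split_attr, node.split_value = attr, cut
    left = [row for row in data if row[attr] < cut]     # S14: two subspaces
    right = [row for row in data if row[attr] >= cut]
    node.left = build_itree(left, depth + 1, height_limit, attribute_groups)
    node.right = build_itree(right, depth + 1, height_limit, attribute_groups)
    return node
```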
Preferably, the step S2 specifically includes the following steps (a scoring sketch in code follows the list):
s21, selecting test data x, and substituting the test data x into each iTree in the forest; x represents test data;
s22, calculating the depth h (x) of each tree and calculating the average value E (h (x)) of all h (x); where h (x) represents the depth at which the test data point falls on each tree; e (h (x)) represents the average of all h (x);
s23, setting the standard average search length c (ψ) according to equation (1):
c(ψ) = 2H(ψ−1) − 2(ψ−1)/ψ    formula (1)
Wherein H (i) is calculated according to equation (2):
H(i) = ln(i) + Ec    formula (2)
wherein Ec is the Euler-Mascheroni constant, with a value of approximately 0.5772; c(ψ) represents the standard average search length of an iTree;
s24, defining the abnormal score S (x, psi) of the data to be measured according to the formula (3):
s(x, ψ) = 2^(−E(h(x)) / c(ψ))    formula (3)
wherein s(x, ψ) represents the anomaly score of the data under test; the closer the score is to 1, the greater the possibility that the data is abnormal.
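Steps S21 to S24 and formulas (1) to (3) translate directly into code. The following Python sketch reuses the `ITreeNode` structure above; the external-node depth adjustment c(n) follows the original isolation forest paper and is an assumption, since the patent does not spell that step out:

```python
import math

EULER_GAMMA = 0.5772  # the constant Ec used in formula (2)

def H(i):
    """Formula (2): H(i) = ln(i) + Ec."""
    return math.log(i) + EULER_GAMMA

def c(psi):
    """Formula (1): standard average search length of an iTree."""
    return 2.0 * H(psi - 1) - 2.0 * (psi - 1) / psi

def path_length(node, x):
    """Depth h(x) at which test point x falls in one iTree (S21, S22)."""
    if node.left is None and node.right is None:
        # unresolved points in an external node get the average extra
        # depth c(n), as in the original isolation forest paper
        n = len(node.data)
        return node.depth + (c(n) if n > 1 else 0.0)
    child = node.left if x[node.split_attr] < node.split_value else node.right
    return path_length(child, x)

def anomaly_score(forest, x, psi):
    """Formula (3): s(x, ψ) = 2^(−E(h(x)) / c(ψ))."""
    e_h = sum(path_length(tree, x) for tree in forest) / len(forest)
    return 2.0 ** (-e_h / c(psi))
```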
Preferably, the step S3 specifically includes the following steps (a training-iteration sketch in code follows the list):
s31, randomly selecting a small batch of data samples in the data set, namely a combination of the input vector and the output expected value, and substituting the combination into the neural network;
s32, carrying out the forward propagation process layer by layer, calculating the activation value of each layer of the neural network according to formula (4) and formula (5):

z_i^(l+1) = Σ_j ( W_ij^(l) · a_j^(l) ) + b_i^(l+1)    formula (4)

a_i^(l+1) = f(z_i^(l+1))    formula (5)

wherein W represents the weight parameters in the BP neural network, W_ij^(l) representing the weight between the jth unit of the lth layer and the ith unit of the (l+1)th layer; b represents the threshold parameters in the BP neural network, b_i^(l+1) representing the bias of the ith unit of the (l+1)th layer; f represents the activation function, here an ELU (Exponential Linear Unit), f(z) = z for z > 0 and f(z) = μ(e^z − 1) for z ≤ 0, which is simple to compute and prevents the vanishing-gradient problem in the subsequent error-gradient calculation; μ is the amplitude parameter of the ELU, which can be adjusted flexibly in practice and is generally taken in (0, 1); a_i^(l) represents the activation value of the ith unit of the lth layer. The activations are computed layer by layer in this way until the output value h_{W,b}(x) of the neural network is obtained;
s33, calculating the error between the expected value and the actual output according to formula (6):

J(W, b; x, y) = (1/2) · ||h_{W,b}(x) − y||^2    formula (6)

wherein h_{W,b}(x) represents the output value obtained by the neural network through forward propagation, y represents the expected value, W and b represent the weight matrix and threshold matrix respectively, and J represents the error;
s34, calculating the overall cost function according to formula (7); if the function converges to the global minimum, the procedure ends, otherwise go to S35;

L(W, b) = (1/m) · Σ_{i=1}^{m} J(W, b; x^(i), y^(i))    formula (7)

wherein L represents the overall cost function of the neural network and m represents the number of samples;
s35, performing the back propagation process, in which the parameters of each layer of the neural network are adjusted through a gradient descent algorithm so as to continuously reduce the cost function; the error of each neuron is calculated first, and the error gradient is calculated according to formula (8):

∂L/∂W_ij^(l) = a_j^(l) · δ_i^(l+1)    formula (8)

wherein ∂L/∂W_ij^(l) represents the error gradient of the cost function with respect to the weight parameter W_ij^(l), and δ_i^(l+1) denotes the error of the ith unit of the (l+1)th layer; the gradients are derived backwards from the output layer, layer by layer, through the chain rule, using the derivation relation given by formula (4), which is not repeated here; the error gradient of the threshold parameter is calculated in the same way;
s36, judging the gradient change trend and adaptively adjusting the learning rate of the neural network: if two adjacent gradient adjustments point in the same direction, the learning rate is increased according to formula (9); if they point in opposite directions, indicating large fluctuation in the gradient, the learning rate is decreased according to formula (10):
α_{k+1} = (1 + η) · α_k    formula (9)

α_{k+1} = (1 − η) · α_k    formula (10)

wherein α_{k+1} represents the learning rate of the neural network at time k+1, used to control the rate of gradient change during back propagation of the neural network, and α_k represents the learning rate at time k; ∇_k and ∇_{k−1} represent the gradient values calculated at time k and time k−1 respectively; in addition, a momentum factor η, taking a value in (0, 1), is introduced as a damping term on the gradient change, to reduce the oscillation caused by an excessive gradient difference between two adjacent moments, so that the adaptive change of the learning rate is safer and more stable;
s37, updating the weight parameters and threshold parameters according to the gradient descent rules of formula (11) and formula (12), where α represents the current learning rate, and then returning to S31:

W_ij^(l) := W_ij^(l) − α · ∂L/∂W_ij^(l)    formula (11)

b_i^(l+1) := b_i^(l+1) − α · ∂L/∂b_i^(l+1)    formula (12)
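A compact Python sketch of one S31 to S37 iteration follows, assuming a single hidden layer, a linear output unit, and the (1 ± η) form of formulas (9) and (10) reconstructed above; the function names, the layer layout, and the scalar gradient-trend test are illustrative simplifications, not the patent's exact design:

```python
import numpy as np

def elu(z, mu=1.0):
    """ELU activation used in formulas (4) and (5); mu is the amplitude."""
    return np.where(z > 0, z, mu * (np.exp(z) - 1.0))

def elu_grad(z, mu=1.0):
    return np.where(z > 0, 1.0, mu * np.exp(z))

def train_step(W1, b1, W2, b2, x, y, alpha, eta, prev_grad):
    """One S31-S37 pass for a network with one hidden layer.

    Returns the updated parameters, the adapted learning rate and the
    gradient to compare against on the next call."""
    # S32: forward propagation, formulas (4) and (5)
    z1 = W1 @ x + b1
    a1 = elu(z1)
    out = W2 @ a1 + b2                 # linear output layer (assumption)
    # S33: error J = 0.5 * ||h(x) - y||^2, formula (6)
    delta2 = out - y
    # S35: back propagation by the chain rule, formula (8)
    grad_W2 = np.outer(delta2, a1)
    delta1 = (W2.T @ delta2) * elu_grad(z1)
    grad_W1 = np.outer(delta1, x)
    # S36: adapt the learning rate from the gradient trend, formulas (9)
    # and (10); a single dot product over the whole gradient stands in
    # for the per-parameter direction test
    same_dir = prev_grad is None or float(np.sum(grad_W1 * prev_grad)) >= 0
    alpha = alpha * (1 + eta) if same_dir else alpha * (1 - eta)
    # S37: gradient descent updates, formulas (11) and (12)
    W1 -= alpha * grad_W1; b1 -= alpha * delta1
    W2 -= alpha * grad_W2; b2 -= alpha * delta2
    return W1, b1, W2, b2, alpha, grad_W1
```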
Compared with the prior art, the beneficial effects are:
(1) In the abnormal data detection stage, considering the correlation among the attributes of the power metadata, the algorithm first improves the construction of the isolation tree (iTree) in the isolated forest model so that it is more sensitive to attribute correlation, improving the branching step of the isolated forest algorithm and thus the efficiency and accuracy of the isolated forest model.
(2) In the data prediction and correction stage, the algorithm automatically adjusts the learning rate according to the trend of the gradient, continuously steering it toward the most appropriate value. This keeps the gradient change stable, greatly accelerates convergence, reduces network overhead, solves the slow late-stage convergence of traditional BP neural network training, and yields a smoother convergence curve.
The method constructs an isolated forest to extract the characteristics of a training data set, detects abnormal data in the data set, and then uses an improved BP neural network model to predict and modify the abnormal data. The electric power operation and maintenance data cleaning program based on the improved scheme is effectively optimized in the aspects of abnormal data positioning accuracy, data correction accuracy, training time and the like.
Drawings
FIG. 1 is a flow chart of a method for cleaning power operation and maintenance data based on an isolated forest algorithm and a neural network.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
As shown in fig. 1, a method for cleaning power operation and maintenance data based on an isolated forest algorithm and a neural network is characterized by comprising the following steps:
s1, constructing an isolated forest model iForest for solving a target problem by using an improved isolated forest algorithm;
s2, defining an evaluation system of the isolated forest algorithm to the abnormal data;
and S3, training a learning rate self-adaptive BP neural network to predict and correct the abnormal data attribute detected by the isolated forest.
The step S1 specifically includes the following steps:
s11, at the beginning of the method, the attributes are first grouped;
s12, randomly selecting ψ sample data points from the training data set as a sub-sampling set, constructing an initial iTree, and putting the sub-sampling set into the root node of the tree; ψ is the number of randomly selected sample data points;
s13, randomly selecting an attribute group of the data items, and choosing a division cut point within the current node data;
s14, generating a hyperplane from the cut point, dividing the data space of the current node into two subspaces, and partitioning the data items accordingly;
s15, recursively constructing new child nodes until a child node contains only one data item (and cannot be cut further) or the iTree reaches the initially defined height limit.
The iTree branching method designed above is then applied to the training data until all iTrees in iForest have been constructed.
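A forest-level wrapper consistent with this step might look as follows; it reuses `build_itree` from the sketch above, and the height limit ceil(log2(ψ)) follows the usual isolation forest convention rather than a value fixed by the patent:

```python
import math
import random

def build_iforest(dataset, n_trees, psi, attribute_groups):
    """Build iForest: each iTree is grown on its own random sub-sample of
    size psi (step S12)."""
    height_limit = math.ceil(math.log2(psi))
    forest = []
    for _ in range(n_trees):
        subsample = random.sample(dataset, psi)
        forest.append(build_itree(subsample, 0, height_limit, attribute_groups))
    return forest
```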
The step S2 specifically includes:
s21, selecting test data x, and substituting the test data x into each iTree in the forest; x represents test data;
s22, calculating the depth h (x) of each tree and calculating the average value E (h (x)) of all h (x); where h (x) represents the depth at which the test data point falls on each tree; e (h (x)) represents the average of all h (x);
s23, setting the standard average search length c (ψ) according to equation (1):
c(ψ) = 2H(ψ−1) − 2(ψ−1)/ψ    formula (1)
Wherein H (i) is calculated according to equation (2):
H(i) = ln(i) + Ec    formula (2)
wherein Ec is the Euler-Mascheroni constant, with a value of approximately 0.5772; c(ψ) represents the standard average search length of an iTree;
s24, defining the abnormal score S (x, psi) of the data to be measured according to the formula (3):
s(x, ψ) = 2^(−E(h(x)) / c(ψ))    formula (3)
wherein s(x, ψ) represents the anomaly score of the data under test; the closer the score is to 1, the greater the possibility that the data is abnormal.
In step S3, the network is trained on the training set until its overall cost function converges, taking the attributes (T, AP, RH, V) as the input vector and the electric energy output EP as the output value; the method specifically comprises the following steps (an end-to-end usage sketch in code follows the list):
s31, randomly selecting a small batch of data samples in the data set, namely a combination of the input vector and the output expected value, and substituting the combination into the neural network;
s32, carrying out the forward propagation process layer by layer, calculating the activation value of each layer of the neural network according to formula (4) and formula (5):

z_i^(l+1) = Σ_j ( W_ij^(l) · a_j^(l) ) + b_i^(l+1)    formula (4)

a_i^(l+1) = f(z_i^(l+1))    formula (5)

wherein W represents the weight parameters in the BP neural network, W_ij^(l) representing the weight between the jth unit of the lth layer and the ith unit of the (l+1)th layer; b represents the threshold parameters in the BP neural network, b_i^(l+1) representing the bias of the ith unit of the (l+1)th layer; f represents the activation function, here an ELU (Exponential Linear Unit), f(z) = z for z > 0 and f(z) = μ(e^z − 1) for z ≤ 0, which is simple to compute and prevents the vanishing-gradient problem in the subsequent error-gradient calculation; μ is the amplitude parameter of the ELU, which can be adjusted flexibly in practice and is generally taken in (0, 1); a_i^(l) represents the activation value of the ith unit of the lth layer. The activations are computed layer by layer in this way until the output value h_{W,b}(x) of the neural network is obtained;
s33, calculating the error between the expected value and the actual output according to formula (6):

J(W, b; x, y) = (1/2) · ||h_{W,b}(x) − y||^2    formula (6)

wherein h_{W,b}(x) represents the output value obtained by the neural network through forward propagation, y represents the expected value, W and b represent the weight matrix and threshold matrix respectively, and J represents the error;
s34, calculating the overall cost function according to formula (7); if the function converges to the global minimum, the procedure ends, otherwise go to S35;

L(W, b) = (1/m) · Σ_{i=1}^{m} J(W, b; x^(i), y^(i))    formula (7)

wherein L represents the overall cost function of the neural network and m represents the number of samples;
s35, performing the back propagation process, in which the parameters of each layer of the neural network are adjusted through a gradient descent algorithm so as to continuously reduce the cost function; the error of each neuron is calculated first, and the error gradient is calculated according to formula (8):

∂L/∂W_ij^(l) = a_j^(l) · δ_i^(l+1)    formula (8)

wherein ∂L/∂W_ij^(l) represents the error gradient of the cost function with respect to the weight parameter W_ij^(l), and δ_i^(l+1) denotes the error of the ith unit of the (l+1)th layer; the gradients are derived backwards from the output layer, layer by layer, through the chain rule, using the derivation relation given by formula (4), which is not repeated here; the error gradient of the threshold parameter is calculated in the same way;
s36, judging the gradient change trend and adaptively adjusting the learning rate of the neural network: if two adjacent gradient adjustments point in the same direction, the learning rate is increased according to formula (9); if they point in opposite directions, indicating large fluctuation in the gradient, the learning rate is decreased according to formula (10):
Figure BDA0001682714690000078
Figure BDA0001682714690000079
wherein alpha isk+1Representing the learning rate of the neural network at the time k +1, for controlling the rate of gradient change, alpha, during the back propagation of the neural networkkRepresents the learning rate of the neural network at time k,
Figure BDA00016827146900000710
and
Figure BDA00016827146900000711
the gradient values calculated at the k moment and the k-1 moment are respectively represented, and in addition, a momentum factor eta is introduced, the value of the momentum factor eta is (0,1), and the momentum factor eta is used as a damping term of gradient change and is used for reducing oscillation caused by overlarge gradient change difference between two adjacent moments, so that the self-adaptive change of the learning rate is safer and more stable;
s37, updating the weight parameters and threshold parameters according to the gradient descent rules of formula (11) and formula (12), where α represents the current learning rate, and then returning to S31:

W_ij^(l) := W_ij^(l) − α · ∂L/∂W_ij^(l)    formula (11)

b_i^(l+1) := b_i^(l+1) − α · ∂L/∂b_i^(l+1)    formula (12)
It should be understood that the above-described embodiments of the present invention are merely examples given to clearly illustrate the invention, and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (2)

1. A power communication operation and maintenance data cleaning method based on an isolated forest algorithm and a neural network is characterized by comprising the following steps:
s1, constructing an isolated forest model iForest for solving a target problem by using an improved isolated forest algorithm;
the specific steps of S1 include the following:
s11, at the beginning of the method, the attributes are first grouped;
s12, randomly selecting ψ sample data points from the training data set as a sub-sampling set, constructing an initial iTree, and putting the sub-sampling set into the root node of the tree; ψ is the number of randomly selected sample data points;
s13, randomly selecting an attribute group of the data items, and choosing a division cut point within the current node data;
s14, generating a hyperplane from the cut point, dividing the data space of the current node into two subspaces, and partitioning the data items accordingly;
s15, recursively constructing new child nodes until a child node contains only one data item, namely the cutting cannot be continued, or the iTree has reached the initially defined height;
s2, defining an evaluation system of the isolated forest algorithm to the abnormal data;
s3, training a learning rate self-adaptive BP neural network to predict and correct the abnormal data attribute detected by the isolated forest;
the S3 specifically includes:
s31, randomly selecting a small batch of data samples in the data set, namely a combination of the input vector and the output expected value, and substituting the combination into the neural network;
s32, carrying out the forward propagation process layer by layer, calculating the activation value of each layer of the neural network according to formula (4) and formula (5):

z_i^(l+1) = Σ_j ( W_ij^(l) · a_j^(l) ) + b_i^(l+1)    formula (4)

a_i^(l+1) = f(z_i^(l+1))    formula (5)

wherein W represents the weight parameters in the BP neural network, W_ij^(l) representing the weight between the jth unit of the lth layer and the ith unit of the (l+1)th layer; b represents the threshold parameters in the BP neural network, b_i^(l+1) representing the bias of the ith unit of the (l+1)th layer; f represents the activation function, whose amplitude parameter μ takes a value in (0, 1); a_i^(l) represents the activation value of the ith unit of the lth layer, the activations being calculated layer by layer until the output value h_{W,b}(x) of the neural network is obtained;
s33, calculating the error between the expected value and the actual output according to formula (6):

J(W, b; x, y) = (1/2) · ||h_{W,b}(x) − y||^2    formula (6)

wherein h_{W,b}(x) represents the output value obtained by the neural network through forward propagation, y represents the expected value, W and b represent the weight matrix and threshold matrix respectively, and J represents the error;
s34, calculating the overall cost function according to formula (7); if the function converges to the global minimum, the procedure ends, otherwise go to S35;

L(W, b) = (1/m) · Σ_{i=1}^{m} J(W, b; x^(i), y^(i))    formula (7)

wherein L represents the overall cost function of the neural network and m represents the number of samples;
s35, performing the back propagation process, in which the parameters of each layer of the neural network are adjusted through a gradient descent algorithm so as to continuously reduce the cost function; the error of each neuron is calculated first, and the error gradient is calculated according to formula (8):

∂L/∂W_ij^(l) = a_j^(l) · δ_i^(l+1)    formula (8)

wherein ∂L/∂W_ij^(l) represents the error gradient of the cost function with respect to the weight parameter W_ij^(l), and δ_i^(l+1) denotes the error of the ith unit of the (l+1)th layer;
s36, judging the gradient change trend and adaptively adjusting the learning rate of the neural network: if two adjacent gradient adjustments point in the same direction, the learning rate is increased according to formula (9); if they point in opposite directions, indicating large fluctuation in the gradient, the learning rate is decreased according to formula (10):
α_{k+1} = (1 + η) · α_k    formula (9)

α_{k+1} = (1 − η) · α_k    formula (10)

wherein α_{k+1} represents the learning rate of the neural network at time k+1, used to control the rate of gradient change during back propagation of the neural network, and α_k represents the learning rate at time k; ∇_k and ∇_{k−1} respectively represent the gradient values calculated at time k and time k−1; a momentum factor η taking a value in (0, 1) is also introduced;
s37, updating the weight parameters and threshold parameters according to the gradient descent rules of formula (11) and formula (12), where α represents the current learning rate, and then returning to S31:

W_ij^(l) := W_ij^(l) − α · ∂L/∂W_ij^(l)    formula (11)

b_i^(l+1) := b_i^(l+1) − α · ∂L/∂b_i^(l+1)    formula (12)
2. The method for cleaning operation and maintenance data of power communication based on an isolated forest algorithm and a neural network as claimed in claim 1, wherein the S2 specifically comprises:
s21, selecting test data x, and substituting the test data x into each iTree in the forest; x represents test data;
s22, calculating the depth h (x) of each tree and calculating the average value E (h (x)) of all h (x); where h (x) represents the depth at which the test data point falls on each tree; e (h (x)) represents the average of all h (x);
s23, setting the standard average search length c (ψ) according to equation (1):
c(ψ) = 2H(ψ−1) − 2(ψ−1)/ψ    formula (1)
Wherein H (i) is calculated according to equation (2):
H(i) = ln(i) + Ec    formula (2)
wherein Ec is the Euler-Mascheroni constant, with a value of approximately 0.5772; c(ψ) represents the standard average search length of an iTree;
s24, defining the abnormal score S (x, psi) of the data to be measured according to the formula (3):
s(x, ψ) = 2^(−E(h(x)) / c(ψ))    formula (3)
wherein s(x, ψ) represents the anomaly score of the data under test; the closer the score is to 1, the greater the possibility that the data is abnormal.
CN201810559071.3A 2018-06-01 2018-06-01 Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network Active CN108776683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810559071.3A CN108776683B (en) 2018-06-01 2018-06-01 Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810559071.3A CN108776683B (en) 2018-06-01 2018-06-01 Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network

Publications (2)

Publication Number Publication Date
CN108776683A CN108776683A (en) 2018-11-09
CN108776683B true CN108776683B (en) 2022-01-21

Family

ID=64026612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810559071.3A Active CN108776683B (en) 2018-06-01 2018-06-01 Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network

Country Status (1)

Country Link
CN (1) CN108776683B (en)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109506963B (en) * 2018-11-29 2019-09-03 中南大学 A kind of intelligence train traction failure big data abnormality detection discrimination method
CN109684311A (en) * 2018-12-06 2019-04-26 中科恒运股份有限公司 Abnormal deviation data examination method and device
CN109714343B (en) * 2018-12-28 2022-02-22 北京天融信网络安全技术有限公司 Method and device for judging network traffic abnormity
CN109902721B (en) * 2019-01-28 2024-07-02 平安科技(深圳)有限公司 Abnormal point detection model verification method, device, computer equipment and storage medium
CN109934489B (en) * 2019-03-12 2021-03-02 广东电网有限责任公司 Power equipment state evaluation method
CN110135614A (en) * 2019-03-26 2019-08-16 广东工业大学 It is a kind of to be tripped prediction technique based on rejecting outliers and the 10kV distribution low-voltage of sampling techniques
CN110334085A (en) * 2019-05-30 2019-10-15 广州供电局有限公司 Power distribution network data monitoring and modification method, device, computer and storage medium
CN110209658B (en) * 2019-06-04 2021-09-14 北京字节跳动网络技术有限公司 Data cleaning method and device
CN110288362A (en) * 2019-07-03 2019-09-27 北京工业大学 Brush single prediction technique, device and electronic equipment
CN110399935A (en) * 2019-08-02 2019-11-01 哈工大机器人(合肥)国际创新研究院 The real-time method for monitoring abnormality of robot and system based on isolated forest machine learning
CN110619182A (en) * 2019-09-24 2019-12-27 长沙理工大学 Power transmission line parameter identification and power transmission network modeling method based on WAMS big data
CN110705873B (en) * 2019-09-30 2022-06-03 国网福建省电力有限公司 Power distribution network running state portrait analysis method
CN110929751B (en) * 2019-10-16 2022-11-22 福建和盛高科技产业有限公司 Current transformer unbalance warning method based on multi-source data fusion
CN110750527A (en) * 2019-10-24 2020-02-04 南方电网科学研究院有限责任公司 Data cleaning method for electric power big data
CN111008662B (en) * 2019-12-04 2023-01-10 贵州电网有限责任公司 Online monitoring data anomaly analysis method for power transmission line
CN111030855B (en) * 2019-12-05 2022-05-17 国网山西省电力公司信息通信分公司 Intelligent baseline determination and alarm method for ubiquitous power Internet of things system data
CN111081016B (en) * 2019-12-18 2021-07-06 北京航空航天大学 Urban traffic abnormity identification method based on complex network theory
CN113011552B (en) * 2019-12-20 2023-07-18 中移(成都)信息通信科技有限公司 Neural network training method, device, equipment and medium
CN111160647B (en) * 2019-12-30 2023-08-22 第四范式(北京)技术有限公司 Money laundering behavior prediction method and device
CN111145175B (en) * 2020-01-10 2020-10-16 惠州光弘科技股份有限公司 SMT welding spot defect detection method based on iForest model verification
CN111340063B (en) * 2020-02-10 2023-08-29 国能信控互联技术有限公司 Data anomaly detection method for coal mill
CN111505433B (en) * 2020-04-10 2022-06-28 国网浙江余姚市供电有限公司 Low-voltage transformer area indoor variable relation error correction and phase identification method
CN111666276A (en) * 2020-06-11 2020-09-15 上海积成能源科技有限公司 Method for eliminating abnormal data by applying isolated forest algorithm in power load prediction
CN111950853B (en) * 2020-07-14 2024-05-31 东南大学 Electric power running state white list generation method based on information physical bilateral data
CN112016249A (en) * 2020-08-31 2020-12-01 华北电力大学 SCR denitration system bad data identification method based on optimized BP neural network
CN112686838B (en) * 2020-11-30 2024-03-29 江苏科技大学 Rapid detection device and detection method for ship anchor chain flash welding system
CN112884237A (en) * 2021-03-11 2021-06-01 山东科技大学 Power distribution network prediction auxiliary state estimation method and system
CN113239999A (en) * 2021-05-07 2021-08-10 北京沃东天骏信息技术有限公司 Data anomaly detection method and device and electronic equipment
CN113627541B (en) * 2021-08-13 2023-07-21 北京邮电大学 Optical path transmission quality prediction method based on sample migration screening
CN114459574B (en) * 2022-02-10 2023-09-26 电子科技大学 Automatic evaluation method and device for high-speed fluid flow measurement accuracy and storage medium
CN115021679B (en) * 2022-08-09 2022-11-04 国网山西省电力公司大同供电公司 Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
CN115760484B (en) * 2022-12-07 2024-09-06 湖北华中电力科技开发有限责任公司 Method, device and system for improving hidden danger identification capability of power distribution area and storage medium
CN116501444B (en) * 2023-04-28 2024-02-27 重庆大学 Abnormal cloud edge collaborative monitoring and recovering system and method for virtual machine of intelligent network-connected automobile domain controller
CN117994632A (en) * 2024-02-21 2024-05-07 中国地质科学院矿产资源研究所 Mineral resource remote area delineating method, equipment, medium and product
CN117786587B (en) * 2024-02-28 2024-06-04 国网河南省电力公司经济技术研究院 Power grid data quality abnormality diagnosis method based on data analysis
CN117874653B (en) * 2024-03-11 2024-05-31 武汉佳华创新电气有限公司 Power system safety monitoring method and system based on multi-source data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050160340A1 (en) * 2004-01-02 2005-07-21 Naoki Abe Resource-light method and apparatus for outlier detection
CN107945046A (en) * 2016-10-12 2018-04-20 中国电力科学研究院 A kind of new energy power station output data recovery method and device
CN107196953A (en) * 2017-06-14 2017-09-22 上海丁牛信息科技有限公司 A kind of anomaly detection method based on user behavior analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Analysis and Research on Abnormal Electricity Consumption Data Based on Data Mining; Zhang Rongchang; China Master's Theses Full-text Database, Information Science & Technology; 2018-01-15; pp. 14-62 *
Cleaning of Bad Power Load Data Based on Neural Networks; Gu Min; Microcomputer Information; 2007-12-31; Vol. 23, No. 7-3; full text *

Also Published As

Publication number Publication date
CN108776683A (en) 2018-11-09

Similar Documents

Publication Publication Date Title
CN108776683B (en) Electric power operation and maintenance data cleaning method based on isolated forest algorithm and neural network
US11409347B2 (en) Method, system and storage medium for predicting power load probability density based on deep learning
Kim et al. Length-adaptive transformer: Train once with length drop, use anytime with search
CN113052334A (en) Method and system for realizing federated learning, terminal equipment and readable storage medium
CN103105246A (en) Greenhouse environment forecasting feedback method of back propagation (BP) neural network based on improvement of genetic algorithm
CN107977710A (en) Electricity consumption abnormal data detection method and device
CN111553469A (en) Wireless sensor network data fusion method, device and storage medium
CN111461463A (en) Short-term load prediction method, system and equipment based on TCN-BP
US20240095535A1 (en) Executing a genetic algorithm on a low-power controller
CN116681104B (en) Model building and realizing method of distributed space diagram neural network
CN112149883A (en) Photovoltaic power prediction method based on FWA-BP neural network
CN109858798B (en) Power grid investment decision modeling method and device for correlating transformation measures with voltage indexes
CN111625399A (en) Method and system for recovering metering data
CN111832825A (en) Wind power prediction method and system integrating long-term and short-term memory network and extreme learning machine
CN103957582A (en) Wireless sensor network self-adaptation compression method
CN112651519A (en) Secondary equipment fault positioning method and system based on deep learning theory
CN114781875B (en) Micro-grid economic operation state evaluation method based on deep convolution network
CN113139570A (en) Dam safety monitoring data completion method based on optimal hybrid valuation
CN108921287A (en) A kind of optimization method and system of neural network model
CN116992779A (en) Simulation method and system of photovoltaic energy storage system based on digital twin model
CN111932091A (en) Survival analysis risk function prediction method based on gradient survival lifting tree
CN111400964B (en) Fault occurrence time prediction method and device
CN116680969A (en) Filler evaluation parameter prediction method and device for PSO-BP algorithm
CN107273971A (en) Architecture of Feed-forward Neural Network self-organizing method based on neuron conspicuousness
CN116896093A (en) Online analysis and optimization method for grid-connected oscillation stability of wind farm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant