CN111767951A - Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis - Google Patents

Info

Publication number
CN111767951A
Authority
CN
China
Prior art keywords
abnormal
data
electricity utilization
isolated
resident
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010602460.7A
Other languages
Chinese (zh)
Inventor
周浩
胡炳谦
顾一峰
韩俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ieslab Energy Technology Co ltd
Original Assignee
Shanghai Ieslab Energy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ieslab Energy Technology Co ltd
Priority to CN202010602460.7A
Publication of CN111767951A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Potential electricity safety problems in residential communities are receiving growing attention in urban management, and in recent years there has been increasing demand for quickly identifying electricity safety hazards in electricity-consuming units and for managing community energy use better through big data and intelligent methods. Residential electricity safety analysis collects large amounts of residential electricity load data through terminal equipment such as smart meters, then applies outlier monitoring, cluster analysis, time-series analysis, and other methods to learn the electricity usage habits of different residents and to discover abnormal electricity usage. The invention discloses a method for discovering abnormal data by applying the isolation forest algorithm to residential electricity load data in residential electricity safety analysis, in order to report abnormal behavior in residential electricity use and give early warning of abnormal electricity usage in communities.

Description

Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis
Technical Field
The invention relates to the technical field of electric power safety analysis, and in particular to a method for finding abnormal data by applying the isolation forest algorithm (rendered "isolated forest" in the title) in the analysis of residential electricity load data.
Background
In recent years, with the further deepening of reform and opening-up, the number of enterprises has grown considerably and residents' quality of life has improved greatly. This has driven a new round of growth in electricity consumption and made community electricity safety problems more prominent. Potential electricity safety problems in residential communities are receiving growing attention in urban management, and there is increasing demand for quickly identifying electricity safety hazards in electricity-consuming units and for managing community energy use better through big data and intelligent methods. Group renting and the use of residential electricity for commercial or industrial purposes occur again and again in communities. By profiling residents' electricity use, habit-based residential electricity safety analysis lets city managers discover violations as soon as an anomaly occurs. Residential electricity safety analysis collects large amounts of residential electricity load data through terminal equipment such as smart meters, then applies outlier monitoring, cluster analysis, time-series analysis, and other methods to learn the electricity usage habits of different residents and to discover abnormal electricity usage. The invention discloses a method for discovering abnormal data by applying the isolation forest algorithm to residential electricity load data in residential electricity safety analysis, in order to report abnormal behavior in residential electricity use and give early warning of abnormal electricity usage in communities.
Disclosure of Invention
The invention provides a method for screening abnormal data in residential electricity load data based on the isolation forest algorithm, characterized in that it finds abnormal data by applying the isolation forest algorithm and reports that data.
The isolation forest algorithm is an unsupervised machine learning algorithm for anomaly detection that, building on decision trees, identifies anomalies by isolating outliers in the data. Outliers are isolated by randomly selecting a feature from the given feature set and then randomly selecting a split value between the minimum and maximum values of that feature. Under this random partitioning, outlying data points end up on shorter paths in the tree, which separates them from the rest of the data. In an isolation forest, an anomaly is defined as an "outlier that is easily isolated", which can be understood as a point that lies in a sparsely populated region, far from any high-density group. Statistically, if a region of the feature space contains only sparsely distributed points, the probability that a data point falls in that region is very low, so points in such regions can be considered abnormal. The isolation forest is suited to continuous data: because it is unsupervised, no labeled samples are needed for training, but the features must be continuous.
To find which points are easily isolated, the isolation forest uses a very efficient strategy: the data set is partitioned randomly and recursively until all sample points are isolated. Under this random partitioning, outliers typically have shorter paths. Intuitively, a high-density cluster must be cut many times before each of its points is isolated, whereas a point in a low-density region is isolated after only a few cuts and is treated as an outlier.
Running the isolation forest on the historical residential electricity load data that was actually collected finds the abnormal electricity usage values. The timestamp gives the time at which each anomaly occurred, and identifying the household and the abnormal usage behavior enables efficient management of community electricity safety.
Drawings
FIG. 1 is a schematic processing flow diagram of a method for removing abnormal data and denoising historical load data in the embodiment of the invention.
FIG. 2 is a schematic diagram of a process for cutting a sub-sample according to an embodiment of the present invention.
Detailed Description
To make the content, purpose, features, and advantages of the present invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The embodiments described are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the scope of protection of the invention. As shown in FIG. 1, the invention involves the following steps.
Step one, data preprocessing: The collected raw historical power load data is arranged in time order, the start and stop times of the data set are determined, and each record is labeled with its timestamp and the serial number of the electricity-using resident.
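Step one can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple layout (resident serial number, timestamp string, load), the time format, the field names, and the sample values are all hypothetical and do not come from the patent.

```python
from datetime import datetime

# Hypothetical raw readings: (resident serial number, timestamp string, load in kW).
raw_readings = [
    ("R002", "2020-06-01 00:15", 0.42),
    ("R001", "2020-06-01 00:00", 0.31),
    ("R001", "2020-06-01 00:15", 0.29),
    ("R002", "2020-06-01 00:00", 0.40),
]


def preprocess(readings, time_format="%Y-%m-%d %H:%M"):
    """Sort readings by time and return them with the data set's start and stop times."""
    records = [
        {"resident": rid, "timestamp": datetime.strptime(ts, time_format), "load": load}
        for rid, ts, load in readings
    ]
    # Arrange in time order; ties are broken by resident serial number.
    records.sort(key=lambda r: (r["timestamp"], r["resident"]))
    # Start and stop times of the (non-empty) data set.
    start, stop = records[0]["timestamp"], records[-1]["timestamp"]
    return records, start, stop


records, start, stop = preprocess(raw_readings)
```

Each record keeps its timestamp and resident serial number, so an anomaly found later can be traced back to a household and a time.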
Step two, removing abnormal values with the isolation forest algorithm: The historical power load data preprocessed in step one, labeled with timestamps and resident serial numbers, is input into the isolation forest model. First, a single tree is trained on the data:
1) Randomly select n points from the training data as a sub-sample and place them in the root node of an isolation tree;
2) Randomly select a dimension and randomly generate a cut point p within the range of the current node's data, that is, between the minimum and maximum values of the selected dimension in the current node's data;
3) The cut point defines a hyperplane that divides the current node's data space into two subspaces: points smaller than p in the selected dimension go to the left branch of the current node, and points greater than or equal to p go to the right branch;
4) Repeat steps 2) and 3) recursively on the left and right branch nodes, constructing new nodes until a leaf node holds only one data point (so no further cut is possible) or the tree reaches the set height limit;
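The tree-building steps above can be sketched in plain Python. This is a simplified illustration, not the patent's implementation: the dictionary-based tree, the function names, the height limit of 8, and the synthetic sample are assumptions, and the sketch stops splitting whenever the chosen dimension is constant instead of trying another dimension.

```python
import random


def build_tree(points, height=0, max_height=8, rng=random):
    """Recursively build one isolation tree from a list of equal-length tuples."""
    # Stop when the node holds at most one point or the height limit is reached.
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))            # step 2: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                   # simplification: no cut possible here
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                      # step 2: random cut point p
    left = [p for p in points if p[dim] < cut]     # step 3: points below p go left
    right = [p for p in points if p[dim] >= cut]   # step 3: points >= p go right
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, height + 1, max_height, rng),
        "right": build_tree(right, height + 1, max_height, rng),
    }


def path_length(tree, point, depth=0):
    """Count the cuts on the way from the root to the leaf that holds `point`."""
    if "size" in tree:
        return depth
    branch = "left" if point[tree["dim"]] < tree["cut"] else "right"
    return path_length(tree[branch], point, depth + 1)


# Synthetic sub-sample: a dense cluster of 63 points plus one distant point.
rng = random.Random(0)
cluster = [(1.0 + 0.01 * i, 2.0 + 0.01 * (i % 7)) for i in range(63)]
outlier = (50.0, 40.0)
inlier = cluster[30]
sample = cluster + [outlier]

trees = [build_tree(sample, rng=rng) for _ in range(100)]
avg_out = sum(path_length(t, outlier) for t in trees) / len(trees)
avg_in = sum(path_length(t, inlier) for t in trees) / len(trees)
```

Averaged over the 100 trees, the distant point reaches a leaf after fewer cuts than the point inside the cluster, which is the property the method relies on.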
FIG. 2 shows the cutting process on a training sub-sample: the point Xi in the left image lies in a dense region, so it takes ten or more cuts to separate it into its own subspace, whereas the point Xo in the right image lies in a sparsely populated region at the edge and is "isolated" after only four cuts;
Each isolation tree is computed separately and the results of all trees are then combined. Because the cutting process is completely random, an ensemble method is needed to make the result converge: the cutting is repeated from scratch many times and the results are averaged. Once t isolation trees have been obtained, training is finished. The trees can then be used to evaluate test data, i.e. to calculate an anomaly score s. For each sample x, the results from all trees are combined, and the anomaly score is calculated by the following formula:
$$ s(x, \Psi) = 2^{-\frac{E(h(x))}{c(\Psi)}} $$
where h(x) is the path length (height) of x in a single tree, E(h(x)) is the average of h(x) over all t trees, and c(Ψ) is the average path length for a given number of samples Ψ, used to normalize the path length h(x) of sample x;
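A small Python sketch of the score calculation follows. The patent text does not spell out c(Ψ); the expression used here, c(Ψ) = 2H(Ψ-1) - 2(Ψ-1)/Ψ with H(i) ≈ ln(i) + 0.5772, is the one commonly used for isolation forests and is an assumption about what the patent intends. The function names and the sample path lengths are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c(psi):
    """Average path length used to normalize h(x) for a sub-sample of psi points."""
    if psi <= 1:
        return 0.0
    if psi == 2:
        return 1.0
    harmonic = math.log(psi - 1) + EULER_GAMMA   # approximation of H(psi - 1)
    return 2.0 * harmonic - 2.0 * (psi - 1) / psi


def anomaly_score(path_lengths, psi):
    """Compute s = 2 ** (-E(h(x)) / c(psi)) from the path lengths of x in each tree."""
    mean_h = sum(path_lengths) / len(path_lengths)   # E(h(x)) over all trees
    return 2.0 ** (-mean_h / c(psi))


# Hypothetical path lengths of two samples in four trees, with psi = 256.
short_paths = [2, 3, 2, 3]       # isolated quickly: score above 0.5
long_paths = [12, 14, 13, 15]    # deep in the trees: score below 0.5
score_short = anomaly_score(short_paths, 256)
score_long = anomaly_score(long_paths, 256)
```

When the mean path length equals c(Ψ) the score is exactly 0.5, shorter paths push it toward 1, and longer paths push it toward 0.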
The calculated anomaly score is interpreted as follows: if the score is close to 1, the point is almost certainly an anomaly; if the score is much less than 0.5, the point can safely be regarded as normal; if the scores of all points are around 0.5, the sample most likely contains no distinct anomalies. The anomaly score of every data point in the historical load data is computed, and the threshold is set tighter or looser to remove outliers according to the desired effect. The removed outliers are labeled by timestamp and resident serial number and passed to the next step to supplement the missing values.
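The effect of tightening or loosening the threshold can be sketched as follows. The thresholds 0.7 and 0.6, the record fields, and the scores are hypothetical values chosen for illustration; the patent does not prescribe specific thresholds.

```python
def flag_anomalies(scored_records, threshold=0.6):
    """Split records into (anomalies, kept) by comparing each score with the threshold."""
    anomalies = [r for r in scored_records if r["score"] >= threshold]
    kept = [r for r in scored_records if r["score"] < threshold]
    return anomalies, kept


# Hypothetical scored load records.
scored = [
    {"resident": "R001", "timestamp": "2020-06-01 00:00", "load": 0.31, "score": 0.42},
    {"resident": "R001", "timestamp": "2020-06-01 00:15", "load": 6.80, "score": 0.78},
    {"resident": "R002", "timestamp": "2020-06-01 00:00", "load": 0.40, "score": 0.45},
    {"resident": "R002", "timestamp": "2020-06-01 00:15", "load": 0.95, "score": 0.61},
]

strict, strict_kept = flag_anomalies(scored, threshold=0.7)  # higher threshold: fewer points removed
loose, loose_kept = flag_anomalies(scored, threshold=0.6)    # lower threshold: more points removed
```

Because every flagged record still carries its timestamp and resident serial number, it can be handed to the next step unchanged.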
Step three, marking abnormal residential electricity usage by timestamp: Using the one-to-one correspondence between resident serial number and timestamp in the abnormal load data selected by the isolation forest algorithm, the abnormal electricity usage is marked and the resident concerned is located.
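One way to read step three is as a grouping of the flagged records by resident, sketched below. The record layout, the resident identifiers, and the function name are hypothetical.

```python
from collections import defaultdict


def locate_abnormal_residents(anomalies):
    """Map each resident serial number to the sorted timestamps of its anomalies."""
    by_resident = defaultdict(list)
    for record in anomalies:
        by_resident[record["resident"]].append(record["timestamp"])
    return {rid: sorted(times) for rid, times in by_resident.items()}


# Hypothetical flagged records from step two.
anomalies = [
    {"resident": "R007", "timestamp": "2020-06-02 03:15"},
    {"resident": "R001", "timestamp": "2020-06-01 00:15"},
    {"resident": "R007", "timestamp": "2020-06-01 23:45"},
]
report = locate_abnormal_residents(anomalies)
```

The timestamps are strings in year-month-day order here, so sorting them as text also sorts them chronologically.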
The invention provides a method for screening abnormal data in residential electricity load data, characterized in that abnormal data is found by applying the isolation forest algorithm and then reported. The method accurately locates the time and place of abnormal electricity usage by residents in a community, reduces the labor and time needed to check for abnormal usage, and improves the efficiency of community electricity safety management, so it has broad application in the increasingly important field of electricity safety management. The isolation forest isolates outliers directly rather than profiling normal data points. Because abnormal data points have shorter tree paths than normal data points, the trees in an isolation forest do not need much depth, which gives the method low memory requirements and high computation speed. Applying the invention can greatly improve the efficiency of electricity safety management.

Claims (1)

1. A method for discovering abnormal data by applying an isolation forest algorithm in residential electricity safety analysis, characterized by comprising the following steps:
step one, data preprocessing: arranging the collected raw historical power load data in time order, determining the start and stop times of the data set, and labeling each record with its timestamp and the serial number of the electricity-using resident;
step two, removing abnormal values with the isolation forest algorithm: inputting the historical power load data preprocessed in step one, labeled with timestamps and resident serial numbers, into the isolation forest model;
first, training a single tree on the data:
1) randomly selecting n points from the training data as a sub-sample and placing them in the root node of an isolation tree;
2) randomly selecting a dimension and randomly generating a cut point p within the range of the current node's data, the cut point lying between the minimum and maximum values of the selected dimension in the current node's data;
3) the cut point defining a hyperplane that divides the current node's data space into two subspaces: points smaller than p in the selected dimension being placed on the left branch of the current node, and points greater than or equal to p on the right branch;
4) repeating steps 2) and 3) recursively on the left and right branch nodes, constructing new nodes until a leaf node holds only one data point (so no further cut is possible) or the tree reaches the set height limit;
FIG. 2 shows the cutting process on a training sub-sample, wherein the point Xi in the left image lies in a dense region, so it takes ten or more cuts to separate it into its own subspace, whereas the point Xo in the right image lies in a sparsely populated region at the edge and is "isolated" after only four cuts;
computing each isolation tree separately and then combining the results of all trees; because the cutting process is completely random, an ensemble method is needed to make the result converge, that is, the cutting is repeated from scratch many times and the results are averaged;
after t isolation trees have been obtained, training is finished, and the trees can then be used to evaluate test data, i.e. to calculate an anomaly score s; for each sample x, the results from all trees are combined, and the anomaly score is calculated by the following formula:
$$ s(x, \Psi) = 2^{-\frac{E(h(x))}{c(\Psi)}} $$
where h(x) is the path length (height) of x in a single tree, E(h(x)) is the average of h(x) over all t trees, and c(Ψ) is the average path length for a given number of samples Ψ, used to normalize the path length h(x) of sample x; interpreting the calculated anomaly score, wherein if the score is close to 1, the point is almost certainly an anomaly; if the score is much less than 0.5, the point can safely be regarded as normal; if the scores of all points are around 0.5, the sample most likely contains no distinct anomalies; computing the anomaly score of every data point in the historical load data, and setting the threshold tighter or looser to remove outliers according to the desired effect; labeling the removed outliers by timestamp and resident serial number and passing them to the next step to supplement the missing values;
step three, marking abnormal residential electricity usage by timestamp: using the one-to-one correspondence between resident serial number and timestamp in the abnormal load data selected by the isolation forest algorithm, marking the abnormal electricity usage and locating the resident concerned.
CN202010602460.7A 2020-06-29 2020-06-29 Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis Pending CN111767951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010602460.7A CN111767951A (en) 2020-06-29 2020-06-29 Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010602460.7A CN111767951A (en) 2020-06-29 2020-06-29 Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis

Publications (1)

Publication Number Publication Date
CN111767951A (en) 2020-10-13

Family

ID=72722361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010602460.7A Pending CN111767951A (en) 2020-06-29 2020-06-29 Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis

Country Status (1)

Country Link
CN (1) CN111767951A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381610A (en) * 2020-11-16 2021-02-19 国网上海市电力公司 Prediction method of group lease risk index and computer equipment
CN113125903A (en) * 2021-04-20 2021-07-16 广东电网有限责任公司汕尾供电局 Line loss anomaly detection method, device, equipment and computer-readable storage medium
CN113187650A (en) * 2021-04-07 2021-07-30 武汉四创自动控制技术有限责任公司 Intelligent hydraulic power plant whole-plant hydraulic turbine speed regulation system and diagnosis method
CN114124482A (en) * 2021-11-09 2022-03-01 中国电子科技集团公司第三十研究所 Access flow abnormity detection method and device based on LOF and isolated forest
CN116451168A (en) * 2023-06-15 2023-07-18 北京国电通网络技术有限公司 Abnormal power information generation method, device, electronic equipment and readable medium
CN116911806A (en) * 2023-09-11 2023-10-20 湖北华中电力科技开发有限责任公司 Internet + based power enterprise energy information management system
CN117913996A (en) * 2024-01-24 2024-04-19 江苏同合电气有限公司 Intelligent monitoring management method and system for operation of power distribution cabinet based on data analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2720406A1 (en) * 2012-10-10 2014-04-16 Thomson Licensing Method for isolated anomaly detection in large-scale data processing systems
CN108320063A (en) * 2018-03-26 2018-07-24 上海积成能源科技有限公司 To the method for rejecting abnormal data and denoising in a kind of load forecast
CN108985632A (en) * 2018-07-16 2018-12-11 国网上海市电力公司 A kind of electricity consumption data abnormality detection model based on isolated forest algorithm
CN109308306A (en) * 2018-09-29 2019-02-05 重庆大学 A kind of user power utilization anomaly detection method based on isolated forest
CN110189232A (en) * 2019-05-14 2019-08-30 三峡大学 Power information based on isolated forest algorithm acquires data exception analysis method
CN111177208A (en) * 2019-10-18 2020-05-19 姚长征 Power consumption abnormity detection method based on big data analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
风控大鱼: "Analysis of the Anomaly Detection Algorithm Isolation Forest" (异常检测算法-孤立森林(Lsolation Forest)剖析), Zhihu (《知乎》) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381610A (en) * 2020-11-16 2021-02-19 国网上海市电力公司 Prediction method of group lease risk index and computer equipment
CN113187650A (en) * 2021-04-07 2021-07-30 武汉四创自动控制技术有限责任公司 Intelligent hydraulic power plant whole-plant hydraulic turbine speed regulation system and diagnosis method
CN113125903A (en) * 2021-04-20 2021-07-16 广东电网有限责任公司汕尾供电局 Line loss anomaly detection method, device, equipment and computer-readable storage medium
CN114124482A (en) * 2021-11-09 2022-03-01 中国电子科技集团公司第三十研究所 Access flow abnormity detection method and device based on LOF and isolated forest
CN114124482B (en) * 2021-11-09 2023-09-26 中国电子科技集团公司第三十研究所 Access flow anomaly detection method and equipment based on LOF and isolated forest
CN116451168A (en) * 2023-06-15 2023-07-18 北京国电通网络技术有限公司 Abnormal power information generation method, device, electronic equipment and readable medium
CN116451168B (en) * 2023-06-15 2023-09-12 北京国电通网络技术有限公司 Abnormal power information generation method, device, electronic equipment and readable medium
CN116911806A (en) * 2023-09-11 2023-10-20 湖北华中电力科技开发有限责任公司 Internet + based power enterprise energy information management system
CN116911806B (en) * 2023-09-11 2023-11-28 湖北华中电力科技开发有限责任公司 Internet + based power enterprise energy information management system
CN117913996A (en) * 2024-01-24 2024-04-19 江苏同合电气有限公司 Intelligent monitoring management method and system for operation of power distribution cabinet based on data analysis
CN117913996B (en) * 2024-01-24 2024-06-07 江苏同合电气有限公司 Intelligent monitoring management method and system for operation of power distribution cabinet based on data analysis

Similar Documents

Publication Publication Date Title
CN111767951A (en) Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis
CN106101121B (en) A kind of all-network flow abnormity abstracting method
CN111666276A (en) Method for eliminating abnormal data by applying isolated forest algorithm in power load prediction
CN105677791A (en) Method and system used for analyzing operating data of wind generating set
CN104698343A (en) Method and system for judging power grid faults based on historical recording data
CN114201374B (en) Operation and maintenance time sequence data anomaly detection method and system based on hybrid machine learning
CN111191720B (en) Service scene identification method and device and electronic equipment
CN115865649B (en) Intelligent operation and maintenance management control method, system and storage medium
CN112101420A (en) Abnormal electricity user identification method for Stacking integration algorithm under dissimilar model
CN110580492A (en) Track circuit fault precursor discovery method based on small fluctuation detection
CN110297207A (en) Method for diagnosing faults, system and the electronic device of intelligent electric meter
CN111444501B (en) LDoS attack detection method based on combination of Mel cepstrum and semi-space forest
CN110602105A (en) Large-scale parallelization network intrusion detection method based on k-means
CN112395608A (en) Network security threat monitoring method, device and readable storage medium
CN111600878A (en) Low-rate denial of service attack detection method based on MAF-ADM
CN111666978B (en) Intelligent fault early warning system for IT system operation and maintenance big data
CN111506635A (en) System and method for analyzing residential electricity consumption behavior based on autoregressive naive Bayes algorithm
CN118094531B (en) Safe operation and maintenance real-time early warning integrated system
CN113284004A (en) Power data diagnosis treatment method based on isolated forest algorithm
CN106846170B (en) Generator set trip monitoring method and monitoring device thereof
CN116302809A (en) Edge end data analysis and calculation device
CN113127464B (en) Agricultural big data environment feature processing method and device and electronic equipment
CN117277566A (en) Power grid data analysis power dispatching system and method based on big data
CN111506636A (en) System and method for analyzing residential electricity consumption behavior based on autoregressive and neighbor algorithm
CN115062007A (en) Wind turbine generator set wind speed and power data cleaning method based on isolated forest algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201013