CN110189232A - Method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm - Google Patents

Info

Publication number
CN110189232A
Authority
CN
China
Prior art keywords
data
line loss
power information
class
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910399385.6A
Other languages
Chinese (zh)
Inventor
马辉
韩笑
鲁海鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University CTGU
Priority to CN201910399385.6A
Publication of CN110189232A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm. The method establishes transformer-district line loss management and control indices based on the electricity consumption information acquisition system and formulates a corresponding district line loss management and control method. For line-loss-class transformer districts, cloud storage technology is used to acquire, classify, and process the electricity consumption data of multiple districts. The types of dirty data are analyzed and summarized, and noise and dirty data are removed according to how each type manifests. The cleaned and screened data are then transformed into a form suited to data mining. A data analysis model is built with the isolation forest algorithm and evaluated with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the cumulative recall curve, and the precision-recall (P-R) curve. The model is then applied to the electricity consumption data sets of multiple line-loss-class districts to mine the screened data and identify users with abnormal electricity consumption. The invention uses the isolation forest algorithm to mine users with abnormal data effectively, analyze the causes of line loss, and strengthen district line loss management and control.

Description

Method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm
Technical field
The invention relates to the technical field of electricity consumption information acquisition, and specifically to a method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm.
Background
With the rapid development of the information age, the internet and the information and communication industries were the first to pursue big data research. For the power industry, big data likewise has far-reaching research significance and promising application prospects. As the next-generation power system evolves, data-driven power supply chains will gradually replace traditional ones. In particular, the widespread deployment of electricity consumption information acquisition systems provides the data foundation that China's power industry needs for operational decision-making and service optimization based on electric power data analysis. At the same time, electricity consumption data such as energy data, operating-condition data, and event information are growing exponentially, their big data characteristics are increasingly pronounced, and the demand for applications of electricity consumption big data is increasingly urgent. This massive volume of data comes mainly from metering devices and systems of all kinds, and a large amount of abnormal electricity consumption data arises from equipment failures, communication failures, grid fluctuations, and management problems. Faced with this growth, most power departments analyze abnormal data using only traditional statistical methods and mostly rely on field testing to do so. Because manpower, material, and financial resources are limited, the deeper causes hidden behind abnormal data cannot be extracted effectively, which instead leads to "data disaster" and "data abandonment". Traditional analysis methods therefore can no longer meet the requirements, and data mining is needed to discover the deeper patterns behind abnormal electricity consumption data, rule out chance effects, and extract what is systematic in the data.
Because low-voltage customers are numerous and change frequently, current transformer-district line loss management commonly suffers from line loss anomalies caused by management problems such as unclear household-transformer relationships, poor meter reading quality, electricity theft, and metering faults. In recent years, many domestic power supply companies have faced, to varying degrees, the same predicament: line loss control in transformer districts requires large investment but yields small returns. The root cause is that over the past ten years the main factor affecting district line loss has shifted to management losses, while the direction of investment in renovation has not changed. The construction of electricity consumption information acquisition systems introduces intelligent management concepts into transformer districts and creates an opportunity to innovate in district line loss management and control.
Cloud computing draws on distributed hardware, software, and information resources to provide high-quality, on-demand services, and it has been applied successfully in fields such as search engines, social networks, and communications. In smart grid informatization, cloud computing's distinctive capabilities for efficient large-scale data access and parallel computation allow it to provide high-quality data processing services to information systems, including the electricity consumption information acquisition system, and give solid technical support to information systems in the smart grid era.
Summary of the invention
The invention provides a method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm. It builds an application environment that integrates a real-time database with cloud computing and a cloud real-time storage platform, and uses efficient parallel computing to achieve high throughput for big data batch processing tasks. The isolation forest algorithm, which is stable and robust to noise, is used to mine users with abnormal data effectively, analyze the causes of line loss, and strengthen transformer-district line loss management and control.
The technical solution adopted by the invention is as follows:
A method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm, comprising the following steps:
Step 1: Establish transformer-district line loss management and control indices based on the electricity consumption information acquisition system, and formulate a district line loss management and control method based on that system.
Step 2: For line-loss-class transformer districts, use cloud storage technology to acquire, classify, and process the electricity consumption data of multiple districts.
Step 3: Analyze and summarize the types of dirty data, and eliminate noise according to how each type manifests.
Step 4: Transform the cleaned and screened data into a form suited to data mining, that is, reduce the dimensionality of the data. To fully reflect the similarity between loads, the invention selects six common daily load characteristic indices: load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate, and valley-period load rate. Together these reflect the consumption characteristics of all types of users and reduce the dimensionality of the data effectively.
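The dimensionality reduction in Step 4 can be sketched as below. This is a minimal illustration only: the patent names the six indices but does not give their formulas, so every formula here is an assumed, commonly used definition, and the function name `daily_load_features` and the peak, flat, and valley hour ranges are hypothetical.

```python
from statistics import mean


def daily_load_features(load, peak_hours, flat_hours, valley_hours):
    """Reduce one evenly sampled daily load curve to six indices.

    All six formulas are assumed definitions, not taken from the patent.
    """
    p_max, p_min, p_avg = max(load), min(load), mean(load)
    if p_max <= 0 or p_avg <= 0:
        raise ValueError("load curve must have a positive peak and mean")

    def period_rate(hours):
        # Mean load in the period relative to the whole-day mean load.
        return mean(load[h] for h in hours) / p_avg

    # Equivalent hours at peak load as a share of the day. With evenly
    # sampled data this assumed form coincides with the load rate.
    max_util_hour_rate = sum(load) / (p_max * len(load))

    return {
        "load_rate": p_avg / p_max,
        "peak_valley_difference_rate": (p_max - p_min) / p_max,
        "max_utilization_hour_rate": max_util_hour_rate,
        "peak_period_load_rate": period_rate(peak_hours),
        "flat_period_load_rate": period_rate(flat_hours),
        "valley_period_load_rate": period_rate(valley_hours),
    }
```

Applied to every row of an n×24 load matrix, a function like this would yield an n×6 feature matrix for the isolation forest.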
Step 5: Build a data analysis model with the isolation forest algorithm and evaluate it with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the P-R curve. Apply the model to the electricity consumption data sets of multiple line-loss-class districts, mine the screened data, and identify users with abnormal electricity consumption.
Receiver operating characteristic (ROC) curve: the ROC curve stays unchanged when the distribution of positive and negative samples in the test set changes. For the continuous values output by a binary classification model, samples above the threshold are assigned to the positive class and samples below it to the negative class. Lowering the threshold identifies more positives, which raises recall, but it also assigns more negative samples to the positive class, which raises the false alarm rate. The ROC curve visualizes this trade-off. In ROC space the point (0, 1) represents the ideal classifier, and the closer the ROC curve lies to (0, 1), the better the classification. The AUC is the area under the ROC curve: AUC = 1 corresponds to the ideal classifier, AUC = 0.5 corresponds to random guessing (the model has no predictive value), and a value between 0.5 and 1 indicates a model better than random guessing.
P-R curve: plotting precision on the vertical axis against recall on the horizontal axis gives the precision-recall curve, or "P-R curve" for short. As the classification threshold moves from high to low, precision decreases and recall increases. When evaluating a classifier, the closer the P-R curve lies to the point (1, 1), the better the classification.
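As a minimal sketch of these evaluation measures, the code below computes the AUC as the fraction of positive-negative pairs that the score ranks correctly, and lists the (recall, precision) points obtained by lowering the threshold one sample at a time. The function names are hypothetical, and labels are assumed to be 1 for abnormal users and 0 for normal users.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """(recall, precision) after each step of lowering the threshold.

    Tied scores are treated as separate steps in this simple sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points, true_pos = [], 0
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        points.append((true_pos / total_pos, true_pos / rank))
    return points
```

The pairwise form of the AUC is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank computation would be the usual choice on large data sets.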
In step 1, the district line loss management and control indices comprise five status identifiers and their hierarchical relationship: coverage class, household-transformer class, collectable class, data class, and line-loss class. Each district is assigned one of these status identifiers according to the collected load data, and management and control measures are formulated for the control priority of each district type, forming the district line loss management and control method based on the electricity consumption information acquisition system. The specific measures are as follows:
Coverage class: the acquisition equipment installation rate in the district has not reached 100%; the installation schedule for acquisition equipment should be arranged reasonably.
Household-transformer class: acquisition coverage has reached 100%, but the household-transformer relationship is still inaccurate; it should be verified by combining office record checks with on-site inspection.
Collectable class: acquisition coverage has reached 100%, but the collectable rate has not yet reached 95%; the collectable rate should be tracked and the causes of missed and erroneous acquisitions analyzed.
Data class: coverage has reached 100%, the collectable rate has reached 95%, and the household-transformer relationship is correct, but the error between acquired data and manual meter readings is greater than the mean; a reasonable meter reading plan should be formulated.
Line-loss class: the coverage rate, collectable rate, and accuracy rate have all reached 100% and the household-transformer relationship is correct, but the line loss rate is abnormal; the cause should be analyzed promptly and loss reduction measures formulated.
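The hierarchy of the five states can be read as a sequence of checks in which the first unmet condition determines the class. The sketch below illustrates that reading; the function name, parameter names, and the final "compliant" label are assumptions, while the 100% and 95% thresholds are taken from the text.

```python
def classify_district(coverage_rate, mapping_accurate, collectable_rate,
                      reading_error, mean_reading_error, line_loss_abnormal):
    """Return the status class of one transformer district.

    Rates are fractions in [0, 1]. The checks run in hierarchical order,
    so a district is assigned the first class whose condition is unmet.
    """
    if coverage_rate < 1.0:
        return "coverage"               # acquisition equipment still missing
    if not mapping_accurate:
        return "household-transformer"  # household-transformer records wrong
    if collectable_rate < 0.95:
        return "collectable"            # missed or erroneous acquisitions
    if reading_error > mean_reading_error:
        return "data"                   # acquired vs. manual readings differ
    if line_loss_abnormal:
        return "line-loss"              # everything else fine, loss abnormal
    return "compliant"                  # assumed label for districts on target
```

For example, `classify_district(1.0, True, 0.90, 0.0, 0.1, True)` returns `"collectable"`, because the collectable rate check fails before the line loss check is reached.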
In step 2, the distributed file storage mechanism of cloud storage spreads the electricity consumption data across multiple independent storage servers. The mechanism comprises volume management, metadata management, and block data management services.
Metadata comprises the name, attributes, and data block locations of a file. Because metadata is accessed frequently, the system loads it into memory and manages it there as a cache to improve access efficiency.
Block data are the data blocks into which a file is split at a fixed size; the blocks are distributed across different storage node servers. The storage space provided by one metadata server together with the storage server nodes it manages is called a volume.
The volume management server virtualizes and integrates multiple volumes and presents them externally as a single, unified cloud real-time storage platform space.
The cloud real-time storage platform uses a parallel ETL (Extraction-Transformation-Loading) environment. Tasks that were originally computation-intensive and complex are decomposed into atomic subtasks and assigned to different processing nodes for concurrent, synchronized processing, which improves data processing efficiency and throughput and safeguards data processing performance.
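The relationship between block data and metadata can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the round-robin placement policy, the function names, and the dictionary-based "servers" are hypothetical and do not describe the storage platform used in the patent.

```python
def split_into_blocks(data, block_size, nodes):
    """Split file bytes into fixed-size blocks and spread them over nodes.

    Returns (placement, storage): placement[i] is the node holding block i
    (the block-location part of the metadata), and storage maps each node
    to the blocks it holds, keyed by block index.
    """
    if block_size <= 0 or not nodes:
        raise ValueError("need a positive block size and at least one node")
    placement = []
    storage = {node: {} for node in nodes}
    for index, offset in enumerate(range(0, len(data), block_size)):
        node = nodes[index % len(nodes)]  # assumed round-robin placement
        storage[node][index] = data[offset:offset + block_size]
        placement.append(node)
    return placement, storage


def read_file(placement, storage):
    """Reassemble the file by following the block locations in order."""
    return b"".join(storage[node][i] for i, node in enumerate(placement))
```

Keeping `placement` in memory mirrors the text's point that metadata is cached for fast lookup, while the block payloads live on the storage nodes.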
In step 3, the common types of dirty data are:
(1) Missing values: null values in the table.
(2) Duplicate values: a user's load data for a given time are repeated.
(3) Extreme values: load data that are too large or too small.
(4) Load spikes: sudden increases or decreases between adjacent time periods.
(5) Negative-value shocks: meter readings that decline over a continuous period.
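The five types can be detected with simple rules, sketched below. The thresholds `low`, `high`, `max_step`, and `min_len`, as well as both function names, are assumptions for illustration; the patent does not specify detection rules. The first function works on (timestamp, load) records for one user, the second on cumulative meter readings.

```python
def flag_load_records(records, low, high, max_step):
    """Label each (timestamp, load) record as ok, missing, duplicate,
    extreme, or spike. Spikes are judged against the last ok value, so a
    genuine level shift would need a smarter rule than this sketch."""
    flags, seen, last_ok = [], set(), None
    for timestamp, value in records:
        if value is None:
            flags.append("missing")
        elif timestamp in seen:
            flags.append("duplicate")
        elif value < low or value > high:
            flags.append("extreme")
        elif last_ok is not None and abs(value - last_ok) > max_step:
            flags.append("spike")
        else:
            flags.append("ok")
            last_ok = value
        seen.add(timestamp)
    return flags


def declining_runs(readings, min_len=2):
    """Find (start, end) index ranges where cumulative meter readings fall
    for at least min_len consecutive steps (negative-value shocks)."""
    runs, start = [], None
    for i in range(1, len(readings)):
        if readings[i] < readings[i - 1]:
            if start is None:
                start = i - 1
        else:
            if start is not None and i - 1 - start >= min_len:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(readings) - 1 - start >= min_len:
        runs.append((start, len(readings) - 1))
    return runs
```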
Missing values are filled according to the periodic fluctuation of the electricity load. The calculation is as follows:
In the formula, $X_i$ is the electricity load at the current time; $i$ is the time at which load data are missing, taking values from 1 to 24; and $\alpha_1$ and $\alpha_2$ are the weighting coefficients for the loads at the corresponding time on the preceding and following days and for the loads at the two time points adjacent to the current time, respectively.
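Because the formula itself is not reproduced in this text, the sketch below uses an assumed form that is only consistent with the symbol definitions: a weighted sum of the mean load at the same time on the adjacent days (weight `alpha1`) and the mean of the two neighbouring time points on the current day (weight `alpha2`). The function name, the default weights, and the 0-based index are assumptions, and the patent's actual formula may differ.

```python
def fill_missing_load(prev_day, cur_day, next_day, i, alpha1=0.5, alpha2=0.5):
    """Estimate the missing load cur_day[i] (assumed weighted-mean form).

    prev_day, cur_day, next_day are equal-length daily load lists;
    missing entries in cur_day are None. i is a 0-based index, whereas
    the patent counts time points from 1 to 24.
    """
    # Same time point on the day before and the day after.
    day_mean = (prev_day[i] + next_day[i]) / 2
    # Neighbouring time points on the current day, where available.
    neighbours = [cur_day[j] for j in (i - 1, i + 1)
                  if 0 <= j < len(cur_day) and cur_day[j] is not None]
    if not neighbours:
        return day_mean
    neighbour_mean = sum(neighbours) / len(neighbours)
    return alpha1 * day_mean + alpha2 * neighbour_mean
```

With `alpha1 + alpha2 = 1` the estimate stays within the range of its inputs, which is a convenient but assumed normalization.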
Noisy data are repaired using the rectangle method. The calculation is as follows:
In the formula, $X_i$ is the repaired energy value, $F$ is the number of load data acquisitions in one day, $P_i$ is the load at time $i$, and $\Delta T$ is the load data acquisition interval.
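A minimal sketch of the rectangle method follows, assuming the usual rectangle-rule form in which each acquired load is held constant over one acquisition interval, so that the energy is the sum of $P_i \Delta T$ over the $F$ acquisitions of the day. The function name and the units (kW and hours, giving kWh) are assumptions.

```python
def repaired_daily_energy(loads_kw, interval_hours):
    """Rectangle-rule integral of a daily load curve.

    loads_kw holds the F acquired load values P_i in kW; interval_hours
    is the acquisition interval (0.25 for 15-minute data). Returns kWh.
    """
    if interval_hours <= 0:
        raise ValueError("acquisition interval must be positive")
    return sum(p * interval_hours for p in loads_kw)
```

For 15-minute data there are 96 acquisitions per day, so a constant 4 kW load integrates to 96 kWh.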
In step 5, the isolation forest algorithm is used to mine outliers. This helps flag suspicious users automatically, provides a preliminary screening of abnormal users, and speeds up investigation. The algorithm cuts the data space with a random hyperplane; each cut produces two subspaces, each of which is then cut again with another random hyperplane, and so on until every subspace contains only one data point.
The isolation forest algorithm has two stages:
Stage one: build an isolation forest consisting of t iTrees, as follows:
(1) Randomly select ψ sample points from the training data as a subsample set and place them in the root node of a tree.
(2) Randomly choose a dimension and randomly generate a cut point p within the range of the current node's data.
(3) Use this cut point to generate a hyperplane that divides the current node's data space into two subspaces: data whose value in the chosen dimension is less than p go to the left child of the current node, and data whose value is greater than or equal to p go to the right child.
(4) Repeat steps (2) and (3) recursively in the child nodes, constructing new child nodes until the data can no longer be split or the tree depth reaches log2 ψ.
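Stage one can be sketched in plain Python as below. This is an illustrative sketch, not the patent's implementation: the dictionary-based tree, the function names, rounding the depth limit up to a whole number, and restricting the random dimension to those that still vary within a node are all assumptions.

```python
import math
import random


def build_itree(points, rng, depth=0, max_depth=8):
    """Recursively isolate points with random axis-parallel cuts."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}  # leaf node
    # Only dimensions that still vary can separate the points.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}  # identical records cannot be split
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_itree(left, rng, depth + 1, max_depth),
            "right": build_itree(right, rng, depth + 1, max_depth)}


def build_forest(data, n_trees, psi, seed=0):
    """Build n_trees iTrees, each from a random subsample of psi points."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi))  # depth limit log2(psi), rounded up
    return [build_itree(rng.sample(data, psi), rng, 0, max_depth)
            for _ in range(n_trees)]


def depth_of(tree, x):
    """Depth at which sample x lands when passed down one iTree."""
    depth = 0
    while "dim" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["cut"] else tree["right"]
        depth += 1
    return depth
```

Averaging `depth_of` over all trees gives the quantity that stage two turns into an anomaly score; a point far from the bulk of the data is usually isolated after very few cuts.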
Stage two: evaluate the test data with the generated iForest and compute an anomaly score for each sample under test. Each data point x traverses every iTree, which gives the depth h(x) at which x lands in each iTree and the average of those depths over all iTrees, from which the anomaly score of the sample is computed. The anomaly score of a sample x is defined as:
$$s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}, \qquad c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}$$
where h(x) is the depth of the node in which sample x lands in an iTree; E(h(x)) is the mean of h(x) over all t iTrees; c(ψ) is the average path length of a binary search tree built from ψ points; and H(k) = ln(k) + ζ, where ζ is Euler's constant.
From the definition of the anomaly score: when E(h(x)) → 0, s → 1; when E(h(x)) → ψ − 1, s → 0; and when E(h(x)) → c(ψ), s → 0.5. That is, the closer s(x) is to 1, the more likely x is anomalous, and the closer s(x) is to 0, the more likely x is a normal point.
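The limiting behaviour described above can be checked numerically. The sketch below assumes the standard isolation forest form of the score, s = 2^(−E(h(x))/c(ψ)) with c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/ψ and H(k) = ln(k) + ζ; the function names are hypothetical, and the special cases for ψ ≤ 2 follow a convention used by common implementations rather than anything stated in the patent.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant, the zeta in H(k)


def harmonic(k):
    """Approximate harmonic number H(k) = ln(k) + Euler's constant."""
    return math.log(k) + EULER_GAMMA


def average_path_length(psi):
    """c(psi): average path length of a binary search tree on psi points."""
    if psi <= 1:
        return 0.0
    if psi == 2:
        return 1.0  # convention used by common implementations
    return 2.0 * harmonic(psi - 1) - 2.0 * (psi - 1) / psi


def anomaly_score(mean_depth, psi):
    """s(x, psi) = 2 ** (-E(h(x)) / c(psi))."""
    return 2.0 ** (-mean_depth / average_path_length(psi))
```

A mean depth of 0 gives a score of exactly 1, a mean depth equal to c(ψ) gives 0.5, and a mean depth near ψ − 1 gives a score close to 0, matching the three limits in the text.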
The method for analyzing anomalies in electricity consumption information acquisition data based on the isolation forest algorithm has the following technical effects:
1: The district line loss management and control method based on the electricity consumption information acquisition system is a new method adapted to the needs of smart grid development. It effectively solves problems in current district management, makes district line loss management more transparent and efficient, brings out its integrated role in marketing management, and ultimately achieves energy saving, loss reduction, and standardized management.
2: The established district line loss management index system has five states: coverage class, household-transformer class, collectable class, data class, and line-loss class, together with their hierarchical relationship. For districts in different states, different control methods, control periods, and responsible departments are defined according to the respective control priorities, ultimately driving districts to progress toward a good state.
3: The invention fully combines the electricity consumption information acquisition system with the district line loss analysis platform and effectively solves line loss management problems. Data mining and analysis can be carried out on line loss anomalies in district systems, which makes line loss management more transparent and efficient, brings out its integrated management role, and ultimately achieves energy saving, loss reduction, and standardized management.
Brief description of the drawings
Fig. 1 is a schematic diagram of the state progression of the district line loss management and control index system based on the electricity consumption information acquisition system.
Fig. 2 is a schematic diagram of the overall framework of the electricity consumption information acquisition system.
Fig. 3 is a diagram of the iForest outlier detection process.
Fig. 4 is the iTree construction flow chart.
Fig. 5 is a flow chart of iForest construction and anomaly score output.
Fig. 6 is the ROC curve of the iForest model.
Fig. 7 is the P-R curve of the iForest model.
Detailed description
The invention is described in further detail below with reference to the drawings and an embodiment.
Taking a county power company as an example, a case analysis of district line loss management and control based on the electricity consumption information acquisition system is carried out.
The system is designed and developed with a service-oriented architecture (SOA) as its overall design framework. It comprises three functional modules: a data acquisition and real-time database module, an abnormal data analysis module, and a loss reduction decision support module.
The data acquisition and real-time database module uses dedicated-transformer terminals conforming to the 05 edition specification, which can collect voltage, current, and energy data from user electricity meters every 15 minutes (96 points over 24 hours); that is, the data set S is an n×24 initial load curve matrix formed from n daily load curves. The module stores the collected mass data in distributed form using cloud storage technology. After processing, the data show that from September 2018 to March 2019 the county power company had 701 transformer districts with a total capacity of 349,000 kVA, an average capacity per district of 497.8 kVA, accumulated energy losses of 46,000 kWh, and an average district line loss rate of 2.69%.
The abnormal data analysis module applies the isolation forest algorithm to mine outliers and thereby find users with abnormal electricity consumption. The module works in four stages:
(1) Clean the collected electricity consumption data: analyze and summarize the types of dirty data, apply targeted measures according to how each type manifests, delete redundant data from the data set, and preserve the integrity of the data set. For seriously incomplete data, the periodic fluctuation of the electricity load is used: the mean of the loads at the same time on the two adjacent days and at the two time points before and after the current time is computed, together with the rate of load change of the following day relative to the previous day, and the missing value is filled with the mean plus the load change. The calculation is as follows:
In the formula, $X_i$ is the electricity load at the current time; $i$ is the time at which load data are missing, taking values from 1 to 24; and $\alpha_1$ and $\alpha_2$ are the weighting coefficients for the loads at the corresponding time on the preceding and following days and for the loads at the two time points adjacent to the current time, respectively. For abnormal noisy data, the rectangle method is used to integrate the loads at each acquisition time of the day and compute the repaired energy value. The calculation is as follows:
In the formula, $X_i$ is the repaired energy value, $F$ is the number of load data acquisitions in one day, $P_i$ is the load at time $i$, and $\Delta T$ is the load data acquisition interval; in this example $\Delta T$ = 15 min.
(2) Data dimensionality reduction: a load curve is a time series, and electricity load data are easily affected by many factors such as temperature, income, and electricity price policy. These effects are intrinsic features of the time series that distance measures cannot fully reflect, so distance alone cannot guarantee similarity in the shape or profile of the series. Moreover, curves with a distinct load shape, such as daily load curves, show poor distance behaviour in high dimensions. To reflect the similarity between loads fully while keeping computation efficient, this embodiment selects six common daily load characteristic indices: load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate, and valley-period load rate. These reflect the consumption characteristics of all types of users from four perspectives: whole day, peak period, flat period, and valley period. The six daily load characteristic indices are used to reduce the dimensionality of the load curve matrix.
(3) Construct the iTrees, as follows:
1. Randomly select one feature from the six daily load characteristic indices.
2. Randomly select a value k of that feature.
3. Split the records on that feature: records whose value is less than k go to the left branch, and records whose value is greater than or equal to k go to the right branch.
4. Recursively construct the left and right branches until one of the following conditions is met:
A. The incoming data set contains only one record, or several identical records.
B. The height of the tree has reached the height limit.
Because anomalies are far fewer than normal data and differ markedly from them in their features, their path lengths are also relatively short. Therefore, in addition to limiting the sample size, a maximum depth h = log2 ψ is set for each iTree, so that only the part of the tree below the average depth needs attention, which makes the algorithm more efficient. Here the iTree sample size is ψ = 100 and the number of iTrees is n = 100.
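For the parameters chosen here, the depth limit and the normalizing constant can be worked out directly. The calculation below assumes the standard isolation forest definitions, $c(\psi) = 2H(\psi - 1) - 2(\psi - 1)/\psi$ with $H(k) = \ln k + \zeta$ and $\zeta \approx 0.5772$, and the score $s = 2^{-E(h(x))/c(\psi)}$; the mean depth of 4 in the last line is a made-up value used only to illustrate the scale of the score.

```latex
\begin{align*}
h &= \log_2 \psi = \log_2 100 \approx 6.64
   \quad\text{(a depth limit of 7 if rounded up)} \\
c(100) &= 2\,(\ln 99 + 0.5772) - \frac{2 \cdot 99}{100}
        \approx 2 \times 5.1723 - 1.98 \approx 8.36 \\
s &= 2^{-4 / 8.36} \approx 0.72
   \quad\text{for a hypothetical sample with } E(h(x)) = 4
\end{align*}
```

A user whose records are isolated after about four cuts on average would therefore score noticeably above the 0.5 level that corresponds to an average path length of $c(\psi)$.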
(4) Construct the isolation forest, evaluate the model, and compute the anomaly scores: several distinct iTrees make up the iForest, and the model is evaluated with the ROC curve, the AUC, the cumulative recall curve, and the P-R curve. The iForest evaluates one user at a time, and every evaluation traverses all iTrees. The position of the leaf node in which the queried object lands is recorded in each tree, and the anomaly score is computed from the resulting average path length. Finally, the user is assessed according to the size of the anomaly score to decide whether the user under test is abnormal.
The loss reduction decision support module mainly comprises two parts, the loss reduction decision support function and the management of the loss reduction scheme library. For users with abnormal electricity consumption data, the module checks the following: a. whether electricity theft exists in the district; b. whether the district load has changed or been switched to another district; c. whether the district transformer is lightly or heavily loaded; d. the operating condition of reactive power compensation equipment; e. whether the three-phase load is balanced; f. voltage quality; g. whether the transformer, lines, and metering equipment are reasonable and working normally (the main causes of energy metering device anomalies are meter failure, instrument transformer failure, junction box failure, and terminal failure); h. whether the low-voltage supply radius is too long; i. other causes of abnormal line loss.
Fig. 1 is a schematic diagram of the state progression of the district line loss management and control index system based on the electricity consumption information acquisition system.
1.1 Initial districts: prepare the data and bring the district into hierarchical progressive management.
1.2 Installation-class districts: arrange equipment installation reasonably until the coverage rate reaches 100%.
1.3 Household-transformer-class districts: verify the district's household-transformer relationship until the accuracy rate reaches 100%.
1.4 Collectable-class districts: acquire repeatedly and analyze failures until the collectable rate reaches 95%.
1.5 Data-class districts: acquire repeatedly and analyze errors until the collectable rate reaches 95%.
1.6 Line-loss-class districts: analyze the cause of the abnormal line loss rate and formulate loss reduction measures.
1.7 Compliant districts: take consolidation measures to keep the district compliant.
Districts are assigned a status according to their control priority. There are five states:
A. Coverage class: the acquisition equipment installation rate in the district has not reached 100%. The control priority is to arrange the acquisition equipment installation plan reasonably, speed up installation and replacement, and raise the coverage rate.
B. Household-transformer class: acquisition coverage has reached 100%, but the household-transformer relationship is still inaccurate. The control priority is to use the acquisition results to narrow down where the relationship is confused, and to verify the relationship by combining office record checks with on-site inspection.
C. Collectable class: acquisition coverage has reached 100%, but the collectable rate has not yet reached 95%. The control priority is to track the collectable rate, analyze the causes of missed and erroneous acquisitions, and take effective measures to raise the collectable rate.
D. Data class: coverage has reached 100%, the collectable rate has reached 95%, and the household-transformer relationship is correct, but the error between acquired data and manual meter readings is greater than the mean. The control priority is to formulate a reasonable meter reading plan, read meters several times, repeatedly check manual readings against acquired readings, analyze the anomalies, and correct them.
E. Line-loss class: the coverage rate, collectable rate, and accuracy rate have all reached 100% and the household-transformer relationship is correct, but the line loss rate is abnormal. The control priority is to analyze the cause of the abnormal line loss rate and formulate targeted loss reduction measures.
Fig. 2 is power information acquisition system general frame.Acquisition cluster periodically acquires information from user terminal, and By calling memory interface to store data into cloud storage and inquiry environment;Data storage and inquiry environment are responsible to collecting Information carry out high concurrent storage, and upwards provide electricity consumption data index and efficient query function.Parallel ETL environment is responsible for original There is the data exchange of archive information and cloud computing environment in relevant database;Tables of data is established using ETL management tool to map The implementation strategy of relationship and task, system carry out real-time tracking to the data in interconnected system by parallel ETL tool, obtain And consistency desired result.Parallel parsing and calculating environment are responsible for running isolated forest algorithm excavation abnormal data.Front end interface includes Class SQL (Structured Query Language) interface, Web service, client packet etc., facing external system provide inquiry With the service of analytical calculation.Mapping tool uses the optimisation technique of SQL to the Map/Reduce based on query rewrite, will be original SQL is converted into query graph, and develops into diversified forms using rewriting rule, realize the application program of original storing process form to Auxiliary migration, verification of correctness and the performance optimization of cloud computing environment, can be greatly lowered relevant database and be applied to cloud The moving costs of calculating improves development efficiency, promotes the overall performance of parallel computation.
Fig. 3 is iForest rejecting outliers process.When carrying out rejecting outliers, each iTree need to be traversed, sample is sought Mean depth
Fig. 4 is the iTree construction flow chart. Before the isolated forest algorithm is applied to electricity consumption data, the collected raw data must first be cleaned: dirty data are rejected, redundant data are deleted, and the data undergo dimension reduction. The iTree construction process is shown in the figure.
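The cleaning step that precedes iTree construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the rules for what counts as dirty data (missing, non-finite, or negative readings), the `clean_records` name, and the use of a near-constant-column filter as a crude stand-in for dimension reduction are all choices made here for the example.

```python
import math
from statistics import pvariance


def clean_records(records, min_variance=1e-12):
    """Clean a list of equal-length numeric feature rows.

    Assumed rules (not taken from the source): a row is dirty if any value
    is missing, non-finite, or negative; exact duplicate rows are redundant;
    near-constant columns are dropped as a simple form of dimension reduction.
    Returns the cleaned rows and the indices of the columns that were kept.
    """
    seen = set()
    rows = []
    for rec in records:
        if any(v is None or not math.isfinite(v) or v < 0 for v in rec):
            continue                  # reject dirty data
        key = tuple(rec)
        if key in seen:
            continue                  # delete redundant data
        seen.add(key)
        rows.append(key)
    if not rows:
        return [], []
    keep = [j for j in range(len(rows[0]))
            if pvariance([r[j] for r in rows]) > min_variance]
    return [[r[j] for j in keep] for r in rows], keep


raw = [[1.0, 5.0, 220.0], [1.0, 5.0, 220.0], [2.0, None, 220.0],
       [3.0, -1.0, 220.0], [4.0, 7.0, 220.0]]
rows, kept_columns = clean_records(raw)
print(rows, kept_columns)
```

In practice the dirty-data rules would come from the analysis of dirty-data types in step 3 of the method, and a proper projection method could replace the column filter.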
Fig. 5 shows the construction of the iForest and the output of the abnormal score. The process of building the electricity consumption data anomaly detection method with iForest is shown in Fig. 5; whether a user is an abnormal user is determined according to the size of the abnormal score.
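A minimal sketch of the flow in Fig. 5, assuming the standard isolation-forest score s = 2^(-E(h(x))/c(ψ)) and the usual c(n) path-length adjustment for leaves that still hold several points. The function names, the dictionary tree layout, and the default values of t and ψ are illustrative choices, not taken from the patent, and the code assumes at least two training rows.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(rows, depth, max_depth, rng):
    """Recursively isolate rows with random axis-parallel cuts."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:          # simplification: stop if the chosen dimension is constant
        return {"size": len(rows)}
    p = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < p]
    right = [r for r in rows if r[dim] >= p]
    return {"dim": dim, "p": p,
            "left": build_itree(left, depth + 1, max_depth, rng),
            "right": build_itree(right, depth + 1, max_depth, rng)}


def path_length(x, node):
    """Depth at which x lands, plus c(size) for a leaf that was not fully split."""
    depth = 0
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["p"] else node["right"]
        depth += 1
    return depth + c(node["size"])


def build_iforest(data, t=100, psi=256, seed=0):
    """Build t iTrees, each on a random subsample of psi rows."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(t)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; values near 1 suggest an anomaly."""
    mean_depth = sum(path_length(x, tr) for tr in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))
```

A user would then be flagged as abnormal when the score exceeds a chosen threshold; the threshold itself is a tuning decision that the text leaves open.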
Fig. 6 is the ROC curve of the iForest model detection in this example. As Fig. 6 shows, the ROC curve is very close to the point (0, 1), indicating that the iForest model established on this basis classifies well.
Fig. 7 is the P-R curve of the iForest model detection in this example. As shown in Fig. 7, the P-R curve is very close to the point (1, 1), so the iForest algorithm's detection on the transformer-area data is both complete (high recall) and accurate (high precision).
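The two evaluation curves summarize how well abnormal scores separate labelled abnormal users from normal ones. The sketch below shows one way to compute the area under the ROC curve and a single precision/recall point in plain Python, assuming labels of 1 for abnormal and 0 for normal; the function names and the toy labels and scores are made up for illustration and are not the patent's data.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are flagged abnormal."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0   # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))                 # 0.75
print(precision_recall(labels, scores, 0.5))   # (1.0, 0.5)
```

Sweeping the threshold over all observed scores and plotting the resulting (recall, precision) pairs gives the P-R curve; an AUC near 1 corresponds to an ROC curve that hugs the point (0, 1).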
Table 1 gives the statistical analysis of line-loss-class transformer areas in March 2019, based on the power information acquisition system:
As Table 1 shows, 694 transformer areas currently meet the line loss rate standard, accounting for about 99% of all transformer areas under management. Low acquisition-equipment coverage is the main factor affecting the overall transformer-area management process. Further detailed analysis of the transformer areas shows that the main reason for low acquisition coverage is the low installation rate of acquisition equipment for non-resident users in most transformer areas. Once the cause has been established, the acquisition-equipment installation plan should be adjusted. In addition, inaccurate household-transformer relationships rank second among the factors affecting transformer-area management. Of the 456 transformer areas with 100% acquisition coverage, 312 have accurate household-transformer relationships, an accuracy rate of 68%. Investigation of the 144 transformer areas with inaccurate relationships found two main causes: first, some transformer areas are old and their records have been lost; second, loads changed substantially during operation but the records were not updated in time. While arranging acquisition-equipment installation sensibly, attention should be paid to verifying household-transformer relationships, and a transformer-area user identification instrument can be used to assist on-site verification.

Claims (4)

1. A power information acquisition data anomaly analysis method based on the isolated forest algorithm, characterized by comprising the following steps:
Step 1: establish transformer-area line loss management indices based on the power information acquisition system, and formulate a transformer-area line loss management method based on the power information acquisition system;
Step 2: for line-loss-class transformer areas, use cloud storage technology to collect, classify, and process the power information data of multiple line-loss-class transformer areas;
Step 3: analyze and summarize the types of dirty data, and eliminate noise according to the form each type takes;
Step 4: through data transformation, convert the cleaned and screened data into a form suitable for data mining;
Step 5: apply the isolated forest algorithm to establish a data analysis model; evaluate the model using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), together with the cumulative recall curve and the P-R curve, which is plotted with precision on the vertical axis and recall on the horizontal axis; and apply the model to the power information data set of multiple line-loss-class transformer areas to mine the screened data and identify users with abnormal electricity consumption.
2. The power information acquisition data anomaly analysis method based on the isolated forest algorithm according to claim 1, characterized in that: in step 1, the established transformer-area line loss management indices comprise five status indicators (covering class, household-transformer class, collectable class, data class, and line loss class) and their hierarchical relationship;
Covering class: the acquisition-equipment installation rate in the transformer area has not reached 100%;
Household-transformer class: the transformer area's acquisition coverage has reached 100%, but the household-transformer relationship is still inaccurate;
Collectable class: acquisition coverage has reached 100%, but the collectable rate has not yet reached 95%;
Data class: coverage has reached 100%, the collectable rate has reached 95%, and the household-transformer relationship is correct, but the error between the collected data and manual meter-reading data is greater than the mean;
Line loss class: the coverage rate, collectable rate, and accuracy rate have all reached 100% and the household-transformer relationship is correct, but the line loss rate is abnormal.
3. The power information acquisition data anomaly analysis method based on the isolated forest algorithm according to claim 1, characterized in that: in step 2, the distributed file storage mechanism of cloud storage is used to store the power information data dispersed across multiple independent storage servers, and comprises volume management, metadata management, and block data management services;
Metadata refers to the name, attributes, and data block location information of a file;
Block data refers to the multiple data blocks formed by splitting file data at a fixed size, which are distributed across different storage node servers; the storage space provided by a pair of metadata servers and the storage server nodes they manage is called a volume space;
The volume management server virtualizes multiple volumes and integrates them into a unified whole, providing the cloud storage platform space for external access.
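The relationship between metadata, block data, and a volume can be illustrated with a toy in-memory model. Everything in this sketch is an assumption made for illustration: the `Volume` class, the round-robin placement of blocks, the block-id format, and the tiny block size are not specified by the text, which only states that file data are split at a fixed size and distributed across storage nodes, with metadata recording the block locations.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split file content into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class Volume:
    """Toy volume: one metadata table plus the storage nodes it manages."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: bytes}
        self.metadata = {}                              # file name -> [(node, block_id)]

    def put(self, name, data, block_size=4):
        order = list(self.nodes)
        locations = []
        for i, block in enumerate(split_into_blocks(data, block_size)):
            node = order[i % len(order)]                # round-robin placement (assumption)
            block_id = f"{name}#{i}"
            self.nodes[node][block_id] = block
            locations.append((node, block_id))
        self.metadata[name] = locations                 # metadata holds block locations

    def get(self, name):
        """Reassemble a file by following its metadata entry block by block."""
        return b"".join(self.nodes[n][b] for n, b in self.metadata[name])


vol = Volume(["node-a", "node-b", "node-c"])
vol.put("meter.csv", b"0123456789")
print(vol.metadata["meter.csv"])
print(vol.get("meter.csv"))
```

A volume management layer would sit above several such volumes and present them as one namespace; that layer is omitted here.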
4. The power information acquisition data anomaly analysis method based on the isolated forest algorithm according to claim 1, characterized in that: in step 5, the isolated forest algorithm is a two-stage algorithm:
First stage: build an isolated forest composed of t iTrees, with the following steps:
(1) randomly select ψ sample points from the training data as a subsample set and place them in the root node of the tree;
(2) randomly choose a dimension and randomly generate a cut point p within the data of the current node;
(3) generate a hyperplane from this cut point, dividing the data space of the current node into 2 subspaces: data whose value in the chosen dimension is less than p go to the left of the current node, and data greater than or equal to p go to the right of the current node;
(4) recurse steps (2) and (3) in the child nodes, continually constructing new child nodes, until the data can no longer be split or the depth of the tree reaches log2(ψ);
Second stage assesses test data with the iForest of generation, calculates abnormal score to detected sample;For any Data x enables it traverse each iTree, obtains depth of the x locating for iTree and the mean depth h locating for every iTree (x), to calculate the abnormal score of sample;The abnormal score for being detected sample x is defined as follows shown in formula:
Wherein: h (x) is the depth for being detected the node that sample x is retrieved in iTree;E (h (x)) is to all t iTree Take mean value;C (ψ) is the average path length of the binary search tree of ψ point building;H (k)=ln (k)+ζ, ζ are Euler's constant;
Observing the definition of the abnormal score: when E(h(x)) → 0, s → 1; when E(h(x)) → ψ − 1, s → 0; when E(h(x)) → c(ψ), s → 0.5. That is, the closer s(x) is to 1, the more likely the sample is abnormal data, and the closer it is to 0, the more likely it is a normal point.
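The three limiting cases can be checked numerically. The sketch below assumes the standard isolation-forest definitions s = 2^(-E(h(x))/c(ψ)) and c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/ψ with H(k) = ln(k) + ζ; the subsample size ψ = 256 and the function names are illustrative choices only.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant, the zeta in H(k) = ln(k) + zeta


def c(psi):
    """Average BST path length for psi points (formula valid for psi > 2)."""
    return 2.0 * (math.log(psi - 1) + EULER_GAMMA) - 2.0 * (psi - 1) / psi


def score(mean_depth, psi):
    """Abnormal score for a given mean depth E(h(x))."""
    return 2.0 ** (-mean_depth / c(psi))


psi = 256
for depth in (0.0, c(psi), psi - 1):
    print(f"E(h(x)) = {depth:8.3f} -> s = {score(depth, psi):.3g}")
```

With ψ = 256, c(ψ) is roughly 10.24, so a sample isolated after only a few cuts scores well above 0.5, while one that needs close to the maximum possible depth scores near 0.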

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910399385.6A CN110189232A (en) 2019-05-14 2019-05-14 Power information based on isolated forest algorithm acquires data exception analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910399385.6A CN110189232A (en) 2019-05-14 2019-05-14 Power information based on isolated forest algorithm acquires data exception analysis method

Publications (1)

Publication Number Publication Date
CN110189232A true CN110189232A (en) 2019-08-30

Family

ID=67716233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910399385.6A Pending CN110189232A (en) 2019-05-14 2019-05-14 Power information based on isolated forest algorithm acquires data exception analysis method

Country Status (1)

Country Link
CN (1) CN110189232A (en)

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
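Song Zhenwei (宋振伟): "Cloud storage design for the database of the power information acquisition system", China Excellent Doctoral and Master's Theses Full-text Database, Master's (《中国优秀博硕士学位论文全文数据库(硕士)》) *
Zhang Min (张敏): "Research on transformer-area line loss management based on the power information acquisition system", China Excellent Doctoral and Master's Theses Full-text Database, Master's (《中国优秀博硕士学位论文全文数据库(硕士)》) *
Zhang Rongchang (张荣昌): "Analysis and research of electricity consumption data anomalies based on data mining", China Excellent Doctoral and Master's Theses Full-text Database, Master's (《中国优秀博硕士学位论文全文数据库(硕士)》) *
Wang Zaiqian (王在乾) et al.: "Power load data preprocessing method based on time series analysis", Technology Innovation and Application (《科技创新与应用》) *
Wang Libin (王立斌) et al.: "A method for identifying and repairing abnormal electricity data in the power information acquisition system", Power Systems and Big Data (《电力大数据》) *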

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503570A (en) * 2019-07-16 2019-11-26 国网江苏省电力有限公司滨海县供电分公司 A kind of exception electricity consumption data detection method, system, equipment, storage medium
WO2021105799A1 (en) * 2019-11-26 2021-06-03 International Business Machines Corporation Method for privacy preserving anomaly detection in iot
GB2605899A (en) * 2019-11-26 2022-10-19 Ibm Method for privacy preserving anomaly detection in IOT
CN110888850A (en) * 2019-12-04 2020-03-17 国网山东省电力公司威海供电公司 Data quality detection method based on power Internet of things platform
CN111008662A (en) * 2019-12-04 2020-04-14 贵州电网有限责任公司 Online monitoring data anomaly analysis method for power transmission line
CN111008662B (en) * 2019-12-04 2023-01-10 贵州电网有限责任公司 Online monitoring data anomaly analysis method for power transmission line
CN110888850B (en) * 2019-12-04 2023-07-21 国网山东省电力公司威海供电公司 Data quality detection method based on electric power Internet of things platform
CN112990246A (en) * 2019-12-17 2021-06-18 杭州海康威视数字技术股份有限公司 Method and device for establishing isolated tree model
CN112990246B (en) * 2019-12-17 2022-09-09 杭州海康威视数字技术股份有限公司 Method and device for establishing isolated tree model
CN113032774A (en) * 2019-12-25 2021-06-25 中移动信息技术有限公司 Training method, device and equipment of anomaly detection model and computer storage medium
CN113032774B (en) * 2019-12-25 2024-06-07 中移动信息技术有限公司 Training method, device and equipment of anomaly detection model and computer storage medium
CN111160647B (en) * 2019-12-30 2023-08-22 第四范式(北京)技术有限公司 Money laundering behavior prediction method and device
CN111160647A (en) * 2019-12-30 2020-05-15 第四范式(北京)技术有限公司 Money laundering behavior prediction method and device
CN111505433A (en) * 2020-04-10 2020-08-07 国网浙江余姚市供电有限公司 Low-voltage transformer area family variable relation error correction and phase identification method
CN111833172A (en) * 2020-05-25 2020-10-27 百维金科(上海)信息科技有限公司 Consumption credit fraud detection method and system based on isolated forest
CN111767951A (en) * 2020-06-29 2020-10-13 上海积成能源科技有限公司 Method for discovering abnormal data by applying isolated forest algorithm in residential electricity safety analysis
CN111951116A (en) * 2020-08-26 2020-11-17 江苏云脑数据科技有限公司 Medical insurance anti-fraud monitoring and analyzing method and system based on unsupervised isolated point detection
CN113536050A (en) * 2021-07-06 2021-10-22 贵州电网有限责任公司 Distribution network monitoring system curve data query processing method
CN113536050B (en) * 2021-07-06 2023-12-01 贵州电网有限责任公司 Distribution network monitoring system curve data query processing method
CN114580467A (en) * 2022-02-22 2022-06-03 国网山东省电力公司信息通信公司 Power data anomaly detection method and system based on data enhancement and Tri-tracing
CN114580467B (en) * 2022-02-22 2023-11-17 国网山东省电力公司信息通信公司 Power data anomaly detection method and system based on data enhancement and Tri-Training
CN114495137B (en) * 2022-04-15 2022-08-02 深圳高灯计算机科技有限公司 Bill abnormity detection model generation method and bill abnormity detection method
CN114495137A (en) * 2022-04-15 2022-05-13 深圳高灯计算机科技有限公司 Bill abnormity detection model generation method and bill abnormity detection method
CN116911806A (en) * 2023-09-11 2023-10-20 湖北华中电力科技开发有限责任公司 Internet + based power enterprise energy information management system
CN116911806B (en) * 2023-09-11 2023-11-28 湖北华中电力科技开发有限责任公司 Internet + based power enterprise energy information management system
CN117971625A (en) * 2024-03-27 2024-05-03 莱芜职业技术学院 Performance data intelligent monitoring system based on computer cloud platform
CN117971625B (en) * 2024-03-27 2024-06-07 莱芜职业技术学院 Performance data intelligent monitoring system based on computer cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190830