CN110189232A - Method for analyzing anomalies in power consumption information acquisition data based on the isolation forest algorithm - Google Patents
- Publication number
- CN110189232A (application CN201910399385.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- line loss
- power information
- class
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
A method for analyzing anomalies in power consumption information acquisition data based on the isolation forest algorithm. Line loss management and control indices for station areas (the supply areas of distribution transformers) are established on the basis of the power consumption information acquisition system, and a station-area line loss management and control method based on that system is formulated. For station areas of the line loss class, cloud storage technology is used to acquire, classify, and process the power consumption data of multiple such station areas. The types of dirty data are analyzed and summarized, and noise is eliminated and dirty data removed according to how each type manifests. The cleaned and screened data are then transformed into a form suited to data mining. A data analysis model is built with the isolation forest algorithm and evaluated with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the cumulative recall curve, and the precision-recall (P-R) curve. The model is applied to the power consumption data set of multiple line-loss-class station areas, mining the screened data to identify users with abnormal electricity consumption. The invention uses the isolation forest algorithm to mine abnormal users effectively, analyze the causes of line loss, and strengthen station-area line loss management and control.
Description
Technical field
The present invention relates to the technical field of power consumption information acquisition, and specifically to a method for analyzing anomalies in power consumption information acquisition data based on the isolation forest algorithm.
Background
With the rapid development of the information age, the internet and the information and communication industries were the first to carry out big data research. For the power industry, big data likewise has far-reaching research significance and bright application prospects. As the next-generation power system evolves, a data-driven power supply chain will gradually replace the traditional one. The spread of power consumption information acquisition systems provides the necessary data foundation for China's power industry to make management and operation decisions and to optimize power services on the basis of electric power data analysis. At the same time, consumption data such as energy data, operating-condition data, and event information are growing exponentially, their big data characteristics are increasingly pronounced, and the demand for applying consumption big data is increasingly urgent. The massive consumption data comes mainly from all kinds of metering devices and systems, and a large amount of abnormal consumption data arises from equipment failures, communication failures, grid fluctuations, management problems, and other causes. Faced with this growth, most power departments use only traditional statistical methods to analyze abnormal data, and mostly rely on field tests to do so. Limited by manpower, material, and financial resources, the deeper causes hidden behind abnormal data cannot be extracted effectively, which leads instead to "data disaster" and "data waste". Traditional analysis can therefore no longer meet requirements. Data mining is needed to discover the deeper patterns behind abnormal consumption data, rule out what is accidental in the data, and extract what is systematic.
Because low-voltage customers are numerous and change frequently, station-area line loss management at present commonly suffers from abnormal line loss caused by management problems such as unclear household-transformer relationships, poor meter reading quality, electricity theft, and metering faults. In recent years, many domestic power supply enterprises have faced, to varying degrees, the same predicament of "large investment, small return" in station-area line loss management. The root cause is that over the past 10 years the main factor affecting station-area line loss has shifted to management loss, while the direction of investment and renovation has not changed. The construction of power consumption information acquisition systems introduces intelligent management concepts into the station area and brings an opportunity for innovation in station-area line loss management and control.
Cloud computing technology uses distributed software and hardware resources and information to provide high-quality, on-demand services, and has been applied successfully in fields such as search engines, social networks, and communications. In smart grid informatization, cloud computing's large-scale, efficient data access and parallel computing capabilities allow it to provide high-quality data processing services for information systems, including the power consumption information acquisition system, giving solid technical support to information systems in the smart grid era.
Summary of the invention
The present invention provides a method for analyzing anomalies in power consumption information acquisition data based on the isolation forest algorithm. It realizes an application continuum that integrates real-time database, cloud computing, and cloud real-time storage platform technologies, and uses efficient parallel computing to achieve high throughput for big data batch processing tasks. The isolation forest algorithm, which is stable and robust to noise, is used to mine users with abnormal data effectively, analyze the causes of line loss, and strengthen station-area line loss management and control.
The technical solution adopted by the invention is as follows:
A method for analyzing anomalies in power consumption information acquisition data based on the isolation forest algorithm, comprising the following steps:
Step 1: Establish station-area line loss management and control indices based on the power consumption information acquisition system, and formulate a station-area line loss management and control method based on that system.
Step 2: For station areas of the line loss class, use cloud storage technology to acquire, classify, and process the power consumption data of multiple such station areas.
Step 3: Analyze and summarize the types of dirty data, and eliminate noise according to how each type manifests.
Step 4: Transform the cleaned and screened data into a form suited to data mining, that is, reduce the dimensionality of the data. To fully reflect the similarity between loads, the invention selects 6 common daily load characteristic indices: load rate, peak-valley ratio, maximum utilization hour rate, peak-period load factor, flat-period load factor, and valley-period load factor. Together they reflect the consumption characteristics of each type of user and reduce the dimensionality of the data effectively.
Step 5: Build a data analysis model with the isolation forest algorithm, evaluate it with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the P-R curve, and apply it to the power consumption data set of multiple line-loss-class station areas, mining the screened data to identify users with abnormal electricity consumption.
Receiver operating characteristic (ROC) curve: the ROC curve remains unchanged when the distribution of positive and negative samples in the test set changes. For the continuous values output by a binary classification model, samples above the threshold are assigned to the positive class and samples below it to the negative class. Lowering the threshold identifies more positives, which improves recall, but it also assigns more negative samples to the positive class, which raises the false alarm rate. The ROC curve visualizes this trade-off. In ROC space the point (0,1) represents the ideal classifier, and the closer the ROC curve comes to (0,1), the better the classification. The AUC value is the area under the ROC curve: AUC = 1 corresponds to the ideal classifier, AUC = 0.5 corresponds to random guessing, meaning the model has no predictive value, and values between 0.5 and 1 indicate performance better than random guessing.
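The threshold sweep described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function names `roc_points` and `auc` and the sample scores are hypothetical, labels use 1 for abnormal and 0 for normal, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Return (false alarm rate, recall) points as the threshold is lowered.

    Assumes labels contain at least one 1 (positive) and one 0 (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score so each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical anomaly scores for six users, three of them truly abnormal.
example = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
example_auc = auc(example)
```

A curve that passes through (0,1) gives an area of 1, matching the ideal classifier described in the text.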
P-R curve: plotting precision on the vertical axis against recall on the horizontal axis gives the precision-recall curve, or "P-R curve" for short. As the classification threshold goes from large to small, precision tends to fall and recall rises. When evaluating a classifier, the closer the P-R curve comes to the point (1,1), the better the classification.
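The P-R curve can be traced the same way. The sketch below is illustrative only: `pr_points` is a hypothetical helper, labels use 1 for abnormal and 0 for normal, at least one positive is assumed, and tied scores are not merged.

```python
def pr_points(scores, labels):
    """Return (recall, precision) pairs as the threshold is lowered one sample at a time."""
    pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label                      # true positives among the top-k samples
        points.append((tp / pos, tp / k))
    return points


# Hypothetical scores: the first entries are the users flagged with the most confidence.
curve = pr_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
```

In this example recall only ever rises as more users are flagged, while precision ends at the share of abnormal users in the whole set.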
In Step 1, the station-area line loss management and control indices comprise five status identifiers (coverage class, household-transformer class, acquisition class, data class, and line loss class) and their hierarchical relationship. Each station area is assigned one of the following statuses according to the collected load data, and control measures are formulated for the control focus of each type of station area, forming the station-area line loss management and control method based on the power consumption information acquisition system. The specific measures are:
Coverage class: the installation rate of acquisition equipment in the station area has not reached 100%; the installation schedule for acquisition equipment should be arranged sensibly.
Household-transformer class: acquisition coverage has reached 100%, but the household-transformer relationship is still inaccurate; it should be verified by combining record checks in the office with on-site inspection.
Acquisition class: acquisition coverage has reached 100%, but the acquisition success rate has not yet reached 95%; the acquisition success rate should be tallied and the causes of missed and erroneous acquisition analyzed.
Data class: coverage has reached 100%, the acquisition success rate has reached 95%, and the household-transformer relationship is correct, but the error between acquired data and manual meter readings is greater than the mean; a sensible meter reading plan should be formulated.
Line loss class: coverage, acquisition success rate, and accuracy have all reached 100% and the household-transformer relationship is correct, but the line loss rate is abnormal; the cause should be analyzed promptly and loss reduction measures formulated.
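The five statuses form a hierarchy in which each class presupposes that the earlier checks have passed. A minimal sketch of that ordering follows. The `StationArea` fields, the `classify` function, and the "compliant" label for areas that pass every check are illustrative assumptions; the 100% and 95% thresholds are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass
class StationArea:
    coverage_rate: float          # share of meters with acquisition equipment installed, 0..1
    household_relation_ok: bool   # household-transformer relationship verified
    acquisition_rate: float       # acquisition success rate, 0..1
    reading_error_ok: bool        # acquired data agree with manual readings within tolerance
    line_loss_ok: bool            # line loss rate within the normal range


def classify(area):
    """Return the first status whose condition fails, checked in hierarchical order."""
    if area.coverage_rate < 1.0:
        return "coverage"
    if not area.household_relation_ok:
        return "household-transformer"
    if area.acquisition_rate < 0.95:
        return "acquisition"
    if not area.reading_error_ok:
        return "data"
    if not area.line_loss_ok:
        return "line loss"
    return "compliant"


status = classify(StationArea(1.0, True, 0.97, True, False))
```

Checking the conditions in this fixed order means a station area moves to the next class only after the control focus of the current class has been resolved.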
In Step 2, the distributed file storage mechanism of cloud storage is used to store the power consumption data across multiple independent storage servers. It comprises volume management, metadata management, and block data management services.
Metadata refers to a file's name, attributes, and data block location information. Because metadata is accessed frequently, the system loads it into memory and manages it as a cache to improve access efficiency.
Block data refers to the data blocks into which file data is split at a fixed size and which are stored in distributed fashion on different storage node servers. The storage space provided by a pair of metadata servers together with the storage server nodes they manage is called a volume.
The volume management server virtualizes and integrates multiple volumes and presents a single unified cloud real-time storage platform space to the outside.
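The relationship between metadata and block data can be illustrated with an in-memory toy model. Everything here is an assumption made for illustration: the function names, the round-robin placement rule, and the use of plain dictionaries in place of storage servers are not taken from the patent.

```python
def store_file(name, data, block_size, servers, metadata):
    """Split data into fixed-size blocks, spread them over servers, record locations."""
    locations = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        server_id = index % len(servers)          # round-robin placement (assumption)
        servers[server_id][(name, index)] = block
        locations.append(server_id)
    # Metadata holds the file name, its attributes, and where each block lives.
    metadata[name] = {"size": len(data), "block_size": block_size, "locations": locations}


def read_file(name, servers, metadata):
    """Reassemble a file by following the block locations kept in metadata."""
    entry = metadata[name]
    return b"".join(servers[sid][(name, i)] for i, sid in enumerate(entry["locations"]))


servers = [{}, {}, {}]        # three stand-in storage servers
metadata = {}                 # stand-in for the in-memory metadata cache
store_file("meter.csv", b"0123456789", 4, servers, metadata)
restored = read_file("meter.csv", servers, metadata)
```

Keeping the block locations in a small in-memory structure is what lets a read go straight to the right servers without scanning them.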
The cloud real-time storage platform system uses a parallel ETL (Extraction-Transformation-Loading) environment: a task that was originally computation-intensive and complex is decomposed into atomic tasks, which are assigned to different task processing nodes and processed concurrently and synchronously. This improves data processing efficiency and throughput and guarantees data processing performance.
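As a rough single-machine analogy for the decomposition described above, a worker pool can stand in for the task processing nodes. The `transform` step (dropping empty readings and converting Wh to kWh) and the partition layout are hypothetical examples, not the patent's ETL logic.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(partition):
    """Atomic task: drop empty readings and convert Wh to kWh for one partition."""
    return [value / 1000.0 for value in partition if value is not None]


def parallel_etl(partitions, workers=4):
    """Run the atomic tasks concurrently, then load the results in partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform, partitions))   # map preserves input order
    loaded = []
    for chunk in results:
        loaded.extend(chunk)
    return loaded


clean = parallel_etl([[1000, None], [2500], []])
```

Because each partition is processed independently, adding workers raises throughput without changing the result.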
In Step 3, the common types of dirty data are:
(1) Missing values: null values in the table.
(2) Duplicate values: a user's load data for a given moment is repeated.
(3) Extreme values: load data is too large or too small.
(4) Load spikes: the load rises or falls abruptly between adjacent time periods.
(5) Negative shocks: the reading declines over a continuous period.
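The five types above can be screened with simple rules. The sketch below is one possible reading, with hypothetical function names and thresholds (`low`, `high`, `max_jump`) that would need to be chosen per user. It treats negative shocks as decreases in a cumulative meter reading, which is an assumption.

```python
def find_dirty(readings, low, high, max_jump):
    """Classify problems in a list of (timestamp, load) pairs; load may be None.

    Returns a dict mapping issue name to the list of offending indices.
    """
    issues = {"missing": [], "duplicate": [], "extreme": [], "spike": []}
    seen = set()
    previous = None
    for i, (timestamp, load) in enumerate(readings):
        if timestamp in seen:                 # (2) repeated record for the same moment
            issues["duplicate"].append(i)
            continue
        seen.add(timestamp)
        if load is None:                      # (1) null value
            issues["missing"].append(i)
            previous = None
            continue
        if load < low or load > high:         # (3) too small or too large
            issues["extreme"].append(i)
        if previous is not None and abs(load - previous) > max_jump:
            issues["spike"].append(i)         # (4) abrupt change between adjacent periods
        previous = load
    return issues


def find_negative_shocks(cumulative):
    """(5) Indices where a cumulative meter reading is lower than the one before it."""
    return [i for i in range(1, len(cumulative)) if cumulative[i] < cumulative[i - 1]]


sample = [(0, 1.0), (1, None), (2, 1.2), (2, 1.2), (3, 9.0), (4, 50.0)]
report = find_dirty(sample, low=0.0, high=20.0, max_jump=5.0)
```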
Missing values are filled according to the periodic fluctuation of the power load, calculated as follows:
[formula image missing]
In the formula, X_i is the power load at the current time; i is the time at which load data is missing, taking values 1 to 24; and α1 and α2 are the weighting coefficients of the load at the corresponding time on the preceding and following days and of the load at the two time points immediately before and after the current time, respectively.
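Since the fill formula itself is not available, the following is only a plausible sketch built from the quantities the text names: a weighted combination of the same-time loads on the adjacent days (weight α1) and the loads at the neighbouring time points of the same day (weight α2). The function name, the equal default weights, and the exact form of the combination are assumptions, and the code uses 0-based indices where the text counts from 1.

```python
def fill_missing(day_before, day, day_after, i, alpha1=0.5, alpha2=0.5):
    """Estimate the missing load day[i] from adjacent days and adjacent time points."""
    # Mean of the loads at the same time on the preceding and following days.
    cross_day = (day_before[i] + day_after[i]) / 2.0
    # Mean of the available loads just before and after the gap on the same day.
    neighbours = [day[j] for j in (i - 1, i + 1)
                  if 0 <= j < len(day) and day[j] is not None]
    if not neighbours:
        return cross_day
    same_day = sum(neighbours) / len(neighbours)
    return alpha1 * cross_day + alpha2 * same_day


estimate = fill_missing([10.0, 12.0, 14.0], [11.0, None, 17.0], [12.0, 14.0, 16.0], 1)
```

With weights that sum to 1 the estimate stays between the two means, so a single bad neighbour cannot pull it outside the observed range.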
Noise data is repaired with the rectangle method, calculated as follows:
[formula image missing]
In the formula, X_i is the repaired energy value, F is the number of load data acquisitions in a day, P_i is the load at time i, and ΔT is the load data acquisition interval.
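Read with the symbols defined above, the rectangle method amounts to summing each load sample multiplied by the acquisition interval. The sketch below assumes loads in kW and an interval given in minutes; the function name and units are illustrative.

```python
def daily_energy_kwh(loads_kw, interval_minutes=15):
    """Rectangle-rule integral of a day's load samples: sum of P_i times delta T."""
    delta_t_hours = interval_minutes / 60.0
    return sum(p * delta_t_hours for p in loads_kw)


# A constant 4 kW load sampled 96 times at 15-minute intervals.
energy = daily_energy_kwh([4.0] * 96)
```

A constant 4 kW load over 24 hours integrates to 96 kWh, which is a quick sanity check on the interval conversion.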
In Step 5, the isolation forest algorithm is used to mine outliers, which helps flag suspected abnormal users automatically, gives a preliminary screening of abnormal users, and improves inspection efficiency. The algorithm cuts the data space with a random hyperplane. Each cut produces two subspaces, each subspace is then cut again with another random hyperplane, and the process repeats until every subspace contains only one data point.
The isolation forest algorithm has two stages.
First stage: build an isolation forest consisting of t iTrees. Each iTree is built as follows:
(1) Randomly select ψ sample points from the training data as the subsample set and place them in the root node of the tree.
(2) Randomly choose a dimension and randomly generate a cut point p within the current node's data.
(3) This cut point defines a hyperplane that divides the data space of the current node into 2 subspaces: data whose value in the chosen dimension is less than p go to the left of the current node, and data greater than or equal to p go to the right.
(4) Repeat steps (2) and (3) recursively in the child nodes, constructing new child nodes until the data can no longer be split or the depth of the tree reaches log2 ψ.
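The construction steps can be sketched as a short recursive function. This is a simplified illustration rather than the patent's code: the nested-dict tree layout and the function names are assumptions, the depth limit is rounded up to an integer, and a node is treated as unsplittable as soon as the chosen dimension has no spread.

```python
import math
import random


def build_itree(points, depth, max_depth, rng):
    """Recursively split points (equal-length tuples) with random axis-parallel cuts."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}                  # leaf
    dim = rng.randrange(len(points[0]))               # step (2): random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}                  # nothing to split on this dimension
    cut = rng.uniform(lo, hi)                         # step (2): random cut point p
    left = [p for p in points if p[dim] < cut]        # step (3): less than p goes left
    right = [p for p in points if p[dim] >= cut]      # step (3): the rest goes right
    return {"dim": dim, "cut": cut,
            "left": build_itree(left, depth + 1, max_depth, rng),     # step (4)
            "right": build_itree(right, depth + 1, max_depth, rng)}


def build_from_sample(data, psi, rng):
    """Step (1): draw psi points at random and grow one tree limited to depth log2(psi)."""
    sample = rng.sample(data, min(psi, len(data)))
    max_depth = math.ceil(math.log2(max(len(sample), 2)))
    return build_itree(sample, 0, max_depth, rng)


def path_length(tree, x, depth=0):
    """Number of cuts passed before x reaches a leaf."""
    if "dim" not in tree:
        return depth
    branch = "left" if x[tree["dim"]] < tree["cut"] else "right"
    return path_length(tree[branch], x, depth + 1)


rng = random.Random(0)
data = [(float(i), float(i % 7)) for i in range(50)] + [(1000.0, 1000.0)]
tree = build_from_sample(data, 32, rng)
```

Points far from the bulk of the data, such as (1000.0, 1000.0) here, tend to be separated by an early cut and so end up on short paths.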
Second stage: evaluate the test data with the generated iForest and compute an anomaly score for each sample under test. Each data point x traverses every iTree, which gives the depth of x in each iTree and hence its average depth over all iTrees, from which the anomaly score of the sample is computed. The anomaly score of a sample x is defined by the following formula:
s(x, ψ) = 2^(−E(h(x)) / c(ψ)), with c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/ψ
where h(x) is the depth of the node at which sample x is found in an iTree; E(h(x)) is the mean of h(x) over all t iTrees; c(ψ) is the average path length of a binary search tree built from ψ points; and H(k) = ln(k) + ζ, where ζ is Euler's constant.
From the definition of the anomaly score: when E(h(x)) → 0, s → 1; when E(h(x)) → ψ − 1, s → 0; when E(h(x)) → c(ψ), s → 0.5. That is, the closer s(x) is to 1, the more likely the data point is abnormal, and the closer it is to 0, the more likely it is a normal point.
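The limiting behaviour described above can be checked numerically. This sketch assumes the standard isolation forest score s = 2^(−E(h(x))/c(ψ)) with c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/ψ and H(k) = ln(k) + ζ, which is consistent with the symbols and limits in the text. The function names are hypothetical and the path lengths are supplied directly rather than taken from real trees.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant, the zeta in H(k) = ln(k) + zeta


def c(psi):
    """Average path length of a binary search tree built from psi points (psi > 2)."""
    if psi <= 2:
        raise ValueError("this sketch only covers psi > 2")
    harmonic = math.log(psi - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (psi - 1) / psi


def anomaly_score(path_lengths, psi):
    """Score from the depths of one sample in every iTree: 2 ** (-E(h(x)) / c(psi))."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / c(psi))


shallow = anomaly_score([2, 3, 2], 100)    # isolated quickly, score well above 0.5
typical = anomaly_score([8, 9, 8], 100)    # close to c(100), score near 0.5
```

For ψ = 100 the normalizing term c(ψ) is about 8.36, so a user isolated after two or three cuts scores well above 0.5 while one isolated near depth 8 scores close to 0.5.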
The method of the present invention for analyzing anomalies in power consumption information acquisition data based on the isolation forest algorithm has the following technical effects:
1: The station-area line loss management and control method based on the power consumption information acquisition system is a new method suited to the needs of smart grid development. It effectively solves the problems of current station-area management, makes station-area line loss management and control more transparent and efficient, brings out its comprehensive management role in marketing management, and ultimately achieves the goals of energy saving, loss reduction, and standardized management.
2: The station-area line loss management and control index system has five statuses: coverage class, household-transformer class, acquisition class, data class, and line loss class, together with their hierarchical relationship. For station areas in different statuses, different control methods, control periods, and responsible departments are set according to the respective control focus, ultimately driving each station area to progress toward a good state.
3: The invention fully combines the power consumption information acquisition system with a station-area line loss analysis platform and effectively solves the problems of line loss management. Data mining and analysis can be carried out on line loss anomalies in the station-area power system, making line loss management more transparent and efficient, bringing out its comprehensive management role, and ultimately achieving the goals of energy saving, loss reduction, and standardized management.
Brief description of the drawings
Fig. 1 is a schematic diagram of the state progression of the station-area line loss management and control index system based on the power consumption information acquisition system.
Fig. 2 is a schematic diagram of the overall architecture of the power consumption information acquisition system.
Fig. 3 is a diagram of the iForest outlier detection process.
Fig. 4 is a flow chart of iTree construction.
Fig. 5 is a flow chart of iForest construction and anomaly score output.
Fig. 6 is the ROC curve of the iForest model test.
Fig. 7 is the P-R curve of the iForest model test.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and an embodiment.
Taking a county-level power company as an example, a case analysis of station-area line loss management and control based on the power consumption information acquisition system is carried out. The system is designed around a service-oriented architecture (SOA) as its overall framework. It mainly comprises the following 3 functional modules: a data acquisition and real-time database module, an abnormal data analysis module, and a loss reduction decision support module.
The data acquisition and real-time database module uses dedicated transformer terminals conforming to the 2005-edition protocol and can collect the voltage, current, and energy data of a user's meter every 15 minutes (96 points over 24 hours); the data set S is the n×24 initial load curve matrix formed by n daily load curves. The module stores the massive collected data in distributed fashion through cloud storage technology. After processing, the data show that from September 2018 to March 2019 the county power company had 701 station areas in total, with a total station-area capacity of 349,000 kVA, an average capacity per station area of 497.8 kVA, an accumulated energy loss of 46,000 kWh, and an average station-area line loss rate of 2.69%.
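The text mentions both 96 samples per day and an n×24 load curve matrix without saying how one becomes the other. One plausible bridge, shown here purely as an assumption, is to average each group of four 15-minute samples into an hourly value; the function name is hypothetical.

```python
def to_hourly(curve_96):
    """Average each group of four 15-minute samples into one hourly value."""
    if len(curve_96) != 96:
        raise ValueError("expected 96 samples for one day")
    return [sum(curve_96[h * 4:(h + 1) * 4]) / 4.0 for h in range(24)]


# One row of the n x 24 matrix, built from a synthetic ramp of 96 samples.
hourly = to_hourly([float(v) for v in range(96)])
```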
The abnormal data analysis module applies the isolation forest algorithm to mine outliers and thereby find users with abnormal electricity consumption. The module works in 4 stages:
(1) Clean the collected consumption data: analyze and summarize the types of dirty data, then take targeted measures according to how each type manifests, deleting redundant data from the data set while keeping the data set complete. For data with severe gaps, following the periodic fluctuation of the power load, compute the mean of the loads at the same time on the preceding and following days and the loads at the two time points immediately before and after the current time, together with the rate of load change of the following day relative to the preceding day, and fill the missing value with the mean plus the load change. The calculation is as follows:
[formula image missing]
In the formula, X_i is the power load at the current time; i is the time at which load data is missing, taking values 1 to 24; and α1 and α2 are the weighting coefficients of the load at the corresponding time on the preceding and following days and of the load at the two time points immediately before and after the current time, respectively. For abnormal noise data, the rectangle method is applied to the load data at each acquisition time of the day to integrate and obtain the repaired energy value, using the following formula:
[formula image missing]
In the formula, X_i is the repaired energy value, F is the number of load data acquisitions in a day, P_i is the load at time i, and ΔT is the load data acquisition interval; this example takes ΔT = 15 min.
(2) Data dimensionality reduction: a load curve is a time series, and power load data are easily affected by many factors such as temperature, income, and electricity price policy. These influences are intrinsic features of the time series that distance measures cannot fully reflect, so similarity in the shape or profile of the series cannot be fully guaranteed. Moreover, curves with a pronounced load shape, such as daily load curves, exhibit undesirable equidistance in high dimensions. To fully reflect the similarity between loads while keeping computation efficient, this embodiment selects 6 common daily load characteristic indices: load rate, peak-valley ratio, maximum utilization hour rate, peak-period load factor, flat-period load factor, and valley-period load factor. They reflect the consumption characteristics of each type of user fairly comprehensively from 4 angles: whole day, peak period, flat period, and valley period. The 6 daily load characteristic indices are used to reduce the feature dimensionality of the load curve matrix.
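The text does not define the indices, so the sketch below uses definitions common in the load-curve literature and should be read as an assumption: load rate as mean over peak, peak-valley ratio as (max − min)/max, and each period load factor as the period mean over the daily mean. The maximum utilization hour rate is left out because its definition cannot be inferred from the text, and the hour ranges for the peak, flat, and valley periods are hypothetical.

```python
def load_features(curve, peak_hours, flat_hours, valley_hours):
    """Reduce one 24-point daily load curve to a handful of characteristic indices."""
    p_max = max(curve)
    p_min = min(curve)
    p_avg = sum(curve) / len(curve)

    def period_mean(hours):
        return sum(curve[h] for h in hours) / len(hours)

    return {
        "load_rate": p_avg / p_max,
        "peak_valley_ratio": (p_max - p_min) / p_max,
        "peak_load_factor": period_mean(peak_hours) / p_avg,
        "flat_load_factor": period_mean(flat_hours) / p_avg,
        "valley_load_factor": period_mean(valley_hours) / p_avg,
    }


# Synthetic curve: 2 kW in the valley hours, 4 kW in the flat hours, 6 kW at peak.
curve = [2.0] * 8 + [4.0] * 8 + [6.0] * 8
features = load_features(curve, range(16, 24), range(8, 16), range(0, 8))
```

Each user's 24-point curve collapses to a short feature vector, which is what the iTrees then split on.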
(3) Construct the iTrees as follows:
1. Randomly select one feature from the 6 daily load characteristic indices.
2. Randomly select a value k of that feature.
3. Split the records on that feature: records whose feature value is less than k go to the left branch, and records greater than or equal to k go to the right branch.
4. Recursively construct the left and right branches until one of the following conditions is met:
A. The incoming data set contains only one record, or several identical records.
B. The height of the tree has reached the height limit.
Because abnormal data are far fewer than normal data and differ markedly from them in their features, their path lengths are comparatively short. So in addition to limiting the sample size, a maximum depth h = log2 ψ is set for each iTree; only the part below the average depth needs attention, which makes the algorithm more efficient. Here the iTree sample size is ψ = 100 and the number of iTrees is n = 100.
(4) Construct the isolation forest, evaluate the model, and compute the anomaly scores: the iForest is made up of a number of mutually different iTrees, and the model is evaluated with the ROC curve, AUC, the cumulative recall curve, and the P-R curve. The iForest evaluates one user at a time, and each evaluation traverses all iTrees. The leaf node in which the queried object lands is recorded for each tree, and the anomaly score is computed from the average path length. Finally the user is judged by the size of the anomaly score to decide whether the user under test is abnormal.
The loss reduction decision support module mainly comprises loss reduction decision support and loss reduction scheme library management. For users with abnormal consumption data, the module checks the following: a. whether electricity theft exists in the station area; b. changes in station-area load operation, and whether load has been transferred; c. whether the station-area transformer is lightly or heavily loaded; d. the operating condition of reactive power compensation equipment; e. whether the three-phase load is balanced; f. voltage quality; g. whether the transformer, lines, and metering equipment are appropriate and working normally, where the main causes of abnormal energy metering devices are meter faults, instrument transformer faults, junction box faults, and terminal faults; h. whether the low-voltage supply radius is too long; i. other causes of abnormal line loss.
Fig. 1 is a schematic diagram of the state progression of the station-area line loss management index system based on the power consumption information acquisition system.
1.1 - Initial station area: prepare data and bring the station area into hierarchical progressive management.
1.2 - Installation-class station area: arrange equipment installation sensibly until coverage reaches 100%.
1.3 - Household-transformer-class station area: verify the station area's household-transformer relationship until accuracy reaches 100%.
1.4 - Acquisition-class station area: acquire repeatedly and analyze faults until the acquisition success rate reaches 95%.
1.5 - Data-class station area: acquire repeatedly and analyze errors until the acquisition success rate reaches 95%.
1.6 - Line-loss-class station area: analyze the cause of the abnormal line loss rate and formulate loss reduction measures.
1.7 - Compliant station area: take consolidation measures to keep the station area compliant.
Each transformer area is labeled with a state according to its current management focus. There are five states:
A. Coverage class: the installation rate of acquisition equipment in the transformer area has not reached 100%. The management focus is to schedule acquisition-equipment installation sensibly, speed up installation and replacement, and raise coverage.
B. Household-transformer class: acquisition coverage has reached 100%, but the household-transformer relationship is still inaccurate. The management focus is to use the collection results to narrow down where the household-transformer relationship is confused, and to confirm the relationship by combining office record checks with on-site inspection.
C. Collectable class: acquisition coverage has reached 100%, but the collection success rate has not yet reached 95%. The management focus is to track the collection success rate, analyze the causes of missed and erroneous collection, and take effective measures to raise the rate.
D. Data class: coverage has reached 100%, the collection success rate has reached 95%, and the household-transformer relationship is correct, but the error between the collected data and the manual meter-reading data exceeds the mean value. The management focus is to draw up a reasonable meter-reading plan, read meters repeatedly, cross-check the manually read and automatically collected data several times, analyze the anomalies, and correct them.
E. Line-loss class: coverage, collection success rate, and accuracy have all reached 100% and the household-transformer relationship is correct, but the line-loss rate is abnormal. The management focus is to analyze the causes of the abnormal line-loss rate and formulate targeted loss-reduction measures.
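The five states form a progression in which each state is left once its condition is met. A minimal sketch of that ordering follows; the `AreaMetrics` field names and the `classify_area` helper are hypothetical and are not taken from the patent, and only the 100% and 95% thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class AreaMetrics:
    """Hypothetical per-area indicators; the field names are illustrative."""
    coverage: float            # acquisition-equipment coverage, 0..1
    relation_accurate: bool    # household-transformer relationship verified
    collect_rate: float        # collection success rate, 0..1
    data_error: float          # error between collected and manual readings
    mean_error: float          # mean error used as the data-class limit
    line_loss_abnormal: bool   # line-loss rate flagged as abnormal


def classify_area(m: AreaMetrics) -> str:
    """Return the first state whose exit condition is not yet met."""
    if m.coverage < 1.0:
        return "coverage"
    if not m.relation_accurate:
        return "household-transformer"
    if m.collect_rate < 0.95:
        return "collectable"
    if m.data_error > m.mean_error:
        return "data"
    if m.line_loss_abnormal:
        return "line-loss"
    return "up-to-standard"


# An area with full coverage and a verified relationship, but a 90% collection rate.
state = classify_area(AreaMetrics(1.0, True, 0.90, 0.0, 1.0, False))
print(state)
```

Checking the conditions in this fixed order mirrors the hierarchy in Fig. 1: an area is only examined for line loss after coverage, relationship accuracy, collection rate, and data quality have all passed.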
Fig. 2 shows the overall framework of the power information acquisition system. The acquisition cluster periodically collects information from user terminals and stores the data in the cloud storage and query environment by calling the storage interface. The data storage and query environment handles highly concurrent storage of the collected information and provides electricity-data indexing and efficient query functions to the upper layers. The parallel ETL environment handles data exchange between the archive information in the existing relational database and the cloud computing environment: the ETL management tool is used to define the table mapping relationships and the task execution policies, and the parallel ETL tool tracks data in the associated systems in real time, acquires it, and checks its consistency. The parallel analysis and computing environment runs the isolation forest algorithm to mine abnormal data. The front-end interface includes an SQL-like (Structured Query Language) interface, Web services, client packages, and so on, and provides query and analytical computing services to external systems. The mapping tool applies a query-rewriting-based SQL-to-Map/Reduce optimization technique: it converts the original SQL into a query graph and derives multiple equivalent forms using rewrite rules. This supports assisted migration of applications written as stored procedures to the cloud computing environment, together with correctness verification and performance optimization. It can substantially reduce the cost of migrating relational-database applications to cloud computing, improve development efficiency, and raise the overall performance of parallel computation.
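To make the SQL-to-Map/Reduce idea concrete, the sketch below shows how a simple aggregate query such as `SELECT user_id, SUM(kwh) FROM readings GROUP BY user_id` decomposes into map, shuffle, and reduce phases. It is an illustration only: the table, the column names, and the helper functions are assumptions, and it does not reproduce the patent's mapping tool or its rewrite rules.

```python
from collections import defaultdict

# Hypothetical meter readings as (user_id, kWh) rows.
readings = [("u1", 1.5), ("u2", 0.7), ("u1", 2.0), ("u3", 4.1), ("u2", 0.3)]


def map_phase(record):
    """Emit (key, value) pairs; the GROUP BY column becomes the key."""
    user_id, kwh = record
    return [(user_id, kwh)]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Apply the aggregate function (SUM) to one group."""
    return key, sum(values)


def run_query(records):
    """Run the three phases in sequence on an in-memory list."""
    pairs = [pair for rec in records for pair in map_phase(rec)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


totals = run_query(readings)
print(totals)
```

In a real cluster the map and reduce calls would run in parallel on separate nodes; here they run in one process so the data flow is easy to follow.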
Fig. 3 shows the iForest outlier detection process. During outlier detection, every iTree is traversed to obtain the mean depth of the sample.
Fig. 4 is the iTree construction flow chart. Before electricity consumption data are processed with the isolation forest algorithm, the collected raw data must first be cleaned: dirty data are rejected, redundant data are deleted, and the dimensionality of the data is reduced. The iTree construction process is shown in the figure.
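A minimal sketch of the cleaning step, under stated assumptions: "dirty" is taken to mean a missing or negative reading, "redundant" is taken to mean a repeated record for the same user and timestamp, and the record layout (`user_id`, `ts`, `kwh`) is hypothetical. Dimensionality reduction is not shown; a min-max rescaling stands in as one simple example of putting the cleaned values into a uniform range.

```python
def clean_readings(records):
    """Drop dirty records (missing or negative kWh) and repeated records."""
    seen = set()
    cleaned = []
    for rec in records:
        kwh = rec.get("kwh")
        if kwh is None or kwh < 0:          # assumed definition of dirty data
            continue
        key = (rec["user_id"], rec["ts"])
        if key in seen:                      # redundant record for the same user and time
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def min_max_scale(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


records = [
    {"user_id": "u1", "ts": 1, "kwh": 1.0},
    {"user_id": "u1", "ts": 1, "kwh": 1.0},   # duplicate
    {"user_id": "u2", "ts": 1, "kwh": None},  # missing value
    {"user_id": "u3", "ts": 1, "kwh": -2.0},  # negative reading
    {"user_id": "u2", "ts": 2, "kwh": 3.0},
]
cleaned = clean_readings(records)
scaled = min_max_scale([r["kwh"] for r in cleaned])
print(cleaned, scaled)
```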
Fig. 5 shows the construction of the iForest and the output of the anomaly score. The process of building the electricity-data anomaly detection method with iForest is shown in Fig. 5; whether a user is abnormal is decided from the size of the anomaly score.
Fig. 6 is the ROC curve of iForest model detection in this example. As Fig. 6 shows, the ROC curve lies very close to the point (0, 1), which indicates that the iForest model built on this basis classifies well.
Fig. 7 is the P-R curve of iForest model detection in this example. As Fig. 7 shows, the P-R curve lies very close to the point (1, 1), so the iForest algorithm detects anomalies in the field data both completely (high recall) and accurately (high precision).
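The two evaluation curves are built from the same ingredients: labeled samples and their anomaly scores. The sketch below computes the area under the ROC curve by pairwise comparison and one precision/recall point at a chosen threshold. The labels, scores, and threshold are made-up illustrative values, not data from this example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that an abnormal sample outscores a normal one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between an abnormal and a normal score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are flagged abnormal."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(labels, flagged) if f and y == 0)
    fn = sum(1 for y, f in zip(labels, flagged) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative values: 1 marks an abnormal user, 0 a normal one.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.81, 0.55, 0.62, 0.40, 0.35, 0.30]

auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores, threshold=0.6)
print(auc, precision, recall)
```

Sweeping the threshold over all observed scores and plotting the resulting (recall, precision) pairs gives the P-R curve; a curve near (1, 1) means few missed anomalies and few false alarms at the same time.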
Table 1 is the March 2019 statistical analysis of line-loss-class transformer areas based on the power information acquisition system:
As Table 1 shows, 694 transformer areas currently have an up-to-standard line-loss rate, about 99% of all transformer areas within the management scope. Low acquisition-equipment coverage is the main factor affecting the overall transformer-area management process. Further analysis of the per-area details shows that the main reason for low acquisition coverage is the low installation rate of non-residential acquisition equipment in most transformer areas. Once the cause has been established, the acquisition-equipment installation plan should be adjusted. In addition, inaccurate household-transformer relationships rank second among the factors affecting transformer-area management: of the 456 transformer areas with 100% acquisition coverage, 312 have an accurate household-transformer relationship, an accuracy of 68%. Investigation of the 144 areas with an inaccurate relationship found two main causes: first, some transformer areas are old and their records have been lost; second, loads changed substantially during operation, but the records were not updated in time. While acquisition-equipment installation is being scheduled, attention should also be paid to verifying the household-transformer relationship, and a transformer-area user bidirectional identification instrument can be used to assist on-site verification.
Claims (4)
1. A method for analyzing anomalies in power information acquisition data based on the isolation forest algorithm, characterized by comprising the following steps:
Step 1: establish transformer-area line-loss management indices based on the power information acquisition system, and formulate a general transformer-area line-loss management method based on the power information acquisition system;
Step 2: for line-loss-class transformer areas, use cloud storage technology to collect, classify, and process the power information data of multiple line-loss-class transformer areas;
Step 3: analyze and summarize the types of dirty data, and eliminate noise according to the form each type takes;
Step 4: transform the cleaned and screened data into a form suitable for data mining;
Step 5: apply the isolation forest algorithm to build a data analysis model; evaluate the model using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), together with the cumulative recall curve and the P-R curve, which plots precision on the vertical axis against recall on the horizontal axis; then apply the model to the power information data set of multiple line-loss-class transformer areas, mine the screened data, and identify users with abnormal electricity consumption.
2. The method for analyzing anomalies in power information acquisition data based on the isolation forest algorithm according to claim 1, characterized in that: in step 1, the established transformer-area line-loss management indices comprise five state labels, namely coverage class, household-transformer class, collectable class, data class, and line-loss class, together with their hierarchical relationship;
Coverage class: the installation rate of acquisition equipment in the transformer area has not reached 100%;
Household-transformer class: acquisition coverage has reached 100%, but the household-transformer relationship is still inaccurate;
Collectable class: acquisition coverage has reached 100%, but the collection success rate has not yet reached 95%;
Data class: coverage has reached 100%, the collection success rate has reached 95%, and the household-transformer relationship is correct, but the error between the collected data and the manual meter-reading data exceeds the mean value;
Line-loss class: coverage, collection success rate, and accuracy have all reached 100% and the household-transformer relationship is correct, but the line-loss rate is abnormal.
3. The method for analyzing anomalies in power information acquisition data based on the isolation forest algorithm according to claim 1, characterized in that: in step 2, the distributed file storage mechanism of cloud storage is used to store the power information data dispersed across multiple independent storage servers; the mechanism comprises volume management, metadata management, and block data management services;
Metadata refers to the name, attributes, and data-block location information of a file;
Block data refers to the multiple data blocks formed by splitting file data at a fixed size, which are stored in a distributed manner on different storage node servers; the storage space provided by a pair of metadata servers and the storage server nodes they manage is called a volume space;
The volume management server virtualizes multiple volumes, integrates them into a unified whole, and presents the resulting cloud storage platform space for external access.
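The relationship between metadata, block data, and a volume in claim 3 can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the class names, the 4-byte block size, and the round-robin placement policy are not specified by the patent, and a real system would replicate blocks and run the servers as separate processes.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    """File name, size attribute, and block locations."""
    name: str
    size: int
    block_locations: list = field(default_factory=list)  # (block_index, node_id)


class Volume:
    """One metadata table plus the storage nodes it manages (toy model)."""

    def __init__(self, node_ids, block_size=4):
        self.block_size = block_size
        self.nodes = {node_id: {} for node_id in node_ids}  # node_id -> stored blocks
        self.metadata = {}                                   # file name -> FileMetadata

    def write(self, name, data: bytes):
        """Split data into fixed-size blocks and spread them over the nodes."""
        meta = FileMetadata(name=name, size=len(data))
        node_ids = list(self.nodes)
        for offset in range(0, len(data), self.block_size):
            index = offset // self.block_size
            node_id = node_ids[index % len(node_ids)]        # round-robin (assumption)
            self.nodes[node_id][(name, index)] = data[offset:offset + self.block_size]
            meta.block_locations.append((index, node_id))
        self.metadata[name] = meta

    def read(self, name) -> bytes:
        """Reassemble a file by following the block locations in its metadata."""
        meta = self.metadata[name]
        return b"".join(self.nodes[node_id][(name, index)]
                        for index, node_id in meta.block_locations)


volume = Volume(["node-1", "node-2", "node-3"], block_size=4)
volume.write("readings.bin", b"0123456789")
print(volume.metadata["readings.bin"].block_locations)
print(volume.read("readings.bin"))
```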
4. The method for analyzing anomalies in power information acquisition data based on the isolation forest algorithm according to claim 1, characterized in that: in step 5, the isolation forest algorithm is a two-stage algorithm:
First stage: build an isolation forest composed of t iTrees, with the following steps:
(1) randomly select ψ sample points from the training data as the subsample set and place them in the root node of the tree;
(2) randomly choose a dimension, and randomly generate a cut point p within the data of the current node;
(3) generate a hyperplane from this cut point, dividing the data space of the current node into 2 subspaces: data whose value in the chosen dimension is less than p go to the left of the current node, and data greater than or equal to p go to the right of the current node;
(4) recursively repeat steps (2) and (3) in the child nodes, continually constructing new child nodes, until the data can no longer be divided or the depth of the tree reaches log₂ψ;
Second stage: evaluate the test data with the generated iForest and compute an anomaly score for each sample under test. Any data point x traverses every iTree, giving the depth h(x) at which x lands in each iTree and the mean depth E(h(x)) over all iTrees, from which the anomaly score of the sample is calculated. The anomaly score of a sample x under test is defined by the following formula:
$$s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}, \qquad c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}$$
Where: h(x) is the depth of the node at which the sample x is retrieved in an iTree; E(h(x)) is the mean of h(x) over all t iTrees; c(ψ) is the average path length of a binary search tree built from ψ points; H(k) = ln(k) + ζ, where ζ ≈ 0.5772 is Euler's constant;
From the definition of the anomaly score: when E(h(x)) → 0, s → 1; when E(h(x)) → ψ - 1, s → 0; when E(h(x)) → c(ψ), s → 0.5. That is, the closer s(x) is to 1, the more likely the data point is abnormal, and the closer s(x) is to 0, the more likely it is a normal point.
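The two stages of claim 4 can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: it uses the standard isolation-forest score s = 2^(-E(h(x))/c(ψ)), which matches the limits stated above, adds the usual c(size) adjustment at leaves that still hold several points, and stops splitting a node as soon as the chosen dimension has no spread. The function names, the synthetic data, and the parameter values t = 100 and ψ = 64 are assumptions.

```python
import math
import random

EULER = 0.5772156649  # Euler's constant, written ζ in the text


def c(n):
    """Average path length of a binary search tree built from n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER) - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    """Steps (2) to (4): split on a random dimension at a random cut point."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen dimension cannot be split
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_itree(left, depth + 1, max_depth, rng),
            "right": build_itree(right, depth + 1, max_depth, rng)}


def build_iforest(data, t=100, psi=256, seed=0):
    """First stage: t iTrees, each grown from psi randomly chosen points (psi >= 2)."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(t)]
    return forest, psi


def path_length(x, node, depth=0):
    """Depth at which x lands, plus c(size) for an unsplit leaf."""
    if "dim" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_score(x, forest, psi):
    """Second stage: s = 2 ** (-E(h(x)) / c(psi))."""
    mean_depth = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / c(psi))


# Synthetic two-dimensional data: 200 points clustered around the origin.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
forest, psi = build_iforest(data, t=100, psi=64, seed=7)
print(anomaly_score((8.0, 8.0), forest, psi))  # far from the cluster
print(anomaly_score((0.0, 0.0), forest, psi))  # inside the cluster
```

A point far from the cluster is separated after only a few random cuts, so its mean depth is small and its score is pushed toward 1; a point inside the cluster needs many cuts and scores lower.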
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910399385.6A CN110189232A (en) | 2019-05-14 | 2019-05-14 | Power information based on isolated forest algorithm acquires data exception analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110189232A (en) | 2019-08-30 |
Family
ID=67716233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910399385.6A Pending CN110189232A (en) | 2019-05-14 | 2019-05-14 | Power information based on isolated forest algorithm acquires data exception analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110189232A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190830 |