CN107862347A - A method for detecting electricity theft based on random forest - Google Patents

A method for detecting electricity theft based on random forest

Info

Publication number
CN107862347A
Authority
CN
China
Prior art keywords
feature
data
user
electricity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711260280.XA
Other languages
Chinese (zh)
Inventor
刘晓
施亚林
张同乔
张若冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Original Assignee
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd filed Critical Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority to CN201711260280.XA priority Critical patent/CN107862347A/en
Publication of CN107862347A publication Critical patent/CN107862347A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/259Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for detecting electricity theft based on random forest, comprising the following steps: obtaining power system user data, extracting from the marketing system the user data that needs to be judged, screening it, and rejecting data for which electricity theft is not possible; preprocessing the screened raw data and extracting features, where feature extraction includes extracting a variance feature and a zero-percentage feature; and testing the preprocessed data with the random forest algorithm to compute the final result. The invention largely removes the drawback of existing manual anti-theft methods, which consume large amounts of manpower and material resources, reduces the cost of anti-theft work, and improves its efficiency. Using big data tools to assist anti-theft work also helps improve its accuracy, in line with the general trend of the power industry.

Description

A method for detecting electricity theft based on random forest
Technical field
The present invention relates to the technical field of power systems, and in particular to a method for detecting electricity theft based on random forest.
Background
Electric energy has long been an indispensable energy source and an important driving force of modern social life and development. A shortage of electric energy leaves residents' normal life unprotected and prevents industrial production from proceeding normally. However, there are always some offenders who, for private gain, use electric power illegally by stealing it and evade paying for it. This behavior seriously infringes on the interests of the country and the people and is suspected of being a crime. Anti-theft work has therefore always been one of the important tasks of power companies.
Traditional anti-theft methods are coarse-grained, for example regular inspections and periodic meter verification. They have drawbacks: combating electricity theft this way consumes substantial manpower and material resources. Data mining offers an alternative: collect customer consumption data, use intelligent algorithms to extract and analyze features from the data, and judge from these whether theft has occurred at the customer side. This effectively avoids the excessive workload and low efficiency of traditional anti-theft methods.
In summary, the prior art still lacks an effective solution to the problem of preventing electricity theft efficiently and accurately.
Summary of the invention
To overcome the deficiencies of the prior art, the invention provides an anti-theft method based on the random forest algorithm. The invention introduces the random forest algorithm in aspects such as system architecture and data processing, where it plays an important role in improving the efficiency of anti-theft work.
A method for detecting electricity theft based on random forest comprises the following steps:
Obtaining power system user data, extracting from the marketing system the user data that needs to be judged, screening it, and rejecting data for which electricity theft is not possible;
Preprocessing the screened raw data, including: comparing the data of electricity-theft users with the data of normal users, comparing the differences in their consumption features, and extracting the consumption features that differ markedly and are distinctive; then building an expert sample set and extracting features, where feature extraction includes extracting a variance feature and a zero-percentage feature;
Testing the preprocessed data with the random forest algorithm and computing the final result, specifically: the random forest algorithm classifies the user data with decision trees, and the final classification is decided by the vote of the trained decision trees, which determines whether a user has committed electricity theft.
Further, the screening of the data includes: the user data extracted from the marketing system covers all electricity usage types; information on large users for whom theft is not possible is rejected according to usage type, and information on users whose theft has already been verified or whose electricity terminal has raised an alarm should also be removed.
Further, the formula for extracting the variance feature is:
$$V_i = \frac{\sum_{k} \left( X_{i_k} - \bar{X}_i \right)^2}{k}$$
where $V_i$ is the variance of the user's electricity consumption; $X_{i_k}$ is the consumption of the $i$-th user on day $k$; $\bar{X}_i$ is the user's average consumption; and $k$ in the denominator is the size of the user's data;
Variance mainly reflects how much the data fluctuates. When a user's consumption data fluctuates markedly, that is, consumption fluctuates over a long period and the variance is large, the user is more likely to be stealing electricity.
Further, the formula for extracting the zero-percentage feature is:
$$P_{Zero_i} = \frac{X_j}{X_i} \times 100\%$$
where $P_{Zero_i}$ is the zero percentage; $X_j$ is the number of zero-valued data points of the $i$-th user; and $X_i$ is the total amount of data of the $i$-th user;
Barring extreme special cases, if a user's consumption is zero every day, the likelihood that the user is stealing electricity is high; if a user's consumption is zero most of the time except for a few dates, theft is quite likely; if a user's consumption is zero intermittently, there is some possibility of theft.
Further, the random forest is an ensemble classifier composed of a group of decision tree classifiers $\{h(X, \theta_k), k = 1, 2, \ldots, K\}$, where $\{\theta_k\}$ are independent and identically distributed random vectors and $K$ is the number of decision trees in the random forest. For a given independent variable $X$, the decision tree classifiers vote to determine the optimal classification result.
Further, the decision tree classifiers use the CART decision tree method, specifically: for each feature, the CART algorithm computes the Gini(t) index of every possible split on that feature and takes the split with the smallest Gini(t) value as the feature's optimal split; it then compares the Gini(t) values of the optimal splits of all candidate features, selects the feature with the smallest Gini(t) value as the splitting feature at this node, and creates branches according to the feature values. This process is repeated, further dividing the samples at each non-leaf node, until a stopping criterion is reached.
Further, the random forest is generated by the following steps:
Assume the forest to be built has size k; from the training sample set, the Bagging algorithm generates k new bootstrap sample sets;
Each bootstrap sample set is used to build one classification tree, producing k new classification trees in total;
Given n features, $m_{try}$ features ($m_{try} \le n$) are randomly selected at each node of every tree; the information content of each feature is computed, and the feature with the greatest classification ability among the $m_{try}$ features is selected to split the node according to the principle of minimum node impurity;
Each tree grows to its full extent, until the impurity of every leaf node reaches its minimum, with no pruning;
New, unknown samples are predicted with the CART tree classifiers that have been built, and the classification of an unknown sample is decided by the number of votes from the tree classifiers.
Further, the Bagging algorithm is an ensemble learning algorithm. Given a weak learning algorithm and a training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, samples are drawn from it with replacement, so that each base classifier receives a training subset that has the same number of samples as the original training set but differs from it, from which different base classifiers are trained.
Further, the Bagging algorithm proceeds as follows:
Assume the original data set contains n samples in total; m samples (m ≤ n) are drawn from it randomly, independently, and with replacement to form a new bootstrap training data set;
This process is repeated to form multiple mutually independent bootstrap training data sets;
Each independent bootstrap training data set is used to train one sub-classifier, giving the same number of mutually independent sub-classifiers;
The final decision of the algorithm is determined by a vote over the individual decisions of these mutually independent sub-classifiers.
Compared with the prior art, the beneficial effects of the invention are:
The invention uses the random forest algorithm to judge electricity theft, which largely removes the drawback of existing manual anti-theft methods, namely their consumption of large amounts of manpower and material resources, reduces the cost of anti-theft work, and improves its efficiency. Using big data tools to assist anti-theft work also helps improve its accuracy, in line with the general trend of the power industry.
Brief description of the drawings
The accompanying drawings, which form a part of this application, provide a further understanding of the application. The illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it.
Fig. 1 is the user electricity-theft identification flow of the invention;
Fig. 2 is the consumption data processing and feature extraction flow of the invention;
Fig. 3 is a simplified structural diagram of a decision tree in the random forest algorithm used by the invention.
Detailed description of the embodiments
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which this application belongs.
It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used here, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well. In addition, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Explanation of terms: a random forest is a classifier that contains multiple decision trees. Sub-datasets are built by sampling the large raw data set with replacement, and a sub decision tree is built from each sub-dataset. Each subtree branches on the candidate features, so that the data to be classified is encoded and classified by those features; finally, the results of the many classification runs over the data determine the outcome for each batch of data.
As described in the background, the prior art has the problem of how to prevent electricity theft accurately. To solve this technical problem, the application proposes a method for detecting electricity theft based on random forest. The application obtains power system user data, rejects data for which theft is not possible, preprocesses the data by extracting the variance feature and the zero-percentage feature, and runs the random forest algorithm on the data to obtain the final test result. Analyzing power system user data with an intelligent algorithm can effectively improve the efficiency of anti-theft work and greatly reduce the consumption of manpower and material resources.
In a typical embodiment of the application, as shown in Fig. 1, a method for detecting electricity theft based on random forest is provided, comprising the following steps:
Step 1: obtain power system user data and reject data for which theft is not possible. Specifically: the user consumption data obtained through techniques such as remote and automatic meter reading is managed and stored centrally; the user data that needs to be judged is extracted from the marketing system and screened; non-residential users with good credit are selectively rejected; and all users with terminal alarms and all known electricity-theft users are rejected.
On the screening of the data: the user data extracted from the marketing system covers all usage types, such as residential, large industrial, and general industrial and commercial electricity use. During screening it should be recognized that electricity theft is the behavior of only a small minority of users. To improve efficiency and reduce computation, information on large users such as schools and banks is appropriately rejected; information on users whose theft has already been verified or whose electricity terminal has raised an alarm should also be removed.
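The screening rules of Step 1 can be sketched as a simple filter. This is only a minimal illustration: the record fields `usage_type`, `confirmed_theft`, and `terminal_alarm`, and the set of excluded usage types, are hypothetical names chosen for the example and do not come from the patent or from any real marketing system.

```python
# Usage types assumed (for illustration only) to have no realistic theft possibility.
LOW_RISK_TYPES = {"school", "bank"}

def screen_users(records):
    """Return the user records that still need to be judged by the classifier."""
    kept = []
    for rec in records:
        if rec["usage_type"] in LOW_RISK_TYPES:
            continue  # large users rejected by usage type
        if rec["confirmed_theft"] or rec["terminal_alarm"]:
            continue  # theft already verified, or terminal already raised an alarm
        kept.append(rec)
    return kept

# Toy records with hypothetical field names.
users = [
    {"id": 1, "usage_type": "residential", "confirmed_theft": False, "terminal_alarm": False},
    {"id": 2, "usage_type": "school", "confirmed_theft": False, "terminal_alarm": False},
    {"id": 3, "usage_type": "commercial", "confirmed_theft": True, "terminal_alarm": False},
    {"id": 4, "usage_type": "commercial", "confirmed_theft": False, "terminal_alarm": True},
]
remaining = screen_users(users)
```

In this toy input only user 1 survives the screening; the other three are removed by usage type, verified theft, and terminal alarm respectively.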
Step 2: preprocess the screened raw data, determine the important consumption features for identifying electricity-theft users, build an expert sample set, and extract features.
Initial analysis of the data: the data of electricity-theft users is compared with the data of normal users, the differences in their consumption features are compared, and the consumption features that differ markedly and are distinctive are extracted; an expert sample set is built and features are extracted.
For the random forest algorithm, the main feature values extracted are the variance and the zero percentage; the variance feature and the zero-percentage feature serve as the principal features for the algorithm. The specific method is as follows:
(1) Extracting the variance feature
Variance mainly reflects how much the data fluctuates. When a user's consumption data fluctuates markedly, that is, consumption fluctuates over a long period and the variance is large, the user is more likely to be stealing electricity. The formula for extracting the variance feature is:
$$V_i = \frac{\sum_{k} \left( X_{i_k} - \bar{X}_i \right)^2}{k}$$
where $V_i$ is the variance of the user's electricity consumption; $X_{i_k}$ is the consumption of the $i$-th user on day $k$; $\bar{X}_i$ is the user's average consumption; and $k$ in the denominator is the size of the user's data.
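The variance feature can be sketched in Python as follows. This is a minimal illustration that assumes one user's daily consumption is available as a plain list of numbers; the function name `variance_feature` and the sample readings are made up for the example.

```python
def variance_feature(daily_usage):
    """Population variance of one user's daily consumption: sum((x - mean)^2) / k."""
    k = len(daily_usage)
    if k == 0:
        raise ValueError("daily_usage must not be empty")
    mean = sum(daily_usage) / k
    return sum((x - mean) ** 2 for x in daily_usage) / k

# Toy readings: a steady user and a strongly fluctuating user.
steady = variance_feature([5.0, 5.0, 5.0, 5.0])
erratic = variance_feature([0.0, 10.0, 0.0, 10.0])
```

The steady series has zero variance, while the alternating series has a variance of 25, which is the kind of gap the feature is meant to expose.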
(2) Extracting the zero-percentage feature
Analyzing user consumption data leads to the following conclusions:
(1) barring extreme special cases, if a user's consumption is zero every day, the likelihood that the user is stealing electricity is high;
(2) if a user's consumption is zero most of the time except for a few dates, theft is quite likely;
(3) if a user's consumption is zero intermittently, there is some possibility of theft.
The zero-percentage feature is computed as:
$$P_{Zero_i} = \frac{X_j}{X_i} \times 100\%$$
where $P_{Zero_i}$ is the zero percentage; $X_j$ is the number of zero-valued data points of the $i$-th user; and $X_i$ is the total amount of data of the $i$-th user.
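The zero-percentage feature can be sketched in the same way. The sketch assumes a list of daily readings per user and treats a reading as zero only when it is exactly 0; a real deployment might use a small tolerance instead, which is an assumption outside the patent text.

```python
def zero_percentage(daily_usage):
    """Share of days with zero consumption, expressed as a percentage."""
    total = len(daily_usage)
    if total == 0:
        raise ValueError("daily_usage must not be empty")
    zeros = sum(1 for x in daily_usage if x == 0)
    return zeros / total * 100.0

# Toy readings: two zero days out of four.
pct = zero_percentage([0.0, 3.2, 0.0, 4.1])
```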
Step 3: run the random forest algorithm on the sample data to obtain the final prediction result.
The consumption data feature extraction and classification flow is shown in Fig. 2. It comprises obtaining user data, preprocessing the data, extracting data features, and predicting with the random forest algorithm to obtain the result.
Use of the random forest algorithm: the random forest algorithm classifies the user data with decision trees, and the final classification is decided by the vote of the trained decision trees, which determines whether a user has committed electricity theft.
The random forest is an ensemble classifier composed of a group of decision tree classifiers $\{h(X, \theta_k), k = 1, 2, \ldots, K\}$, where $\{\theta_k\}$ are independent and identically distributed random vectors and $K$ is the number of decision trees in the random forest. For a given independent variable $X$, the decision tree classifiers vote to determine the optimal classification result.
A random forest is a classifier that integrates many decision trees. If a decision tree is regarded as one expert in a classification task, a random forest is many experts classifying a task together.
As shown in Fig. 3, the random forest is generated by the following steps:
Assume the forest to be built has size k. From the training sample set, the Bagging method generates k new bootstrap sample sets.
Each bootstrap sample set is used to build one classification tree, producing k new classification trees in total.
Given n features, $m_{try}$ features ($m_{try} \le n$) are randomly selected at each node of every tree. The information content of each feature is computed, and the feature with the greatest classification ability among the $m_{try}$ features is selected to split the node according to the principle of minimum node impurity.
Each tree grows to its full extent, until the impurity of every leaf node reaches its minimum, with no pruning.
New, unknown samples are predicted with the CART tree classifiers that have been built, and the classification of an unknown sample is decided by the number of votes from the tree classifiers.
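The generation steps above can be sketched in pure Python. This is a minimal, unoptimized illustration rather than the patent's implementation: candidate thresholds are taken as midpoints between adjacent feature values, impurity is measured with the Gini index, class labels are assumed to be integers, and the toy feature rows (variance, zero percentage) are invented for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint threshold (an assumption of this sketch)
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, thr), score
    return best

def build_tree(rows, labels, m_try, rng):
    """Grow an unpruned tree; internal nodes are tuples, leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    features = rng.sample(range(len(rows[0])), m_try)  # random m_try features per node
    split = best_split(rows, labels, features)
    if split is None:  # no sampled feature reduces impurity: majority leaf
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [y for _, y in left], m_try, rng),
            build_tree([r for r, _ in right], [y for _, y in right], m_try, rng))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

def train_forest(rows, labels, k, m_try, seed=0):
    """Build k trees, each on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n items with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], m_try, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Toy data: (variance, zero percentage); label 0 = normal, 1 = suspected theft.
rows = [(1.0, 0.0), (1.5, 2.0), (2.0, 1.0), (1.2, 3.0), (0.8, 0.0), (1.8, 4.0),
        (20.0, 40.0), (25.0, 55.0), (30.0, 60.0), (22.0, 45.0), (28.0, 70.0), (35.0, 50.0)]
labels = [0] * 6 + [1] * 6
forest = train_forest(rows, labels, k=15, m_try=1)
normal_vote = predict_forest(forest, (0.5, 0.0))
theft_vote = predict_forest(forest, (40.0, 80.0))
```

Using an odd number of trees avoids tied votes in the two-class case. In practice a tested library implementation would normally replace this hand-written sketch.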
The Bagging algorithm is an ensemble learning algorithm. Given a weak learning algorithm and a training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, samples are drawn from it with replacement, so that each base classifier receives a training subset that has the same number of samples as the original training set but differs from it, from which different base classifiers are trained. The algorithm proceeds as follows:
Assume the original data set contains n samples in total; m samples (m ≤ n) are drawn from it randomly, independently, and with replacement to form a new bootstrap training data set;
This process is repeated to form multiple mutually independent bootstrap training data sets;
Each independent bootstrap training data set is used to train one sub-classifier, giving the same number of mutually independent sub-classifiers;
The final decision of the algorithm is determined by a vote over the individual decisions of these mutually independent sub-classifiers.
The Bagging algorithm is one of the most intuitive and simplest ensemble methods that operate on the training set. For unstable learning algorithms (such as decision trees and artificial neural networks), the Bagging algorithm can effectively improve generalization ability.
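The four Bagging steps can be sketched independently of any particular base learner. In the illustration below the weak learner is a hypothetical nearest-class-mean rule on a single feature, chosen only to keep the example short; the patent itself uses CART trees as base classifiers, and the toy data is invented.

```python
import random
from collections import Counter

def bootstrap_sample(data, m, rng):
    """Draw m items from data uniformly, independently, and with replacement."""
    return [rng.choice(data) for _ in range(m)]

def bagging_train(data, train_fn, n_classifiers, m, seed=0):
    """Train one sub-classifier per bootstrap training set."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, m, rng)) for _ in range(n_classifiers)]

def bagging_predict(classifiers, x):
    """Final decision by majority vote over the sub-classifiers."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

def train_nearest_mean(sample):
    """Hypothetical weak learner: assign x to the class whose mean is closest."""
    sums, counts = {}, {}
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

# Toy data: (zero percentage, label).
data = [(0.0, "normal"), (2.0, "normal"), (1.0, "normal"), (3.0, "normal"),
        (60.0, "theft"), (75.0, "theft"), (90.0, "theft"), (80.0, "theft")]
ensemble = bagging_train(data, train_nearest_mean, n_classifiers=11, m=8)
low = bagging_predict(ensemble, 1.0)
high = bagging_predict(ensemble, 85.0)
```

Passing the learner as `train_fn` keeps the sampling and voting logic separate from the choice of base classifier.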
The Boosting algorithm is quite similar to the Bagging algorithm, and both use classifiers of the same type. Their most important difference is how the training sets are chosen. The Bagging algorithm draws the training set of each base classifier at random, so the trained base classifiers are mutually independent and show no obvious correlation. In the Boosting algorithm the classifiers are obtained by serial training, and the training set used in each round depends on the results learned in earlier rounds. In addition, the classifiers trained by the Bagging algorithm all carry the same weight, whereas those obtained by the Boosting algorithm carry different weights.
During training, the Boosting algorithm generates different sub-classifiers by re-weighting the samples in the training data set. Its core idea is to redistribute sample weights so that misclassified samples receive more attention as the classifiers are trained one by one. The algorithm is described as follows: first, the Boosting algorithm assigns equal weight to every sample in the training data set, uses them to train the first sub-classifier, and tests it on the training samples to obtain a prediction for each sample.
Based on the predictions, misclassified samples are given more attention: the weights of misclassified samples are increased, while the weights of correctly classified samples are decreased. After the weights are adjusted, the new training set is used to train the next sub-classifier, and these steps are repeated until the error rate falls below a preset threshold.
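The text does not state the exact weight-update rule, so the sketch below borrows the AdaBoost-style exponential update as one common choice. It is an assumption made for illustration and should not be read as the rule intended by the patent. The weights are assumed to sum to 1 before the update.

```python
import math

def reweight(weights, correct, eps=1e-12):
    """One AdaBoost-style round: raise weights of misclassified samples, lower the rest."""
    error = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error rate
    error = min(max(error, eps), 1 - eps)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]  # renormalize so the weights sum to 1

# Four samples with equal initial weight; only the last one was misclassified.
weights = [0.25] * 4
new_weights = reweight(weights, [True, True, True, False])
```

After this round the single misclassified sample carries half of the total weight, so the next sub-classifier is pushed to get it right.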
The CART algorithm uses recursive binary splitting. When splitting a node it strictly follows the principle of minimizing the Gini index, recursively dividing the samples at the node into two subsets; splitting stops when a preset stopping criterion is reached. It follows that every non-leaf node of a CART tree has two branches.
When a CART tree splits a node, the splitting criterion is the Gini index: the attribute with the smallest Gini index value is chosen as the node's splitting attribute.
Assume the data set T{X, Y} contains samples of k classes. The Gini index is defined as:
$$\mathrm{Gini}(t) = 1 - \sum_{j=1}^{k} \left[ p(j \mid t) \right]^2$$
where $p(j \mid t)$ is the probability of class $j$ at node $t$.
When all samples at node t belong to the same class, Gini(t) is zero, meaning the samples at this node are pure; when the samples are uniformly distributed over the classes, Gini(t) reaches its maximum, meaning the samples at this node are at their most impure. If the sample set is divided into m parts, the Gini index of the split is:
$$\mathrm{Gini}_{split}(t) = \sum_{i=1}^{m} \frac{n_i}{n} \, \mathrm{Gini}(i)$$
where m is the number of child nodes, $n_i$ is the number of samples at child node $i$, and n is the number of samples at the parent node.
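The two Gini quantities described here can be sketched as follows, using the standard CART definitions: node impurity is one minus the sum of squared class proportions, and split impurity is the sample-weighted average over the child nodes. The function names and the toy labels are illustrative only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of one node: 1 - sum_j p(j|t)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(children):
    """Weighted Gini of a split: sum_i (n_i / n) * gini(child_i); empty children are skipped."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * gini(c) for c in children if c)

pure = gini(["theft"] * 4)                # all samples in one class
mixed = gini(["theft", "normal"] * 2)     # two classes, evenly distributed
split = gini_split([["normal", "normal"], ["theft", "theft", "normal"]])
```

A pure node scores 0 and an even two-class node scores 0.5, the maximum for two classes; the example split scores lower than the even parent because one child is already pure.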
When splitting a node, the CART algorithm computes the Gini(t) index of every candidate split; the smaller the value, the more reasonable the split. For each feature in the candidate feature set, the CART algorithm computes the Gini(t) index of every possible split on that feature and takes the split with the smallest Gini(t) value as the feature's optimal split. It then compares the Gini(t) values of the optimal splits of all candidate features, selects the feature with the smallest Gini(t) value as the splitting feature at this node, and creates branches according to the feature values. This process is repeated, further dividing the samples at each non-leaf node, until a stopping criterion is reached.
In addition, the above example of the anti-theft method based on the random forest algorithm is merely illustrative. In practical applications, the functions described above can be assigned to different functional modules as needed, for example in view of the configuration requirements of the corresponding hardware or the convenience of software implementation; that is, the internal structure of the system is divided into different functional modules to perform all or part of the functions described above. Each functional module can be implemented in hardware or as a software functional module.
A person of ordinary skill in the art will understand that all or part of the flow of the method in the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and sold or used as an independent product. When executed, the program can perform all or part of the steps of the method embodiments above. The storage medium can be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The above is only the preferred embodiment of the application and does not limit the application. For those skilled in the art, the application can have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principle of the application shall fall within its scope of protection.

Claims (9)

1. A method for detecting electricity theft based on random forest, characterized in that it comprises the following steps:
Obtaining power system user data, extracting from the marketing system the user data that needs to be judged, screening it, and rejecting data for which electricity theft is not possible;
Preprocessing the screened raw data, including: comparing the data of electricity-theft users with the data of normal users, comparing the differences in their consumption features, and extracting the consumption features that differ markedly and are distinctive; then building an expert sample set and extracting features, where feature extraction includes extracting a variance feature and a zero-percentage feature;
Testing the preprocessed data with the random forest algorithm and computing the final result, specifically: the random forest algorithm classifies the user data with decision trees, and the final classification is decided by the vote of the trained decision trees, which determines whether a user has committed electricity theft.
2. The method for detecting electricity theft based on random forest according to claim 1, characterized in that the screening of the data includes: the user data extracted from the marketing system covers all electricity usage types; information on large users for whom theft is not possible is rejected according to usage type, and information on users whose theft has already been verified or whose electricity terminal has raised an alarm should also be removed.
3. The method for detecting electricity theft based on random forest according to claim 1, characterized in that the formula for extracting the variance feature is:
$$V_i = \frac{\sum_{k} \left( X_{i_k} - \bar{X}_i \right)^2}{k}$$
where $V_i$ is the variance of the user's electricity consumption; $X_{i_k}$ is the consumption of the $i$-th user on day $k$; $\bar{X}_i$ is the user's average consumption; and $k$ in the denominator is the size of the user's data;
Variance mainly reflects how much the data fluctuates. When a user's consumption data fluctuates markedly, that is, consumption fluctuates over a long period and the variance is large, the user is more likely to be stealing electricity.
4. The random-forest-based method for detecting electricity theft as claimed in claim 1, characterized in that the specific formula for extracting the zero-percentage feature is:
$$P_{\mathrm{Zero}_i} = \frac{X_j}{X_i} \times 100\%$$
where $P_{\mathrm{Zero}_i}$ is the zero percentage; $X_j$ indicates that the data of the $i$-th user contain $j$ zero values; and $X_i$ is the total amount of data of the $i$-th user;
Extreme special cases aside, if a user's electricity consumption is zero every day, that user is highly likely to be stealing electricity; if a user's consumption is zero most of the time, apart from a few days, electricity theft is quite likely; if a user's consumption is intermittently zero, there is some possibility of electricity theft.
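The zero-percentage feature of claim 4 can be sketched in the same way. This is a minimal illustration under assumptions: the name `zero_percentage` is hypothetical, and a reading counts as zero only when it is exactly 0, a detail the patent does not specify.

```python
def zero_percentage(daily_kwh):
    """Share of zero readings in one user's data, in percent (hypothetical helper)."""
    n = len(daily_kwh)
    if n == 0:
        raise ValueError("at least one reading is required")
    zeros = sum(1 for x in daily_kwh if x == 0)
    return zeros / n * 100.0


# Made-up readings: half of the days report zero consumption.
print(zero_percentage([0, 0, 5, 5]))
```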
5. The random-forest-based method for detecting electricity theft as claimed in claim 1, characterized in that the random forest is an ensemble classifier composed of a set of decision tree classifiers $\{h(X, \theta_k), k = 1, 2, \ldots, K\}$, where the $\{\theta_k\}$ are independent and identically distributed random vectors and $K$ is the number of decision trees in the random forest; for a given independent variable $X$, the decision tree classifiers determine the optimal classification result by voting.
6. The random-forest-based method for detecting electricity theft as claimed in claim 5, characterized in that the decision tree classifiers use the CART decision tree method, specifically: for each feature, the CART algorithm computes the expected Gini(t) value of every possible split on that feature and takes the split with the smallest expected Gini(t) value as that feature's optimal split; it then compares the optimal splits of all candidate features, and the feature with the smallest expected Gini(t) value is selected as the splitting feature at the node, with branches created according to its feature values; this process is repeated, further dividing the samples at each non-leaf node, until a stopping criterion is reached.
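The split selection described in claim 6 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes numeric features and binary threshold splits (the usual CART form), and the names `gini` and `best_split` as well as the sample rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the binary split
    with the lowest weighted Gini impurity, or None if no split exists."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Made-up rows of [variance, zero percentage]; label 1 marks a theft user.
rows = [[1.0, 0.0], [2.0, 5.0], [50.0, 60.0], [60.0, 70.0]]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))
```

Ties between equally good splits are resolved in favour of the first one found, which is an arbitrary choice made for this sketch.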
7. The random-forest-based method for detecting electricity theft as claimed in claim 5, characterized in that the random forest is generated by the following steps:
Assume the forest to be built has size k; from the training sample set, k new bootstrap sample sets are generated by the Bagging algorithm;
Each bootstrap sample set is used to build one classification tree, so k new classification trees are produced in total;
Given n features, m_try features (m_try ≤ n) are randomly selected at each node of every tree; the information content of each selected feature is computed, and, following the principle of minimum node impurity, the feature with the greatest classification ability among the m_try features is chosen to split the node;
Each tree is grown to its maximum extent, until the impurity of every leaf node reaches its minimum, with no pruning;
New unknown samples are predicted with the CART tree classifiers that have been built; the classification result for an unknown sample is determined by the number of votes cast by the tree classifiers.
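The generation steps of claim 7 can be put together in a compact sketch. It is written in plain Python under several assumptions that are not in the patent: numeric features, binary threshold splits, integer class labels, and a fixed random seed. All function names and the sample data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, m_try, rng):
    """Grow an unpruned tree; a leaf is a label, an inner node a tuple."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf: impurity is already minimal
    best = None
    for f in rng.sample(range(len(rows[0])), m_try):  # m_try random features
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # chosen features cannot separate the samples
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], m_try, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right], m_try, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def build_forest(rows, labels, k, m_try, seed=0):
    """Build k trees, each on its own bootstrap sample set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], m_try, rng))
    return forest


def predict_forest(forest, row):
    """Classify by majority vote of the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Made-up rows of [variance, zero percentage]; label 1 marks a theft user.
rows = [[1.0, 0.0], [2.0, 5.0], [3.0, 2.0], [50.0, 60.0], [60.0, 70.0], [55.0, 80.0]]
labels = [0, 0, 0, 1, 1, 1]
forest = build_forest(rows, labels, k=25, m_try=1)
print(predict_forest(forest, [70.0, 90.0]), predict_forest(forest, [0.5, 0.0]))
```

An odd number of trees is used here so that a two-class vote cannot end in a tie.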
8. The random-forest-based method for detecting electricity theft as claimed in claim 7, characterized in that the Bagging algorithm is an ensemble learning algorithm: given a weak learning algorithm and a training sample set T = {(x1, y1), (x2, y2), ..., (xn, yn)}, samples are drawn from the set with replacement, producing for each base classifier a training subset that has the same size as the original training set but different content; different base classifiers can then be trained.
9. The random-forest-based method for detecting electricity theft as claimed in claim 8, characterized in that the Bagging algorithm is specifically as follows:
Assume the original data set contains n samples in total; m samples (m ≤ n) are drawn from it randomly, independently and with replacement to form a new bootstrap training data set;
The above process is repeated to form multiple mutually independent bootstrap training data sets;
Each independent bootstrap training data set is used to train one sub-classifier, yielding the same number of mutually independent sub-classifiers;
The final decision of the algorithm is determined by a vote over the individual decisions of these mutually independent sub-classifiers.
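The two generic pieces of the Bagging procedure in claim 9, drawing bootstrap training sets and voting, can be sketched on their own, independent of which weak learner is trained on each set. The names `bootstrap_sets` and `majority_vote`, the seed, and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sets(samples, num_sets, m, seed=0):
    """Draw num_sets training sets of m samples each, with replacement."""
    if m > len(samples):
        raise ValueError("m must not exceed the number of samples")
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(m)] for _ in range(num_sets)]


def majority_vote(predictions):
    """Return the label predicted by the most sub-classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: ten sample ids, five bootstrap sets of eight draws each.
sets = bootstrap_sets(list(range(10)), num_sets=5, m=8)
print(sets[0], majority_vote([1, 0, 1]))
```

Each set would then be passed to the chosen weak learner, and the resulting sub-classifiers' outputs fed to `majority_vote`.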
CN201711260280.XA 2017-12-04 2017-12-04 A kind of discovery method of the electricity stealing based on random forest Pending CN107862347A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711260280.XA CN107862347A (en) 2017-12-04 2017-12-04 A kind of discovery method of the electricity stealing based on random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711260280.XA CN107862347A (en) 2017-12-04 2017-12-04 A kind of discovery method of the electricity stealing based on random forest

Publications (1)

Publication Number Publication Date
CN107862347A true CN107862347A (en) 2018-03-30

Family

ID=61704996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711260280.XA Pending CN107862347A (en) 2017-12-04 2017-12-04 A kind of discovery method of the electricity stealing based on random forest

Country Status (1)

Country Link
CN (1) CN107862347A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410074A (en) * 2018-10-18 2019-03-01 广州市勤思网络科技有限公司 Intelligent core protects method and system
CN109657705A (en) * 2018-12-03 2019-04-19 国网天津市电力公司电力科学研究院 A kind of automobile user clustering method and device based on random forests algorithm
CN109739846A (en) * 2018-12-27 2019-05-10 国电南瑞科技股份有限公司 A kind of electric network data mass analysis method
CN109858886A (en) * 2019-02-18 2019-06-07 国网吉林省电力有限公司电力科学研究院 It is a kind of that control success rate promotion analysis method is taken based on integrated study
CN111210269A (en) * 2020-01-02 2020-05-29 平安科技(深圳)有限公司 Object identification method based on big data, electronic device and storage medium
CN111428804A (en) * 2020-04-01 2020-07-17 广东电网有限责任公司 Random forest electricity stealing user detection method with optimized weighting
CN111428808A (en) * 2020-04-08 2020-07-17 成都爱科特科技发展有限公司 Method for classifying services by using random forest
CN111753907A (en) * 2020-06-24 2020-10-09 国家电网有限公司大数据中心 Method, device, equipment and storage medium for processing electric quantity data
CN111861786A (en) * 2020-06-12 2020-10-30 国网浙江省电力有限公司电力科学研究院 Special transformer electricity stealing identification method based on feature selection and isolated random forest
CN112101635A (en) * 2020-08-25 2020-12-18 南方电网深圳数字电网研究院有限公司 Method and system for monitoring electricity utilization abnormity
CN112329895A (en) * 2021-01-05 2021-02-05 国网江西综合能源服务有限公司 Method and device for identifying user with suspicion of electricity stealing
CN112801145A (en) * 2021-01-12 2021-05-14 深圳市中博科创信息技术有限公司 Safety monitoring method and device, computer equipment and storage medium
CN113128567A (en) * 2021-03-25 2021-07-16 云南电网有限责任公司 Abnormal electricity consumption behavior identification method based on electricity consumption data
CN113282613A (en) * 2021-04-16 2021-08-20 广东电网有限责任公司计量中心 Method, system, equipment and storage medium for analyzing power consumption of specific transformer and low-voltage user
CN113362118A (en) * 2021-07-08 2021-09-07 广东电网有限责任公司 User electricity consumption behavior analysis method and system based on random forest
CN113946720A (en) * 2020-07-17 2022-01-18 中国移动通信集团广东有限公司 Method and device for identifying users in group and electronic equipment
CN115032720A (en) * 2022-07-15 2022-09-09 国网上海市电力公司 Application of multi-mode integrated forecast based on random forest in ground air temperature forecast
CN116595463A (en) * 2023-07-18 2023-08-15 国网山东省电力公司武城县供电公司 Construction method of electricity larceny identification model, and electricity larceny behavior identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488867A (en) * 2013-07-16 2014-01-01 深圳市航天泰瑞捷电子有限公司 Method for automatically screening abnormal electricity consumption user
CN105205531A (en) * 2014-06-30 2015-12-30 国家电网公司 Anti-electric-larceny prediction method based on machine learning and apparatus thereof
CN105573997A (en) * 2014-10-09 2016-05-11 普华讯光(北京)科技有限公司 Method and device for determining electric larceny suspect user
CN106019087A (en) * 2016-07-20 2016-10-12 国网上海市电力公司 Intermittent electricity stealing monitoring system
CN106909933A (en) * 2017-01-18 2017-06-30 南京邮电大学 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
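Li Zhengui: "Several Studies on Improvements to Random Forests", China Master's Theses Full-text Database, Information Science and Technology *
Wang Rongming et al.: "Business and Economic Statistics", 31 July 2016 *
Xu Zhi et al.: "Prediction of User Electricity-Theft Behavior Based on Machine Learning", Journal of Shanghai University of Electric Power *
Chen Jingjing et al.: "Analysis of Electricity Consumption Behavior Based on Random Forest", Journal of Shanghai University of Electric Power *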

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410074A (en) * 2018-10-18 2019-03-01 广州市勤思网络科技有限公司 Intelligent core protects method and system
CN109657705A (en) * 2018-12-03 2019-04-19 国网天津市电力公司电力科学研究院 A kind of automobile user clustering method and device based on random forests algorithm
CN109739846A (en) * 2018-12-27 2019-05-10 国电南瑞科技股份有限公司 A kind of electric network data mass analysis method
CN109858886A (en) * 2019-02-18 2019-06-07 国网吉林省电力有限公司电力科学研究院 It is a kind of that control success rate promotion analysis method is taken based on integrated study
CN109858886B (en) * 2019-02-18 2021-03-19 国网吉林省电力有限公司电力科学研究院 Integrated learning-based cost control success rate promotion analysis method
CN111210269A (en) * 2020-01-02 2020-05-29 平安科技(深圳)有限公司 Object identification method based on big data, electronic device and storage medium
CN111428804A (en) * 2020-04-01 2020-07-17 广东电网有限责任公司 Random forest electricity stealing user detection method with optimized weighting
CN111428808A (en) * 2020-04-08 2020-07-17 成都爱科特科技发展有限公司 Method for classifying services by using random forest
CN111861786A (en) * 2020-06-12 2020-10-30 国网浙江省电力有限公司电力科学研究院 Special transformer electricity stealing identification method based on feature selection and isolated random forest
CN111861786B (en) * 2020-06-12 2024-07-12 国网浙江省电力有限公司营销服务中心 Special power-stealing identification method based on feature selection and isolated random forest
CN111753907A (en) * 2020-06-24 2020-10-09 国家电网有限公司大数据中心 Method, device, equipment and storage medium for processing electric quantity data
CN113946720A (en) * 2020-07-17 2022-01-18 中国移动通信集团广东有限公司 Method and device for identifying users in group and electronic equipment
CN112101635A (en) * 2020-08-25 2020-12-18 南方电网深圳数字电网研究院有限公司 Method and system for monitoring electricity utilization abnormity
CN112329895A (en) * 2021-01-05 2021-02-05 国网江西综合能源服务有限公司 Method and device for identifying user with suspicion of electricity stealing
CN112801145B (en) * 2021-01-12 2024-05-28 深圳市中博科创信息技术有限公司 Security monitoring method, device, computer equipment and storage medium
CN112801145A (en) * 2021-01-12 2021-05-14 深圳市中博科创信息技术有限公司 Safety monitoring method and device, computer equipment and storage medium
CN113128567A (en) * 2021-03-25 2021-07-16 云南电网有限责任公司 Abnormal electricity consumption behavior identification method based on electricity consumption data
CN113282613A (en) * 2021-04-16 2021-08-20 广东电网有限责任公司计量中心 Method, system, equipment and storage medium for analyzing power consumption of specific transformer and low-voltage user
CN113282613B (en) * 2021-04-16 2023-05-26 广东电网有限责任公司计量中心 Method, system, equipment and storage medium for analyzing power consumption of private transformer and low-voltage user
CN113362118A (en) * 2021-07-08 2021-09-07 广东电网有限责任公司 User electricity consumption behavior analysis method and system based on random forest
CN115032720A (en) * 2022-07-15 2022-09-09 国网上海市电力公司 Application of multi-mode integrated forecast based on random forest in ground air temperature forecast
CN116595463A (en) * 2023-07-18 2023-08-15 国网山东省电力公司武城县供电公司 Construction method of electricity larceny identification model, and electricity larceny behavior identification method and device
CN116595463B (en) * 2023-07-18 2023-09-19 国网山东省电力公司武城县供电公司 Construction method of electricity larceny identification model, and electricity larceny behavior identification method and device

Similar Documents

Publication Publication Date Title
CN107862347A (en) A kind of discovery method of the electricity stealing based on random forest
Li et al. Adaptive multi-objective swarm fusion for imbalanced data classification
CN107766883A (en) A kind of optimization random forest classification method and system based on weighted decision tree
Xia et al. Detection methods in smart meters for electricity thefts: A survey
CN106845717B (en) Energy efficiency evaluation method based on multi-model fusion strategy
CN107194803A (en) P2P net loan borrower credit risk assessment device
Oprea et al. Machine learning classification algorithms and anomaly detection in conventional meters and Tunisian electricity consumption large datasets
CN102324038B (en) Plant species identification method based on digital image
CN107507038A (en) A kind of electricity charge sensitive users analysis method based on stacking and bagging algorithms
CN109739844B (en) Data classification method based on attenuation weight
CN108960833A (en) A kind of abnormal transaction identification method based on isomery finance feature, equipment and storage medium
CN101147160B (en) Adaptive classifier, and method of creation of classification parameters therefor
CN103632168A (en) Classifier integration method for machine learning
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN110998608A (en) Machine learning system for various computer applications
CN105740912A (en) Nuclear norm regularization based low-rank image characteristic extraction identification method and system
CN112001788B (en) Credit card illegal fraud identification method based on RF-DBSCAN algorithm
CN106845846A (en) Big data asset evaluation method
Xu et al. Novel key indicators selection method of financial fraud prediction model based on machine learning hybrid mode
Sudha et al. Credit card fraud detection system based on operational & transaction features using svm and random forest classifiers
Zhang et al. Research on borrower's credit classification of P2P network loan based on LightGBM algorithm
CN110363384A (en) Exception electric detection method based on depth weighted neural network
Zhou et al. Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm
CN117408699A (en) Telecom fraud recognition method based on bank card data
Mishra et al. Improving the efficacy of clustering by using far enhanced clustering algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180330)