CN107862347A - Method for detecting electricity theft based on random forest - Google Patents
- Publication number
- CN107862347A (application number CN201711260280.XA)
- Authority
- CN
- China
- Prior art keywords
- feature
- data
- user
- electricity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/259—Fusion by voting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for detecting electricity theft based on random forest, comprising the following steps: obtaining power system user data, extracting from the marketing system the user data that needs to be judged, screening it, and rejecting data for which electricity theft is not possible; preprocessing the screened raw data, where feature extraction includes extracting a variance feature and a zero-percentage feature; and testing the preprocessed data with the random forest algorithm to compute the final experimental result. The invention largely removes the drawback of existing manual anti-theft methods, which consume large amounts of manpower and material resources, reduces the cost of anti-theft work, and improves its efficiency. Using big-data tools to assist anti-theft work also helps improve its accuracy and follows the general trend of the power industry.
Description
Technical field
The present invention relates to the technical field of power systems, and in particular to a method for detecting electricity theft based on random forest.
Background
Electric energy has long been an indispensable energy source and an important driving force for the life and development of modern society. A shortage of electric energy leaves residents' normal life unprotected and prevents industrial production from proceeding normally. However, there are always some offenders who, for private gain, use electric power illegally by stealing it and evade paying electricity charges. This behavior seriously infringes on the interests of the country and the people and is suspected of being illegal and criminal. Anti-theft work has therefore always been one of the important tasks of electric power companies.
Traditional anti-theft methods are coarse-grained, such as regular inspection and periodic meter verification. These methods have drawbacks: combating electricity theft consumes a large amount of manpower and material resources. Data mining can be used instead: customer electricity consumption data is collected, feature quantities are extracted and analyzed with intelligent algorithms, and whether electricity theft occurs at the customer end is judged on that basis. This effectively avoids the excessive workload and low efficiency of traditional anti-theft methods.
In summary, the prior art still lacks an effective solution to the problem of preventing electricity theft efficiently and accurately.
Summary of the invention
In order to remedy the deficiencies of the prior art, the invention provides an anti-theft method based on the random forest algorithm. The invention introduces the random forest algorithm in aspects such as system architecture and data processing, where the algorithm plays an important role in improving the efficiency of anti-theft work.
A method for detecting electricity theft based on random forest comprises the following steps:
Obtaining power system user data, extracting from the marketing system the user data that needs to be judged and screening it, and rejecting data for which electricity theft is not possible;
Preprocessing the screened raw data, including: comparing electricity-theft user data with normal user data, comparing the differences in their electricity consumption features, extracting the electricity consumption features that differ clearly and are distinctive, then building an expert sample set and performing feature extraction, where the feature extraction includes extracting a variance feature and extracting a zero-percentage feature;
Testing the preprocessed data with the random forest algorithm and computing the final experimental result, specifically: decision tree classification is performed on the user data by the random forest algorithm, the final classification result is decided by a vote of the trained decision trees, and whether a user has committed electricity theft is judged on that basis.
Further, the screening of the data includes: the user data extracted from the marketing system covers all electricity consumption types; information on large users for whom electricity theft is not possible is rejected according to electricity consumption type, and information on users whose electricity theft has already been verified or whose electricity terminal has raised an alarm should also be removed.
Further, the specific formula for extracting the variance feature is:
$$V_i = \frac{\sum_k \left(X_{i_k} - \bar{X}_i\right)^2}{k}$$
where $V_i$ is the variance of the user's electricity consumption; $X_{i_k}$ is the electricity consumption of the i-th user on day k; $\bar{X}_i$ is the user's average electricity consumption; and k is the size of the user's data volume;
The variance mainly reflects the fluctuation of the data. When a user's electricity consumption data fluctuates markedly, that is, consumption fluctuates over a long period and the variance is large, the user has a higher likelihood of electricity theft.
Further, the specific formula for extracting the zero-percentage feature is:
$$P_{Zero_i} = \frac{X_j}{X_i} \times 100\%$$
where $P_{Zero_i}$ is the zero percentage; $X_j$ is the number of zero-valued data points of the i-th user (j zeros in total); and $X_i$ is the total data volume of the i-th user;
Leaving aside extremely special cases, if a user's electricity consumption is zero every day, the likelihood that the user is stealing electricity is high; if a user's consumption is zero most of the time except on a few dates, electricity theft is quite likely; if a user's consumption is intermittently zero, electricity theft is also possible.
Further, the random forest is an ensemble classifier composed of a group of decision tree classifiers $\{h(X, \theta_k),\ k = 1, 2, \ldots, K\}$, where the $\{\theta_k\}$ are independent and identically distributed random vectors and K is the number of decision trees in the random forest; given the independent variable X, the decision tree classifiers determine the optimal classification result by voting.
Further, decision tree classification in the decision tree classifier uses the CART decision tree method, specifically: for each feature, the CART algorithm computes the Gini(t) index value of every possible split on that feature and takes the split with the smallest Gini(t) index value as the optimal split for the feature; it then compares the Gini(t) index values of the optimal splits of all candidate features, and the feature with the smallest Gini(t) index value is chosen as the splitting feature at the node, with branches created according to its feature values. This process is repeated, further splitting the samples at each non-leaf node, until a stopping criterion is reached.
Further, the specific steps for generating the random forest are as follows:
Assume the forest to be built has size k; from the training sample set, k new bootstrap sample sets are generated by the Bagging algorithm;
Each bootstrap sample set is used to build one classification tree, giving k new classification trees in total;
Given n features, $m_{try}$ features ($m_{try} \le n$) are randomly selected at each node of every tree; the information content of each feature is computed, and the feature with the strongest classification ability among the $m_{try}$ features is chosen for the node split according to the principle of minimum node impurity;
Each tree is grown to its full extent, until the impurity of every leaf node reaches its minimum, without any pruning;
New unknown samples are predicted with the CART tree classifiers that have been built, and the classification result of an unknown sample is decided by the number of votes cast by the tree classifiers.
Further, the Bagging algorithm is an ensemble learning algorithm. Given a weak learning algorithm and a training sample set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, samples are drawn from it with replacement, so that each base classifier receives a training subset of the same size as the original training set but with different content, from which different base classifiers can then be trained.
Further, the Bagging algorithm is specifically as follows:
Assume the original data set contains n samples in total; m data points (m ≤ n) are drawn from it randomly, independently, and with replacement to form a new bootstrap training data set;
This process is repeated to form multiple mutually independent bootstrap training data sets;
From each independent bootstrap training data set, an independent sub-classifier is trained, giving the same number of sub-classifiers as data sets;
The final discrimination result of the algorithm is decided by a vote over the individual discrimination results of these independent sub-classifiers.
Compared with the prior art, the beneficial effects of the invention are as follows:
The invention uses the random forest algorithm to judge electricity theft, which largely removes the drawback of existing manual anti-theft methods of consuming large amounts of manpower and material resources, reduces the cost of anti-theft work, and improves its efficiency. Using big-data tools to assist anti-theft work also helps improve its accuracy and follows the general trend of the power industry.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application; the illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it.
Fig. 1 shows the user electricity-theft identification process of the invention;
Fig. 2 shows the electricity consumption data processing and feature extraction flow of the invention;
Fig. 3 is a simplified structural diagram of a decision tree in the random forest algorithm used by the invention.
Detailed description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the technical field of the application.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. It should further be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Explanation of terms: a random forest is a classifier comprising multiple decision trees. Sub-datasets are built by sampling a large amount of raw data with replacement, and subtrees are then built from the sub-datasets. Each subtree branches on the candidate features, the candidate data is classified according to those features, and the outcome for each batch of data is finally determined from the multiple classification results that the algorithm produces over the mass of data.
As described in the background, the prior art has the problem of how to prevent electricity theft accurately. To solve this technical problem, the application proposes a method for detecting electricity theft based on random forest. The application obtains power system user data, rejects data for which electricity theft is not possible, preprocesses the data by extracting a variance feature and a zero-percentage feature, and applies the random forest algorithm to the data to obtain the final detection result. Analyzing power system user data with an intelligent algorithm effectively improves the efficiency of anti-theft work and greatly reduces the consumption of manpower and material resources.
In a typical embodiment of the application, as shown in Fig. 1, a method for detecting electricity theft based on random forest is provided, comprising the following steps:
Step 1: obtain power system user data and reject data for which electricity theft is not possible. Specifically, the electricity consumption data of power system users, obtained through techniques such as remote meter reading and automatic meter reading, is managed and stored in a unified way; the user data that needs to be judged is extracted from the marketing system and screened; non-residential users with good credit are selectively rejected; and the data of all terminal-alarm users and all known electricity-theft users is rejected.
Regarding the screening of data: the user data extracted from the marketing system covers all electricity consumption types, such as residential electricity, large commercial electricity, and general industrial and commercial electricity. During screening it should be recognized that electricity theft is the behavior of only a very small number of users; to improve efficiency and reduce the amount of computation, information on large users such as schools and banks is rejected as appropriate. Information on users whose electricity theft has already been verified or whose electricity terminal has raised an alarm should also be removed.
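The screening rule in Step 1 can be sketched as a simple filter. This is only an illustration under stated assumptions: the record layout, the field names (`category`, `confirmed_theft`, `terminal_alarm`) and the set of low-risk categories are hypothetical and do not come from the patent.

```python
# Hypothetical record layout and category names, chosen for illustration only.
LOW_RISK_CATEGORIES = {"school", "bank"}

def screen_users(users):
    """Return the users that still need an electricity-theft judgement."""
    kept = []
    for user in users:
        if user["category"] in LOW_RISK_CATEGORIES:
            continue  # large user judged unlikely to steal; dropped to cut computation
        if user["confirmed_theft"] or user["terminal_alarm"]:
            continue  # already verified or already flagged, so no prediction is needed
        kept.append(user)
    return kept

users = [
    {"id": 1, "category": "residential", "confirmed_theft": False, "terminal_alarm": False},
    {"id": 2, "category": "school", "confirmed_theft": False, "terminal_alarm": False},
    {"id": 3, "category": "general_commercial", "confirmed_theft": True, "terminal_alarm": False},
    {"id": 4, "category": "residential", "confirmed_theft": False, "terminal_alarm": True},
]
screened = screen_users(users)
print([u["id"] for u in screened])  # prints [1]
```

Only user 1 survives: user 2 is a low-risk large user, user 3 is a verified theft case, and user 4 has a terminal alarm.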
Step 2: preprocess the screened raw data, closely examine and identify the important electricity consumption features of electricity-theft users, build an expert sample set, and perform feature extraction.
Initial analysis of the data: electricity-theft user data is compared with normal user data, the differences in their electricity consumption features are compared, and the features that differ clearly and are distinctive are extracted; an expert sample set is built and features are extracted.
In the random forest algorithm, the feature values mainly extracted are the variance and the zero percentage, and the algorithm is run with the variance feature and the zero-percentage feature as the principal features. The specific method is as follows:
(1) Extracting the variance feature
The variance mainly reflects the fluctuation of the data. When a user's electricity consumption data fluctuates markedly, that is, consumption fluctuates over a long period and the variance is large, the user has a higher likelihood of electricity theft. The formula for extracting the variance feature is as follows:
$$V_i = \frac{\sum_k \left(X_{i_k} - \bar{X}_i\right)^2}{k}$$
where $V_i$ is the variance of the user's electricity consumption; $X_{i_k}$ is the electricity consumption of the i-th user on day k; $\bar{X}_i$ is the user's average electricity consumption; and k is the size of the user's data volume.
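As a minimal sketch of the variance feature, assuming each user's data is a plain list of daily consumption readings (the function name and sample values are illustrative, not from the patent), the sum of squared deviations from the user's mean is divided by the number of readings:

```python
def usage_variance(daily_usage):
    """Population variance of one user's daily electricity consumption."""
    n_days = len(daily_usage)
    mean = sum(daily_usage) / n_days
    return sum((x - mean) ** 2 for x in daily_usage) / n_days

steady = [10.0, 10.0, 10.0, 10.0]   # flat consumption
erratic = [0.0, 20.0, 0.0, 20.0]    # strongly fluctuating consumption

print(usage_variance(steady))   # prints 0.0
print(usage_variance(erratic))  # prints 100.0
```

Both users consume the same total, but only the fluctuating profile yields a large variance, which is the signal this feature is meant to capture.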
(2) Extracting the zero-percentage feature
Analysis of user electricity consumption data leads to the following conclusions:
(1) leaving aside extremely special cases, if a user's electricity consumption is zero every day, the likelihood that the user is stealing electricity is high;
(2) if a user's consumption is zero most of the time except on a few dates, electricity theft is quite likely;
(3) if a user's consumption is intermittently zero, electricity theft is also possible.
The formula for extracting the zero-percentage feature is as follows:
$$P_{Zero_i} = \frac{X_j}{X_i} \times 100\%$$
where $P_{Zero_i}$ is the zero percentage; $X_j$ is the number of zero-valued data points of the i-th user (j zeros in total); and $X_i$ is the total data volume of the i-th user.
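The zero-percentage feature can be sketched in the same style, again assuming a plain list of daily readings per user (an assumption made for illustration): count the zero-valued readings and divide by the total number of readings.

```python
def zero_percentage(daily_usage):
    """Percentage of readings in one user's record that are exactly zero."""
    zero_days = sum(1 for x in daily_usage if x == 0)
    return zero_days / len(daily_usage) * 100.0

mostly_zero = [0.0, 5.0, 0.0, 0.0]
print(zero_percentage(mostly_zero))  # prints 75.0
```

In practice a small tolerance instead of an exact comparison with zero may be preferable if meter readings carry noise; the patent text does not specify this.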
Step 3: apply the random forest algorithm to the sample data and obtain the final predicted result.
The electricity consumption data feature extraction and classification flow is shown in Fig. 2 and comprises obtaining user data, data preprocessing, data feature extraction, and prediction with the random forest algorithm to obtain the result.
Use of the random forest algorithm: decision tree classification is performed on the user data by the random forest algorithm, the final classification result is decided by a vote of the trained decision trees, and whether a user has committed electricity theft is judged on that basis.
The random forest is an ensemble classifier composed of a group of decision tree classifiers $\{h(X, \theta_k),\ k = 1, 2, \ldots, K\}$, where the $\{\theta_k\}$ are independent and identically distributed random vectors and K is the number of decision trees in the random forest; given the independent variable X, the decision tree classifiers determine the optimal classification result by voting.
A random forest is a classifier that integrates many decision trees. If a decision tree is regarded as an expert in a classification task, a random forest is many experts classifying a given task together.
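The "many experts voting" idea can be sketched as follows. The three "trees" here are hypothetical one-rule stand-ins with made-up thresholds, used only to show how the majority vote combines their outputs; they are not trained models and the thresholds do not come from the patent.

```python
from collections import Counter

def forest_predict(trees, features):
    """Let every tree vote and return the label with the most votes."""
    votes = [tree(features) for tree in trees]
    # On a tie, Counter.most_common keeps the label that was seen first.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees.
trees = [
    lambda f: "theft" if f["variance"] > 50.0 else "normal",
    lambda f: "theft" if f["zero_pct"] > 60.0 else "normal",
    lambda f: "theft" if f["variance"] > 500.0 else "normal",
]

user = {"variance": 120.0, "zero_pct": 75.0}
label = forest_predict(trees, user)
print(label)  # prints theft (two votes against one)
```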
As shown in Fig. 3, the specific steps for generating the random forest are as follows:
Assume the forest to be built has size k. From the training sample set, k new bootstrap sample sets are generated by the Bagging method.
Each bootstrap sample set is used to build one classification tree, giving k new classification trees in total.
Given n features, $m_{try}$ features ($m_{try} \le n$) are randomly selected at each node of every tree; the information content of each feature is computed, and the feature with the strongest classification ability among the $m_{try}$ features is chosen for the node split according to the principle of minimum node impurity.
Each tree is grown to its full extent, until the impurity of every leaf node reaches its minimum, without any pruning.
New unknown samples are predicted with the CART tree classifiers that have been built, and the classification result of an unknown sample is decided by the number of votes cast by the tree classifiers.
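The per-node random feature selection in the steps above can be sketched as drawing m_try distinct feature indices out of n. The function name and the example sizes are assumptions for illustration; the patent does not state how m_try is chosen.

```python
import random

def candidate_features(n_features, m_try, rng):
    """Pick m_try distinct feature indices to consider at one tree node."""
    if not 1 <= m_try <= n_features:
        raise ValueError("m_try must satisfy 1 <= m_try <= n_features")
    # rng.sample draws without replacement, so no feature is repeated.
    return sorted(rng.sample(range(n_features), m_try))

rng = random.Random(0)  # seeded only to make the sketch reproducible
subset = candidate_features(n_features=6, m_try=2, rng=rng)
print(subset)
```

A fresh subset would be drawn at every node, so different nodes and different trees end up splitting on different features, which is what decorrelates the trees in the forest.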
The Bagging algorithm is an ensemble learning algorithm. Given a weak learning algorithm and a training sample set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, samples are drawn from it with replacement, so that each base classifier receives a training subset of the same size as the original training set but with different content, from which different base classifiers can then be trained. The algorithm is specifically as follows:
Assume the original data set contains n samples in total; m data points (m ≤ n) are drawn from it randomly, independently, and with replacement to form a new bootstrap training data set;
This process is repeated to form multiple mutually independent bootstrap training data sets;
From each independent bootstrap training data set, an independent sub-classifier is trained, giving the same number of sub-classifiers as data sets;
The final discrimination result of the algorithm is decided by a vote over the individual discrimination results of these independent sub-classifiers.
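The resampling step of Bagging can be sketched as below: each bootstrap set is built by drawing m samples with replacement from the n original samples. The sample identifiers and labels are invented for illustration.

```python
import random

def bootstrap_sets(samples, n_sets, m, rng):
    """Build n_sets bootstrap training sets, each of m samples drawn with replacement."""
    if m > len(samples):
        raise ValueError("m must not exceed the number of original samples")
    return [[rng.choice(samples) for _ in range(m)] for _ in range(n_sets)]

rng = random.Random(42)  # seeded only to make the sketch reproducible
# (user id, label) pairs; label 1 marks a known theft case in this toy data.
training_set = [("u1", 0), ("u2", 0), ("u3", 1), ("u4", 0), ("u5", 1)]
bags = bootstrap_sets(training_set, n_sets=3, m=5, rng=rng)
for bag in bags:
    print(bag)
```

Because sampling is with replacement, a bootstrap set typically repeats some samples and omits others, which is why the sub-classifiers trained on different sets differ from one another.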
The Bagging algorithm is one of the most intuitive and simplest ensemble methods that operate on the training set. For unstable learning algorithms (such as decision trees and artificial neural networks), the Bagging algorithm can effectively improve the generalization ability of the algorithm.
The Boosting algorithm is quite similar to the Bagging algorithm, and both use classifiers of the same type. Their most important difference is that the Bagging algorithm selects each training set at random, so the trained classifiers are independent of one another and show no obvious correlation, whereas in the Boosting algorithm the classifiers are obtained by serial training, and the training set used in each round depends on the learning results of the previous rounds. In addition, the classifiers trained by the Bagging algorithm all carry the same weight, whereas those obtained by the Boosting algorithm carry different weights.
During training, the Boosting algorithm generates different sub-classifiers by re-weighting the samples in the training data set. Its core idea is to redistribute the sample weights so that, as the classifiers are trained one by one, misclassified samples receive more attention. The algorithm is described as follows: first, the Boosting algorithm assigns equal weights to every sample in the training data set, uses them to train the first sub-classifier, and tests it on the training samples to obtain a prediction for each sample. Based on the predictions, and in order to give misclassified samples more attention, the weights of misclassified samples are increased while the weights of correctly classified samples are decreased. After the weights are adjusted, the next sub-classifier is trained on the re-weighted training set, and these steps are repeated until the error rate falls below a preset threshold.
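One round of the Boosting re-weighting described above can be sketched as follows. The fixed multiplicative `factor` is an assumption made purely for illustration; the patent does not give an update rule, and concrete algorithms such as AdaBoost derive the factor from the weighted error rate of the current classifier.

```python
def reweight(weights, misclassified, factor=2.0):
    """Raise the weights of misclassified samples, lower the rest, then renormalise."""
    raw = [
        w * factor if wrong else w / factor
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(raw)
    return [w / total for w in raw]

weights = [0.25, 0.25, 0.25, 0.25]        # equal initial weights
misclassified = [True, False, False, False]  # only the first sample was misclassified
new_weights = reweight(weights, misclassified)
print(new_weights)
```

After the update the misclassified sample carries more than its original share of the total weight, so the next sub-classifier is pushed to get it right.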
The CART algorithm uses recursive binary partitioning. During node splitting it strictly follows the principle of minimizing the Gini index, recursively dividing the samples at a node into two subsets, and the partitioning stops when a preset stopping criterion is reached. It follows that every non-leaf node of a CART tree has two branches.
When a CART tree splits a node, the splitting criterion is the Gini index: the attribute with the smallest Gini index value is chosen as the splitting attribute of the node.
Assume the data set T{X, Y} contains samples of k classes; the Gini index is defined as follows:
$$Gini(t) = 1 - \sum_{j=1}^{k} p(j \mid t)^2$$
where p(j|t) is the probability of class j at node t.
When all samples at node t belong to the same class, the Gini(t) index value is zero, meaning the samples at this node are pure; when the samples are uniformly distributed over the classes, the Gini(t) index value reaches its maximum, meaning the samples at this node are most impure. If the sample set is divided into m parts, the Gini index of the split is:
$$Gini_{split} = \sum_{i=1}^{m} \frac{n_i}{n}\, Gini(i)$$
where m is the number of child nodes, $n_i$ is the number of samples at child node i, and n is the number of samples at the parent node.
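A minimal sketch of the two quantities, assuming the standard definitions (node impurity as one minus the sum of squared class proportions, and split impurity as the sample-weighted average over the child nodes). Labels are plain Python values and the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of the labels at one node; 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty child contributes nothing to a split
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(children):
    """Sample-weighted Gini impurity of a split into the given child nodes."""
    n = sum(len(child) for child in children)
    return sum(len(child) / n * gini(child) for child in children)

print(gini(["theft", "theft"]))                    # prints 0.0 (pure node)
print(gini(["theft", "normal"]))                   # prints 0.5 (two classes, evenly mixed)
print(gini_split([["theft", "theft"], ["normal", "normal"]]))  # prints 0.0 (perfect split)
```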
The CART algorithm computes the Gini(t) index value for every candidate split; the smaller the value at a node split, the more reasonable the split. For each feature in the candidate feature set, the CART algorithm computes the Gini(t) index value of every possible split on that feature and takes the split with the smallest Gini(t) index value as the optimal split for the feature; it then compares the Gini(t) index values of the optimal splits of all candidate features, and the feature with the smallest Gini(t) index value is chosen as the splitting feature at the node, with branches created according to its feature values. This process is repeated, further splitting the samples at each non-leaf node, until a stopping criterion is reached.
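The split search at a single node can be sketched as an exhaustive scan over features and thresholds. This sketch assumes numeric features and binary threshold splits with midpoints between consecutive distinct values as candidates, which is one common way to realise the procedure; the toy rows (variance, zero percentage) and labels are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of the labels at one node (assumed standard definition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the lowest-impurity split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best  # None when no feature has two distinct values

# Toy rows: (variance, zero percentage) per user.
rows = [(5.0, 0.0), (8.0, 10.0), (300.0, 5.0), (450.0, 80.0)]
labels = ["normal", "normal", "theft", "theft"]
split = best_split(rows, labels)
print(split)  # prints (0, 154.0, 0.0)
```

Here the variance feature (index 0) separates the two classes perfectly at a threshold of 154.0, so it wins over every split on the zero-percentage feature. A full tree would apply `best_split` recursively to the two resulting subsets until the stopping criterion is met.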
In addition, the above example of the anti-theft method based on the random forest algorithm is only illustrative. In practical applications, the functions described above may be assigned to different functional modules as needed, for example in view of the configuration requirements of the corresponding hardware or the convenience of software implementation; that is, the internal structure of the system is divided into different functional modules to perform all or part of the functions described above. Each functional module may be implemented in hardware or as a software functional module.
A person of ordinary skill in the art will appreciate that all or part of the flow of the methods in the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and sold or used as an independent product. When executed, the program can perform all or part of the steps of the embodiments of the methods above. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The foregoing describes only preferred embodiments of the application and is not intended to limit it; for a person skilled in the art, the application may have various modifications and variations. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the application shall fall within its scope of protection.
Claims (9)
1. A method for detecting electricity theft based on random forest, characterized by comprising the following steps:
Obtaining power system user data, extracting from the marketing system the user data that needs to be judged and screening it, and rejecting data for which electricity theft is not possible;
Preprocessing the screened raw data, including: comparing electricity-theft user data with normal user data, comparing the differences in their electricity consumption features, extracting the electricity consumption features that differ clearly and are distinctive, then building an expert sample set and performing feature extraction, where the feature extraction includes extracting a variance feature and extracting a zero-percentage feature;
Testing the preprocessed data with the random forest algorithm and computing the final experimental result, specifically: decision tree classification is performed on the user data by the random forest algorithm, the final classification result is decided by a vote of the trained decision trees, and whether a user has committed electricity theft is judged on that basis.
2. The method for detecting electricity theft based on random forest according to claim 1, characterized in that the screening of the data includes: the user data extracted from the marketing system covers all electricity consumption types; information on large users for whom electricity theft is not possible is rejected according to electricity consumption type, and information on users whose electricity theft has already been verified or whose electricity terminal has raised an alarm should also be removed.
3. The method for detecting electricity theft based on random forest according to claim 1, characterized in that the specific formula for extracting the variance feature is:
$$V_i = \frac{\sum_k \left(X_{i_k} - \bar{X}_i\right)^2}{k}$$
where $V_i$ is the variance of the user's electricity consumption; $X_{i_k}$ is the electricity consumption of the i-th user on day k; $\bar{X}_i$ is the user's average electricity consumption; and k is the size of the user's data volume;
The variance mainly reflects the fluctuation of the data. When a user's electricity consumption data fluctuates markedly, that is, consumption fluctuates over a long period and the variance is large, the user has a higher likelihood of electricity theft.
4. The method for detecting electricity theft based on random forest according to claim 1, characterized in that the specific formula for extracting the zero-percentage feature is:
$$P_{Zero_i} = \frac{X_j}{X_i} \times 100\%$$
where $P_{Zero_i}$ is the zero percentage; $X_j$ is the number of zero-valued data points of the i-th user (j zeros in total); and $X_i$ is the total data volume of the i-th user;
Leaving aside extremely special cases, if a user's electricity consumption is zero every day, the likelihood that the user is stealing electricity is high; if a user's consumption is zero most of the time except on a few dates, electricity theft is quite likely; if a user's consumption is intermittently zero, electricity theft is also possible.
5. The method for discovering electricity theft based on random forest according to claim 1, characterized in that the random forest is an ensemble classifier composed of a group of decision tree classifiers {h(X, θ_k), k = 1, 2, ..., K}, where {θ_k} are independent and identically distributed random vectors and K is the number of decision trees in the random forest; for a given independent variable X, the decision tree classifiers determine the optimal classification result by voting.
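The voting step of the ensemble can be sketched as follows. The three "trees" here are hypothetical stand-ins written as simple threshold rules over a (variance, zero percentage) pair; they are not trained classifiers, and the thresholds are invented for the example.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by the most classifiers for input x.

    On a tie, Counter.most_common keeps the label that was voted first.
    """
    votes = Counter(h(x) for h in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for h(X, theta_k); label 1 means suspected theft.
trees = [
    lambda x: 1 if x[0] > 30.0 else 0,   # rule on variance
    lambda x: 1 if x[1] > 50.0 else 0,   # rule on zero percentage
    lambda x: 1 if x[0] > 100.0 else 0,  # stricter rule on variance
]

user = (45.0, 70.0)  # (variance, zero percentage), invented values
label = majority_vote(trees, user)
```

Two of the three rules flag this user, so the ensemble returns label 1 even though the strictest rule disagrees.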
6. The method for discovering electricity theft based on random forest according to claim 5, characterized in that the decision tree classifiers use the CART decision tree method, specifically: for each feature, the CART algorithm calculates the Gini(t) index of every possible split on that feature and takes the split with the smallest Gini(t) index as that feature's optimal split; it then compares the optimal splits of all candidate features, and the feature with the smallest Gini(t) index is selected as the splitting feature at the node, with branches created according to its feature values; this process is repeated to further divide the samples at each non-leaf node until a stopping criterion is reached.
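The split selection described in this claim can be sketched as below. The sketch assumes numeric features and binary threshold splits of the form "value <= t", which is the usual CART convention but is not spelled out in the claim; the function names and the sample data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best split.

    Every observed value of every feature is tried as a threshold, and the
    split with the smallest size-weighted Gini impurity wins.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


# Invented rows of [variance, zero percentage]; label 1 means theft.
rows = [[1.0, 0.0], [2.0, 5.0], [50.0, 60.0], [40.0, 80.0]]
labels = [0, 0, 1, 1]
split = best_split(rows, labels)
```

Applying `best_split` recursively to the left and right subsets, until a stopping criterion is met, yields the tree.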
7. The method for discovering electricity theft based on random forest according to claim 5, characterized in that the random forest is generated by the following steps:
Assuming the forest to be built has size k, generate k new bootstrap sample sets from the training sample set using the Bagging algorithm;
Use each bootstrap sample set to build one classification tree, producing k new classification trees in total;
Given n features, randomly select m_try features (m_try ≤ n) at each node of every tree; calculate the information content of each selected feature and, following the principle of minimum node impurity, select from the m_try features the one with the strongest classification ability to split the node;
Grow each tree to the maximum extent, until the impurity of every leaf node is minimal, without any pruning;
Predict new unknown samples with the CART tree classifiers that have been built; the classification result of an unknown sample is determined by the number of votes cast by the tree classifiers.
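The generation steps above can be sketched in plain Python. This is a compact illustration under stated assumptions, not the patented implementation: features are numeric, splits are binary thresholds scored by Gini impurity, class labels are not tuples, and all names (`build_tree`, `build_forest`, `predict_forest`) and sample values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, m_try, rng):
    """Grow one unpruned tree; leaves are labels, inner nodes are tuples."""
    if len(set(labels)) == 1:  # pure node: impurity is already minimal
        return labels[0]
    best = None
    for f in rng.sample(range(len(rows[0])), m_try):  # m_try random features
        for t in sorted({r[f] for r in rows})[:-1]:   # both sides non-empty
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features cannot separate this node
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], m_try, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right], m_try, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def build_forest(rows, labels, k, m_try, seed=0):
    """Build k trees, each on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], m_try, rng))
    return forest


def predict_forest(forest, x):
    """Classify x by majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]


# Invented rows of [variance, zero percentage]; label 1 means theft.
rows = [[1.0, 0.0], [2.0, 3.0], [1.5, 0.0], [3.0, 5.0],
        [40.0, 60.0], [55.0, 80.0], [35.0, 70.0], [60.0, 90.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = build_forest(rows, labels, k=25, m_try=1)
```

Calling `predict_forest(forest, [50.0, 75.0])` classifies a new user from the same two features. Setting `m_try` below the number of features is what decorrelates the trees, since each node only sees a random subset of candidates.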
8. The method for discovering electricity theft based on random forest according to claim 7, characterized in that the Bagging algorithm is an ensemble learning algorithm: given a weak learning algorithm and a training sample set T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, samples are drawn from it with replacement, so that each base classifier receives a training subset that has the same size as the original training set but differs from it, and different base classifiers can then be trained.
9. The method for discovering electricity theft based on random forest according to claim 8, characterized in that the Bagging algorithm is specifically as follows:
Assuming the original data set contains n samples in total, draw m data items (m ≤ n) from it randomly, independently, and with replacement to form a new bootstrap training data set;
Repeat the above process to form multiple mutually independent bootstrap training data sets;
Train the same number of mutually independent sub-classifiers, one on each independent bootstrap training data set;
The final decision of the algorithm is determined by voting on the individual decisions of these mutually independent sub-classifiers.
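The Bagging procedure in claims 8 and 9 can be sketched as follows. The weak learner used here, a nearest-class-mean rule on a single feature, is a deliberately simple stand-in chosen for the example and is not taken from the patent; the function names and the sample data are likewise hypothetical.

```python
import random
from collections import Counter


def bootstrap_sets(samples, m, num_sets, seed=0):
    """Draw num_sets training sets of m items each, with replacement."""
    if m > len(samples):
        raise ValueError("m must not exceed the number of original samples")
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(m)] for _ in range(num_sets)]


def train_mean_classifier(train_set):
    """Hypothetical weak learner: pick the class whose feature mean is nearest."""
    sums, counts = Counter(), Counter()
    for x, y in train_set:
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def bagging_predict(classifiers, x):
    """Final decision: majority vote over the sub-classifiers."""
    return Counter(h(x) for h in classifiers).most_common(1)[0][0]


# Invented (zero percentage, label) pairs; label 1 means theft.
samples = [(0.0, 0), (2.0, 0), (5.0, 0), (60.0, 1), (80.0, 1), (90.0, 1)]
sets = bootstrap_sets(samples, m=6, num_sets=15)
classifiers = [train_mean_classifier(s) for s in sets]
```

Because each training set is drawn independently, an occasional unrepresentative sample only affects one sub-classifier, and the vote in `bagging_predict` absorbs it.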
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711260280.XA CN107862347A (en) | 2017-12-04 | 2017-12-04 | A kind of discovery method of the electricity stealing based on random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107862347A true CN107862347A (en) | 2018-03-30 |
Family
ID=61704996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711260280.XA Pending CN107862347A (en) | 2017-12-04 | 2017-12-04 | A kind of discovery method of the electricity stealing based on random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107862347A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488867A (en) * | 2013-07-16 | 2014-01-01 | 深圳市航天泰瑞捷电子有限公司 | Method for automatically screening abnormal electricity consumption user |
CN105205531A (en) * | 2014-06-30 | 2015-12-30 | 国家电网公司 | Anti-electric-larceny prediction method based on machine learning and apparatus thereof |
CN105573997A (en) * | 2014-10-09 | 2016-05-11 | 普华讯光(北京)科技有限公司 | Method and device for determining electric larceny suspect user |
CN106019087A (en) * | 2016-07-20 | 2016-10-12 | 国网上海市电力公司 | Intermittent electricity stealing monitoring system |
CN106909933A (en) * | 2017-01-18 | 2017-06-30 | 南京邮电大学 | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features |
Non-Patent Citations (4)
Title |
---|
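Li Zhengui (李贞贵): "Several Studies on Improving Random Forests" (随机森林改进的若干研究), China Master's Theses Full-text Database, Information Science and Technology * |
Wang Rongming (汪嵘明) et al.: "Business and Economic Statistics" (商务经济统计), 31 July 2016 * |
Xu Zhi (许智) et al.: "Prediction of User Electricity Theft Behavior Based on Machine Learning" (基于机器学习的用户窃电行为预测), Journal of Shanghai University of Electric Power (上海电力学院学报) * |
Chen Jingjing (陈晶晶) et al.: "Analysis of Electricity Consumption Behavior Based on Random Forest" (基于随机森林的用电行为分析), Journal of Shanghai University of Electric Power (上海电力学院学报) * |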
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109410074A (en) * | 2018-10-18 | 2019-03-01 | 广州市勤思网络科技有限公司 | Intelligent core protects method and system |
CN109657705A (en) * | 2018-12-03 | 2019-04-19 | 国网天津市电力公司电力科学研究院 | A kind of automobile user clustering method and device based on random forests algorithm |
CN109739846A (en) * | 2018-12-27 | 2019-05-10 | 国电南瑞科技股份有限公司 | A kind of electric network data mass analysis method |
CN109858886A (en) * | 2019-02-18 | 2019-06-07 | 国网吉林省电力有限公司电力科学研究院 | It is a kind of that control success rate promotion analysis method is taken based on integrated study |
CN109858886B (en) * | 2019-02-18 | 2021-03-19 | 国网吉林省电力有限公司电力科学研究院 | Integrated learning-based cost control success rate promotion analysis method |
CN111210269A (en) * | 2020-01-02 | 2020-05-29 | 平安科技(深圳)有限公司 | Object identification method based on big data, electronic device and storage medium |
CN111428804A (en) * | 2020-04-01 | 2020-07-17 | 广东电网有限责任公司 | Random forest electricity stealing user detection method with optimized weighting |
CN111428808A (en) * | 2020-04-08 | 2020-07-17 | 成都爱科特科技发展有限公司 | Method for classifying services by using random forest |
CN111861786A (en) * | 2020-06-12 | 2020-10-30 | 国网浙江省电力有限公司电力科学研究院 | Special transformer electricity stealing identification method based on feature selection and isolated random forest |
CN111861786B (en) * | 2020-06-12 | 2024-07-12 | 国网浙江省电力有限公司营销服务中心 | Special power-stealing identification method based on feature selection and isolated random forest |
CN111753907A (en) * | 2020-06-24 | 2020-10-09 | 国家电网有限公司大数据中心 | Method, device, equipment and storage medium for processing electric quantity data |
CN113946720A (en) * | 2020-07-17 | 2022-01-18 | 中国移动通信集团广东有限公司 | Method and device for identifying users in group and electronic equipment |
CN112101635A (en) * | 2020-08-25 | 2020-12-18 | 南方电网深圳数字电网研究院有限公司 | Method and system for monitoring electricity utilization abnormity |
CN112329895A (en) * | 2021-01-05 | 2021-02-05 | 国网江西综合能源服务有限公司 | Method and device for identifying user with suspicion of electricity stealing |
CN112801145B (en) * | 2021-01-12 | 2024-05-28 | 深圳市中博科创信息技术有限公司 | Security monitoring method, device, computer equipment and storage medium |
CN112801145A (en) * | 2021-01-12 | 2021-05-14 | 深圳市中博科创信息技术有限公司 | Safety monitoring method and device, computer equipment and storage medium |
CN113128567A (en) * | 2021-03-25 | 2021-07-16 | 云南电网有限责任公司 | Abnormal electricity consumption behavior identification method based on electricity consumption data |
CN113282613A (en) * | 2021-04-16 | 2021-08-20 | 广东电网有限责任公司计量中心 | Method, system, equipment and storage medium for analyzing power consumption of specific transformer and low-voltage user |
CN113282613B (en) * | 2021-04-16 | 2023-05-26 | 广东电网有限责任公司计量中心 | Method, system, equipment and storage medium for analyzing power consumption of private transformer and low-voltage user |
CN113362118A (en) * | 2021-07-08 | 2021-09-07 | 广东电网有限责任公司 | User electricity consumption behavior analysis method and system based on random forest |
CN115032720A (en) * | 2022-07-15 | 2022-09-09 | 国网上海市电力公司 | Application of multi-mode integrated forecast based on random forest in ground air temperature forecast |
CN116595463A (en) * | 2023-07-18 | 2023-08-15 | 国网山东省电力公司武城县供电公司 | Construction method of electricity larceny identification model, and electricity larceny behavior identification method and device |
CN116595463B (en) * | 2023-07-18 | 2023-09-19 | 国网山东省电力公司武城县供电公司 | Construction method of electricity larceny identification model, and electricity larceny behavior identification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180330 |