CN108388974A - Optimized identification method and device for top-tier customers based on random forest and decision tree - Google Patents

Info

Publication number
CN108388974A
Authority
CN
China
Prior art keywords
customer
sample
data
tier
tier customer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810028008.7A
Other languages
Chinese (zh)
Inventor
李云亭
张洪利
荣以平
朱伟义
刘霄慧
尹明立
粱波
姜云
王伟
刘昳娟
王鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shandong Electric Power Co Ltd
Priority to CN201810028008.7A
Publication of CN108388974A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an optimized identification method and device for top-tier customers based on random forest and decision tree. The method comprises the following steps: obtaining the value features of sample customers and judging whether each sample customer is top-tier; training on the sample customer data with random forest and decision tree algorithms to build a top-tier customer identification model; performing an effectiveness analysis of the identification model's judgments based on a professional supervision method, and training an optimized top-tier customer identification model based on the analysis results; and taking the value features of a customer to be identified as input and judging, based on the optimized identification model, whether that customer is top-tier. The invention achieves precise targeting of top-tier customers based on big data.

Description

Optimized identification method and device for top-tier customers based on random forest and decision tree
Technical field
The invention belongs to the technical field of machine learning, and in particular relates to an optimized identification method and device for top-tier customers based on random forest and decision tree.
Background
As electric power reform deepens and the electricity retail market is fully opened, the power companies at all levels of State Grid Corporation of China face market competition pressure. To improve the profitability and competitiveness of power grid enterprises and to increase the loyalty, satisfaction, and stickiness of top-tier customers, providing premium service to top-tier customers, on top of universal service for society as a whole, will be the main means and strategy by which electricity retailers compete for top-tier customers.
To achieve these goals and sustain the long-term development of power grid enterprises, it is necessary to target top-tier customers precisely and provide them with differentiated premium service, strengthening their loyalty to and reliance on the power grid enterprise. This means capturing the top-tier customer market, formulating targeted competitive service strategies, investing limited service resources in top-tier customers, and establishing stable power supply relationships with them.
With the vigorous advance of the company's marketing informatization and automation, and with automatic data collection fully implemented for 400 million smart electricity meters, the company holds massive customer data. Records, business expansion, metering, and billing data for 430 million customers grow by about 50 TB per year; electricity and power quality readings collected from 400 million smart meters add about 500 TB per year; about 200,000 calls per day to the 95598 hotline add about 10 TB per year; and about 150,000 business work orders per day add about 2 TB per year. Energy-saving services and the operation of the electric vehicle charging and battery-swapping network have also accumulated large volumes of data. Meanwhile, with the rapid development of the internet economy, the online user bases of the 95598 intelligent interactive website, the "palm electric power" mobile app, "electric e precious", and "e chargings" are growing explosively.
However, with the explosive growth of data volume and ever-rising business requirements, the traditional service system architecture finds it increasingly difficult to meet operational demands. Big data technology is recognized worldwide as an important strategic resource; as a basic strategic resource, data supports analyzing customer demand and providing targeted services.
In summary, how to identify and target top-tier customers accurately based on big data is a technical problem that urgently needs to be solved.
Summary of the invention
To overcome the above deficiencies of the prior art and solve the problem of accurately identifying and targeting top-tier customers based on big data, the invention provides an optimized identification method and device, for the electricity retail side, for top-tier customers based on random forest and decision tree. Using massive data on grid company customers' electricity-use attributes, behavior, and characteristics, the method establishes a multi-dimensional customer evaluation index system, builds a customer evaluation model through data analysis, further optimizes the model, and scores customers comprehensively, thereby targeting top-tier customers precisely.
The first object of the invention is to provide an optimized identification method for top-tier customers based on random forest and decision tree.
To achieve this object, the invention adopts the following technical solution:
An optimized identification method for top-tier customers based on random forest and decision tree, comprising the following steps:
obtaining the value features of sample customers, and judging whether each sample customer is top-tier;
training on the sample customer data with random forest and decision tree algorithms to build a top-tier customer identification model;
performing an effectiveness analysis of the identification model's judgments based on a professional supervision method, and training an optimized top-tier customer identification model based on the analysis results;
taking the value features of a customer to be identified as input, and judging, based on the optimized top-tier customer identification model, whether the customer is top-tier.
As a further preferred scheme, obtaining the value features of sample customers and judging the quality of the sample customers specifically comprises:
selecting sample customers, obtaining each sample customer's electricity-use information, analyzing its influence on overall customer value, and building a multi-dimensional index system of customer value evaluation features;
computing the value features of the sample customers according to the index system, and judging the quality of each sample customer.
As a further preferred scheme, the value features of a sample customer include the customer's basic attributes, economic value, load value, growth value, credit value, and industry value data.
As a further preferred scheme, training on the sample customer data with random forest and decision tree algorithms to build the top-tier customer identification model specifically comprises:
preprocessing the sample customer data;
building a top-tier customer judgment model based on the random forest method;
building a top-tier customer business-rule interpretation model based on the decision tree algorithm;
training the judgment model and the business-rule interpretation model with the preprocessed sample customer data to build the top-tier customer identification model.
As a further preferred scheme, preprocessing the sample customer data specifically comprises: data cleaning, quantization of feature factors, feature expansion, feature selection, and outlier handling.
As a further preferred scheme, data cleaning cleans the data through an out-of-limit value check, a feature validity check, and a null value check;
the out-of-limit value check finds and deletes records in the sample customer data in which electricity consumption and the electricity charge and price are all 0; the feature validity check finds records in which the customer importance feature is too uniform; the null value check finds records in which the suspension-days field is entirely empty and the overdue-days field for electricity charge payment is severely missing.
As a further preferred scheme, training the top-tier customer judgment model and the business-rule interpretation model with the preprocessed sample customer data specifically comprises performing, in order: full-feature training, important-feature training, full-feature cross training, and important-feature cross training;
Full-feature training: the sample is all sample customer data, and the model inputs are all business indices;
Important-feature training: the sample is all sample customer data, and the model inputs are the top 40% of indices by importance;
Full-feature cross training: the sample customer data is split evenly into 10 parts; each time, 9 parts are used as the training sample and the remaining part as the prediction sample, iterating 10 times; the model inputs are all business indices;
Important-feature cross training: the sample customer data is split evenly into 10 parts; each time, 9 parts are used as the training sample and the remaining part as the prediction sample, iterating 10 times; the model inputs are the top 40% of indices by importance.
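The two mechanics these training schemes rely on, a 10-part split with one part held out per iteration and the selection of the top 40% of indices by importance, can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function names `k_fold_splits` and `top_fraction` are hypothetical, and the folds here are contiguous and unshuffled, which is an assumption because the text only says the data is split evenly.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Assumption: folds are contiguous and unshuffled; the source only
    states that the data is split evenly into 10 parts.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def top_fraction(importances, fraction=0.4):
    """Return the index names in the top `fraction` by importance score."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

Each `(train, test)` pair would feed one of the 10 iterations; for the important-feature variants, the model inputs would first be restricted to the names returned by `top_fraction`.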
As a further preferred scheme, before model training the method further comprises: selecting important indices by combining the MDA and MDG methods, and obtaining the index importance analysis results through model training.
As a further preferred scheme, the method further comprises: establishing a long-term mechanism for upgrading and optimizing the top-tier customer identification model, in which the model's judgments are analyzed for effectiveness from time to time based on the professional supervision method and the optimized identification model is retrained based on the analysis results.
As a further preferred scheme, the method further comprises: integrating the trained optimized identification model, collecting customer feature data through a data interface, and identifying top-tier customers periodically, taking the value features of each customer to be identified as input and judging, based on the optimized identification model, whether the customer is top-tier.
The second object of the invention is to provide an optimized identification device for top-tier customers based on random forest and decision tree.
To achieve this object, the invention adopts the following technical solution:
An optimized identification device for top-tier customers based on random forest and decision tree, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the optimized identification method for top-tier customers based on random forest and decision tree.
The third object of the invention is to provide a computer-readable storage medium.
To achieve this object, the invention adopts the following technical solution:
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the optimized identification method for top-tier customers based on random forest and decision tree.
Beneficial effects of the invention
1. Based on massive data on grid company customers' electricity-use attributes, behavior, and characteristics, the invention uses machine learning to identify top-tier customers, which underpins premium service for them and helps improve the competitiveness of power grid enterprises.
2. The invention trains the customer identification model by combining random forest and decision tree. Besides identifying whether a customer is top-tier, the identification model can provide the business-rule interpretation behind the judgment, further achieving precise targeting of top-tier customers.
3. The invention establishes a long-term mechanism for upgrading and optimizing the top-tier customer identification model: the model's judgments are analyzed for effectiveness from time to time based on the professional supervision method, and the optimized identification model is retrained based on the analysis results, so that retraining upgrades and optimizes the model version.
Description of the drawings
The accompanying drawings, which form a part of this application, provide further understanding of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it.
Fig. 1 is a flow chart of the big-data-based top-tier customer identification method of the invention;
Fig. 2 is a diagram of the top-tier customer business-rule interpretation model built with the decision tree algorithm.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
Note that the following detailed description is illustrative and intended to further explain the application. Unless otherwise indicated, all technical and scientific terms used in this embodiment have the same meanings as commonly understood by those of ordinary skill in the technical field of the application.
The terminology used here is only for describing specific embodiments and is not intended to limit the illustrative embodiments of the application. As used here, unless the context clearly indicates otherwise, the singular form is intended to include the plural. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
The flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of methods and systems according to various embodiments of the disclosure. Each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which may include one or more executable instructions for implementing the logical function specified in each embodiment. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each block in the flowcharts and/or block diagrams, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Where no conflict arises, the embodiments of the application and the features in them may be combined with one another. The invention is further described below with reference to the drawings and embodiments.
Embodiment one
This embodiment discloses an optimized identification method for top-tier customers based on random forest and decision tree which, as shown in Fig. 1, comprises the following steps:
Step (1): obtain the value features of sample customers, and judge whether each sample customer is top-tier;
Step (2): train on the sample customer data with random forest and decision tree algorithms to build a top-tier customer identification model;
Step (3): perform an effectiveness analysis of the identification model's judgments based on a professional supervision method, and train an optimized top-tier customer identification model based on the analysis results;
Step (4): take the value features of a customer to be identified as input, and judge, based on the optimized top-tier customer identification model, whether the customer is top-tier.
Step (1) is the data preparation stage.
1. Establish the index system of customer value evaluation features:
Collect grid company customers' record information and information on economic value, load value, growth value, credit value, and industry value; comprehensively analyze the factors that influence overall customer value; and, based on massive data on customers' electricity-use attributes, behavior, and characteristics, establish a multi-dimensional index system of customer value evaluation features.
Through focused customer discussions and customer surveys, judge the quality of sample customers in each prefecture-level city, providing the data basis for model training.
Based on the kinds of value that top-tier customers bring to the grid company, review each customer's electricity-use indices, classify the indices from the perspective of customer value, build the customer evaluation index system, standardize the indices, and aggregate them across dimensions, providing the data basis for judging whether a customer is top-tier.
2. Determine the model training sample:
Using the top-tier customer index system determined in discussion with prefecture-level experts, and based on the marketing business application system and the electricity-use information collection system, compile for each sample customer the basic attributes, economic value, load value, growth value, credit value, and industry value data, which serve as the model training sample. In this embodiment, experts judged the electricity-use behavior feature data of 474,000 sample customers and labeled each as top-tier or not.
Customer attributes: account number, account name, industry classification, whether high-energy-consuming, and electricity-use category.
Economic value: the profit that the customer's electricity use brings to the power supply enterprise, for example customers with a higher average sales price, larger consumption, and larger electricity charges. Includes: current-period average electricity sales price, current-period electricity charge, current-period electricity consumption, cumulative average electricity sales price, cumulative electricity charge, cumulative electricity consumption, contract capacity, and operating capacity.
Load value: the load value a customer exhibits in using electricity, for example customers with a larger power factor, a high average load rate, and a favorable valley consumption rate. Includes: average daily load rate, peak consumption rate, valley consumption rate, and power factor adjustment coefficient.
Growth value: customers whose own electricity use is developing well and who will contribute more in the future, bringing the company sustained profit. Includes: current-period electricity growth rate, electricity growth rate over the past 3 months, electricity growth rate over the past 6 months, electricity growth rate over the past year, number of capacity increases, and number of capacity reductions.
Credit value: credit is the basic guarantee for completing transactions between supplier and consumer; this covers customers who use electricity lawfully and pay their electricity charges on time. Includes: electricity charge prepayment carry-over rate, overdue days of electricity charge payment, number of overdue electricity charge payments, electricity charge payment period, number of bounced checks, and number of contract violations and electricity thefts.
Industry value: considers the development prospects of the customer's industry, favoring industries whose overall electricity use is developing well. Includes: industry electricity growth rate, industry major-class electricity growth rate, and industry group electricity growth rate.
During data preparation, the criteria for the supervision sources of customers to be identified were also formulated, that is, the business conditions that a valid supervision source should meet. Customers to be identified were screened preliminarily against these criteria: only a customer whose supervision source arises from that business is regarded as a valid customer to be identified.
In this embodiment, experts judged the electricity-use behavior feature data of 474,000 sample customers and labeled each as top-tier or not.
Step (2) comprises the data processing stage and the model training stage.
Step (2-1): Data processing stage
Current databases are easily affected by noise, missing data, and inconsistent data; they are very large and mostly drawn from multiple heterogeneous sources, so data quality is low. Low-quality data makes the results of data analysis inaccurate, so the data must be preprocessed before model training. The preprocessing in this scheme mainly covers feature factor quantization, outlier handling, and continuous variable handling.
1. Data cleaning
The data is cleaned through an out-of-limit value check, a feature validity check, and a null value check.
Out-of-limit check: find and delete records in which electricity consumption and the electricity charge and price are all 0. Such values indicate that the customer uses no electricity, that is, is not in production, so the other related features carry no information either.
Feature validity check: find records in which the customer importance feature is too uniform; only a minority of customers belong to the important class.
Null value check: find records in which the suspension-days field is entirely empty and the overdue-days field for electricity charge payment is severely missing. An entirely empty suspension-days field means the suspension days are missing for all customers; inspection of the overdue-days field shows that the field is empty where, in business terms, the payment is not overdue.
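The three cleaning checks can be sketched over a list of record dictionaries. This is a rough illustration under stated assumptions: the field names (`consumption_kwh`, `charge`, `price`) and the 0.95 dominance threshold are hypothetical and do not come from the source, which names no schema and no threshold.

```python
from collections import Counter


def drop_zero_usage(records):
    """Out-of-limit check: drop records whose consumption, charge and price are all 0."""
    zero_fields = ("consumption_kwh", "charge", "price")  # hypothetical field names
    return [r for r in records if not all(r.get(f) == 0 for f in zero_fields)]


def null_ratio(records, field):
    """Null value check: fraction of records in which `field` is missing (None)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


def is_near_constant(records, field, threshold=0.95):
    """Feature validity check: True if one value covers at least `threshold` of records.

    The threshold is an illustrative assumption, not a value from the source.
    """
    values = [r.get(field) for r in records]
    if not values:
        return False
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= threshold
```

A field whose `null_ratio` is 1.0 corresponds to the entirely empty suspension-days case described above; what to do with such a field remains a business decision.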
2. Feature factor quantization
Information such as customer records, holidays, and weather obtained from the marketing system or other systems is expressed as text or codes, so these variables need to be converted to numeric form.
There are 42 field features, such as account name, account number, industry, industry group, industry major class, high-energy-consumption industry, and importance level. They are classified as follows: 1) customer attribute information; 2) economic value; 3) load value; 4) growth value; 5) credit value; 6) industry value.
Factorization (encoding as the integers 0/1/2/3...): industry, industry group, industry major class, high-energy-consumption industry, importance level, electricity-use category, voltage level, region, investment scale, capacity size, load character;
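Factorization of a text-valued field into integer codes can be sketched as follows. The function name `factorize` and the rule of assigning codes in order of first appearance are assumptions for illustration; the source only states that the fields are encoded as 0/1/2/3....

```python
def factorize(values):
    """Map each distinct value to an integer code, in order of first appearance.

    Returns the encoded list and the value-to-code mapping so that the
    same mapping can be reused on new data.
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer
        encoded.append(codes[value])
    return encoded, codes
```

Keeping the returned mapping matters in practice: customers scored later must be encoded with the same codes that were used during training.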
3. Feature expansion:
1) Normalization expansion (scaling the customer's data values into [0, 1] as features): electricity charge, contract capacity, average electricity sales over the past year, average electricity sales over the past 6 months, average electricity sales over the past 3 months, operating capacity;
2) Discretization expansion (binning the customer's data values by size as features): electricity charge, contract capacity, average electricity sales over the past year, average electricity sales over the past 6 months, average electricity sales over the past 3 months, operating capacity;
3) Rank feature expansion (ranking the customer's data values by size as features): electricity charge, contract capacity, average electricity sales over the past year, average electricity sales over the past 6 months, average electricity sales over the past 3 months, operating capacity;
4) Encoding expansion for few-valued data (one-hot 0/1 encoding): number of capacity increases, number of capacity reductions, long-standing electricity charge arrears, proportion of long-standing electricity charge arrears, number of contract violations and electricity thefts.
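The four expansions can each be written in a few lines of plain Python. This is a sketch under assumptions: min-max scaling is used for the [0, 1] normalization, the bin edges passed to `discretize` are supplied by the caller, and ties in `rank` are broken by position; the source specifies none of these details.

```python
from bisect import bisect_right


def min_max(values):
    """Normalization: scale values into [0, 1] (min-max scaling is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no spread
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges):
    """Discretization: return the bin number of each value for sorted bin edges."""
    return [bisect_right(edges, v) for v in values]


def rank(values):
    """Rank feature: 1 for the smallest value, ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def one_hot(values):
    """One-hot 0/1 encoding for a field with few distinct values."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```

Applied to one column such as the electricity charge, the first three functions yield three derived features, which is how a single raw field expands into several model inputs.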
4. Feature selection:
For customer attribute features, observe how balanced the data distribution is and make a preliminary analysis of how strongly these dimensions influence, and how necessary they are to, whether a customer is top-tier.
For the 5 classes of value features, observe how balanced the data distribution is and make a preliminary analysis of how strongly these dimensions influence, and how necessary they are to, whether a customer is top-tier. Also check whether the features are correlated.
Comprehensive dimensionality reduction: explore and try multiple methods, and combine their results to reduce dimensionality.
5. Outlier handling
The collected data contains uncollected or abnormal values, and records data also has missing values. Missing values in this data must be handled, with different methods chosen according to different business rules:
Default value replacement: for record fields such as load character and voltage level, fill in the default value computed from general business rules.
Case deletion: if the proportion of missing values is small and the attribute is important, delete the record. For example, if the customer id is missing from a customer record, delete that record directly.
Mean substitution: if the missing value is numeric, fill it with the mean of the preceding and following data.
If the missing value is non-numeric, fill it with the mode of the attribute.
Hot-deck imputation: select from the data set the object most similar to the one with missing data, and use that object's value in place of the missing value.
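Two of the filling rules above, the mean of neighboring values for numeric fields and the mode for non-numeric fields, can be sketched as follows. The function names are hypothetical, missing values are represented as `None`, and "preceding and following data" is read here as the nearest observed value on each side, which is an interpretation rather than something the source states.

```python
from collections import Counter


def fill_with_neighbor_mean(series):
    """Fill each missing numeric value with the mean of its nearest observed neighbors."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed value before and after position i, if any.
        before = next((series[j] for j in range(i - 1, -1, -1) if series[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series)) if series[j] is not None), None)
        neighbors = [x for x in (before, after) if x is not None]
        if neighbors:
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


def fill_with_mode(values):
    """Fill each missing non-numeric value with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to infer a mode from
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]
```

Hot-deck imputation would additionally need a similarity measure between records, which the source does not define, so it is left out of this sketch.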
Step (2-2): Model training stage
This embodiment trains the models with the random forest and decision tree methods.
1. Train the top-tier customer judgment model with the random forest method
Selection of important indices
Important indices are selected with the following two methods. One is based on the out-of-bag (OOB) error and is called MDA (Mean Decrease Accuracy); the other is based on Gini impurity and is called MDG (Mean Decrease Gini). In both, a larger value indicates a more important variable. Model training yields the index importance analysis results; the important indices obtained by the two methods are compared in the following table:
Table 1
Rank | MDA | MDG
1 | Cumulative electricity consumption | Cumulative electricity consumption
2 | Cumulative electricity charge | Cumulative electricity charge
3 | Current electricity charge | Current electricity charge
4 | Current electricity consumption | Current electricity consumption
5 | Operating capacity | Operating capacity
6 | Number of bounced checks | Power factor adjustment coefficient
7 | Cumulative average electricity selling price | Industry major-category electricity growth rate
8 | Industry major-category electricity growth rate | Annual daily load rate
9 | Power factor adjustment coefficient | Electricity charge payment collection period
10 | Cumulative electricity price growth rate | Industry group electricity growth rate
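As background for the MDG column, the quantity that Mean Decrease Gini aggregates is the drop in Gini impurity produced by a split; MDG accumulates these drops over every split on a given variable across the trees of the forest. The sketch below shows only that single-split building block, with hypothetical function names and toy data; it is not the training code used in this embodiment.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_children


# Toy data: a feature that separates the two classes perfectly at 2.5.
perfect_split = gini_decrease([1, 2, 3, 4], [0, 0, 1, 1], threshold=2.5)
```

A variable that repeatedly produces large decreases such as this one ranks high under MDG, while MDA instead measures how much out-of-bag accuracy drops when the variable's values are permuted.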
Combining the importance indicators above, 13 indicators are determined as the importance indicators, as follows:
Table 2
No. | Importance indicator | Corresponding data column
1 | Cumulative electricity consumption | 7
2 | Cumulative electricity charge | 8
3 | Current electricity charge | 5
4 | Current electricity consumption | 4
5 | Operating capacity | 10
6 | Power factor adjustment coefficient | 15
7 | Number of bounced checks | 35
8 | Cumulative average electricity selling price | 9
9 | Industry major-category electricity growth rate | 39
10 | Cumulative electricity price growth rate | 24
11 | Annual daily load rate | 11
12 | Electricity charge payment collection period | 34
13 | Industry group electricity growth rate | 38
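The count of 13 follows from taking the union of the two top-10 rankings in Table 1: the first five entries coincide, and the remaining entries overlap only partly. A small sketch makes the arithmetic explicit. The indicator names are English renderings chosen for this example, and the merge order is an assumption, so the result matches Table 2 as a set rather than row by row.

```python
MDA_TOP10 = [
    "Cumulative electricity consumption",
    "Cumulative electricity charge",
    "Current electricity charge",
    "Current electricity consumption",
    "Operating capacity",
    "Number of bounced checks",
    "Cumulative average electricity selling price",
    "Industry major-category electricity growth rate",
    "Power factor adjustment coefficient",
    "Cumulative electricity price growth rate",
]

MDG_TOP10 = [
    "Cumulative electricity consumption",
    "Cumulative electricity charge",
    "Current electricity charge",
    "Current electricity consumption",
    "Operating capacity",
    "Power factor adjustment coefficient",
    "Industry major-category electricity growth rate",
    "Annual daily load rate",
    "Electricity charge payment collection period",
    "Industry group electricity growth rate",
]


def merge_rankings(*rankings):
    """Order-preserving union of several ranked indicator lists."""
    merged = []
    for ranking in rankings:
        for name in ranking:
            if name not in merged:
                merged.append(name)
    return merged


selected = merge_rankings(MDA_TOP10, MDG_TOP10)
print(len(selected))  # 13
```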
The training data is trained and optimized with the random forest method to find the correspondence between the electricity consumption behavior feature values and whether a user is top-tier, producing a model that judges whether a customer is top-tier.
In this embodiment, the following training process is used to adjust the model step by step, with model validity analyzed along two dimensions, stability and accuracy. The specific training process is as follows:
Full-feature training: the sample is all 474,000 households, and the model inputs are all business indicators;
Important-feature training: the sample is all 474,000 households, and the model inputs are the top 40% of indicators by importance;
Full-feature cross training: the whole sample is split evenly into 10 parts; each time, 9 parts are used as the training sample and the remaining part as the prediction sample, iterating 10 times; the model inputs are all business indicators;
Important-feature cross training: the whole sample is split evenly into 10 parts; each time, 9 parts are used as the training sample and the remaining part as the prediction sample, iterating 10 times; the model inputs are the top 40% of indicators by importance.
Noise is identified by analyzing the significance coefficient p of the model input variables, and noise variables are not included in the model.
In this embodiment, 474,000 records were collected in total, and 39,400 sample customers were removed by data cleaning. The model training process used about 435,000 samples in total, of which 100,600 households are top-tier customers and 333,900 households are non-top-tier customers, a ratio of top-tier to non-top-tier samples of about 0.3 to 1.
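The two mechanics that the training modes share, a 10-part split with one part held out per iteration and selection of the top 40% of indicators by importance, can be sketched as follows. The function names are hypothetical, the folds are contiguous index ranges without shuffling (a simplifying assumption; real data would normally be shuffled first), and rounding to the nearest whole number of indicators is also an assumption.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k roughly equal folds.
    Each fold serves exactly once as the held-out prediction sample."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def top_fraction(importances, fraction=0.4):
    """Return the names of the highest-importance indicators, keeping
    `fraction` of them (at least one)."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Usage: 25 toy samples, 10 folds; 5 toy indicators, keep the top 40%.
folds = list(k_fold_indices(25, k=10))
kept = top_fraction({"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.06, "e": 0.04})
```

Comparing the per-fold results of the cross-training modes is one way to assess the stability dimension mentioned above, since a stable model should score similarly on every held-out part.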
2. Build the top-tier customer business rule interpretation model using the decision tree algorithm, as shown in Figure 2;
The random forest model trains well and serves as the main top-tier customer identification model, but the random forest method has poor interpretability. A decision tree, by contrast, can provide a business rule interpretation for judging top-tier customers, so the decision tree algorithm is used as the top-tier customer identification sub-model.
The decision tree algorithm evaluates various combinations of conditions over the 474,000 sample customers and finally obtains the optimal evaluation criterion for each branch. Ideally, the final result of each branch should be entirely top-tier or entirely non-top-tier. Two branches are analyzed below:
I. The decision tree model finds that when the power factor adjustment coefficient ≤ -0.001, the current electricity charge > 40235.03, and the cumulative average electricity selling price ≤ 1.12, the user is a top-tier user with a probability of 98.2%. With the indicator attributes currently available, the algorithm cannot refine the users in this branch any further, so it cannot judge with 100% certainty whether they are top-tier;
II. The decision tree model finds that when the power factor adjustment coefficient > -0.001, the current electricity charge ≤ 3566.49, and the cumulative electricity charge ≤ 481560, the user is a non-top-tier user with a probability of 100%.
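The two branches above translate directly into readable rules, which is the interpretability benefit the decision tree is meant to provide. The sketch below encodes only those two branches with the thresholds quoted in the text; the function name and dictionary keys are hypothetical, and any customer not covered by either branch returns `None` because the rest of the tree is not given here.

```python
def explain_customer(features):
    """Return (label, probability) if one of the two quoted branches matches,
    otherwise None. Keys of `features` are illustrative names."""
    coeff = features["power_factor_adjustment_coefficient"]
    charge = features["current_electricity_charge"]

    # Branch I: 98.2% of sample users in this leaf are top-tier.
    if (coeff <= -0.001 and charge > 40235.03
            and features["cumulative_average_selling_price"] <= 1.12):
        return ("top-tier", 0.982)

    # Branch II: 100% of sample users in this leaf are non-top-tier.
    if (coeff > -0.001 and charge <= 3566.49
            and features["cumulative_electricity_charge"] <= 481560):
        return ("non-top-tier", 1.0)

    return None


example = explain_customer({
    "power_factor_adjustment_coefficient": -0.02,
    "current_electricity_charge": 52000.0,
    "cumulative_average_selling_price": 0.85,
    "cumulative_electricity_charge": 900000.0,
})
```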
Step (3): Using the value features of the customer to be identified as input, judge whether the customer is a top-tier customer based on the top-tier customer identification model.
Step (3) is the model iterative optimization and model effect evaluation stage
In the model iterative optimization stage, a long-term mechanism for upgrading and optimizing the model version is established. The model's judgment results are corrected through expert supervision, and validity analysis is performed on the judgment results from time to time; based on the analysis results, the model is retrained to upgrade and optimize the model version.
In the model effect evaluation stage, the accuracy and recall of the best model are checked against expert-scored data to assess the model's effect.
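The evaluation against expert-scored data can be illustrated with a short sketch. It assumes that labels are encoded as 1 for top-tier and 0 for non-top-tier and that the expert labels are treated as ground truth; the function name and the toy data are hypothetical.

```python
def accuracy_and_recall(expert_labels, model_labels):
    """Compare model judgments with expert labels (1 = top-tier, 0 = not).
    Accuracy: share of customers judged the same way as the expert.
    Recall: share of expert-confirmed top-tier customers the model found."""
    pairs = list(zip(expert_labels, model_labels))
    correct = sum(1 for e, m in pairs if e == m)
    true_pos = sum(1 for e, m in pairs if e == 1 and m == 1)
    false_neg = sum(1 for e, m in pairs if e == 1 and m == 0)
    accuracy = correct / len(pairs) if pairs else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall


# Toy data: the model agrees with the expert on 3 of 5 customers
# and finds 2 of the 3 expert-confirmed top-tier customers.
acc, rec = accuracy_and_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With roughly 0.3 top-tier samples per non-top-tier sample, accuracy alone can look good for a model that rarely predicts top-tier, which is one reason to report recall alongside it.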
Step (4) is the model deployment and application stage
The trained top-tier customer optimized identification model is integrated, user feature data is collected through a data interface, and top-tier customer identification is performed periodically: using the value features of the customer to be identified as input, the top-tier customer optimized identification model judges whether the customer is a top-tier customer.
Embodiment two
The purpose of this embodiment is to provide a top-tier customer optimized identification device based on random forest and decision tree.
A top-tier customer optimized identification device based on random forest and decision tree comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps:
Step (1): Obtain the value features of sample customers, and determine whether each sample customer is top-tier;
Step (2): Using the sample customer data, train based on the random forest and decision tree algorithms to build the top-tier customer identification model;
Step (3): Perform validity analysis on the judgment results of the top-tier customer identification model based on the expert supervision method, and train the top-tier customer optimized identification model based on the analysis results;
Step (4): Using the value features of the customer to be identified as input, judge whether the customer is a top-tier customer based on the top-tier customer optimized identification model.
Embodiment three
The purpose of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium has a computer program stored thereon. When executed by a processor, the program performs the following steps:
Step (1): Obtain the value features of sample customers, and determine whether each sample customer is top-tier;
Step (2): Using the sample customer data, train based on the random forest and decision tree algorithms to build the top-tier customer identification model;
Step (3): Perform validity analysis on the judgment results of the top-tier customer identification model based on the expert supervision method, and train the top-tier customer optimized identification model based on the analysis results;
Step (4): Using the value features of the customer to be identified as input, judge whether the customer is a top-tier customer based on the top-tier customer optimized identification model.
The steps involved in Embodiments two and three correspond to method Embodiment one; for specific implementations, see the relevant description in Embodiment one. The term "computer-readable storage medium" should be understood to include a single medium or multiple media containing one or more instruction sets; it should also be understood to include any medium capable of storing, encoding, or carrying an instruction set that is executed by a processor and causes the processor to perform any method of the present invention.
In this embodiment, a computer program product may include a computer-readable storage medium carrying computer-readable program instructions for performing various aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. It may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium as used here is not to be interpreted as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses through a fiber-optic cable), or electrical signals transmitted through wires.
The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
Beneficial effects of the present invention
1. Based on large volumes of grid company customer data such as electricity consumption attributes, electricity consumption behavior, and electricity consumption features, the present invention uses machine learning to identify top-tier customers, which provides a basis for delivering good service to top-tier customers and helps improve the competitiveness of power grid enterprises.
2. The present invention trains the customer identification model by combining random forest and decision tree. In addition to identifying whether a customer is top-tier, the identification model can provide the business rule interpretation for judging top-tier customers, further enabling precise targeting of top-tier customers.
3. The present invention establishes a long-term mechanism for upgrading and optimizing the top-tier customer identification model: validity analysis is performed from time to time on the judgment results of the top-tier customer identification model based on the expert supervision method, and the top-tier customer optimized identification model is retrained based on the analysis results, achieving model version upgrade and optimization.
Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. Alternatively, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the specific embodiments of the present invention are described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative effort remain within the protection scope of the present invention.

Claims (10)

1. A top-tier customer optimized identification method based on random forest and decision tree, characterized by comprising the following steps:
obtaining the value features of sample customers, and determining whether each sample customer is top-tier;
using the sample customer data, training based on the random forest and decision tree algorithms to build a top-tier customer identification model;
performing validity analysis on the judgment results of the top-tier customer identification model based on the expert supervision method, and training a top-tier customer optimized identification model based on the analysis results;
using the value features of the customer to be identified as input, judging whether the customer is a top-tier customer based on the top-tier customer optimized identification model.
2. The top-tier customer optimized identification method based on random forest and decision tree according to claim 1, characterized in that the specific steps of obtaining the value features of sample customers and determining whether each sample customer is top-tier comprise:
selecting sample customers, obtaining the various electricity consumption information of the sample customers, analyzing its influence on the overall customer value, and building a multi-dimensional customer value evaluation feature indicator system;
computing the value features of the sample customers according to the indicator system, and determining whether each sample customer is top-tier.
3. The top-tier customer optimized identification method based on random forest and decision tree according to claim 1 or 2, characterized in that, in obtaining the value features of sample customers, the value features comprise the data corresponding to the user's basic attributes, economic value, load value, growth value, creditworthiness, and industry value.
4. The top-tier customer optimized identification method based on random forest and decision tree according to claim 1, characterized in that the specific steps of using the sample customer data, training based on the random forest and decision tree algorithms, and building the top-tier customer identification model comprise:
preprocessing the sample customer data;
building a top-tier customer judgment model based on the random forest method;
building a top-tier customer business rule interpretation model based on the decision tree algorithm;
using the preprocessed sample customer data to train the top-tier customer judgment model and the top-tier customer business rule interpretation model, and building the top-tier customer identification model.
5. The top-tier customer optimized identification method based on random forest and decision tree according to claim 4, characterized in that the specific steps of preprocessing the sample customer data comprise: data cleaning, feature factor quantization, feature expansion, feature selection, and outlier processing.
6. The top-tier customer optimized identification method based on random forest and decision tree according to claim 5, characterized in that the data cleaning cleans the data through an out-of-range value check, a feature validity check, and a null value check;
the out-of-range value check examines the sample customer data for records in which the electricity consumption and the electricity charge or price are 0 and deletes them; the feature validity check examines the sample customer data for records in which the user importance feature information is overly uniform; the null value check examines records in which the suspension-days field is entirely empty and the overdue days of electricity charge payment are severely missing.
7. The top-tier customer optimized identification method based on random forest and decision tree according to claim 4, characterized in that the specific steps of using the preprocessed sample customer data to train the top-tier customer judgment model and the top-tier customer business rule interpretation model comprise: performing, in order, full-feature training, important-feature training, full-feature cross training, and important-feature cross training;
the full-feature training: the sample is all the sample customer data, and the model inputs are all business indicators;
the important-feature training: the sample is all the sample customer data, and the model inputs are the top 40% of indicators by importance;
the full-feature cross training: the sample customer data is split evenly into 10 parts; each time, 9 parts are used as the training sample and the remaining part as the prediction sample, iterating 10 times; the model inputs are all business indicators;
the important-feature cross training: the sample customer data is split evenly into 10 parts; each time, 9 parts are used as the training sample and the remaining part as the prediction sample, iterating 10 times; the model inputs are the top 40% of indicators by importance.
8. The top-tier customer optimized identification method based on random forest and decision tree according to claim 4, characterized in that, before model training, the method further comprises: selecting importance indicators by combining the MDA method and the MDG method, and obtaining the indicator importance analysis results through model training.
9. A top-tier customer optimized identification device based on random forest and decision tree, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1-8.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, performs the top-tier customer optimized identification method based on random forest and decision tree according to any one of claims 1-8.
CN201810028008.7A 2018-01-11 2018-01-11 Top-tier customer Optimum Identification Method and device based on random forest and decision tree Pending CN108388974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810028008.7A CN108388974A (en) 2018-01-11 2018-01-11 Top-tier customer Optimum Identification Method and device based on random forest and decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810028008.7A CN108388974A (en) 2018-01-11 2018-01-11 Top-tier customer Optimum Identification Method and device based on random forest and decision tree

Publications (1)

Publication Number Publication Date
CN108388974A true CN108388974A (en) 2018-08-10

Family

ID=63076094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810028008.7A Pending CN108388974A (en) 2018-01-11 2018-01-11 Top-tier customer Optimum Identification Method and device based on random forest and decision tree

Country Status (1)

Country Link
CN (1) CN108388974A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180810
